![Page 1: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/1.jpg)
Understanding and Improving TCP for Web Performance
Tobias Flach
Nandita Dukkipati, Andreas Terzis, Barath Raghavan, Neal Cardwell, Yuchung Cheng, Ankur Jain, Shuai Hao,
Ethan Katz-Bassett, Ramesh Govindan
![Page 4: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/4.jpg)
Sometimes users wait a long time to get a response
![Page 5: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/5.jpg)
Amazon found that every additional 100 ms of page load delay costs it 1% of sales
![Page 6: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/6.jpg)
Motivation
● Goal: Optimize communication to improve web performance (which correlates with higher revenue)
● Challenges: Network and protocol complexity
● Focus here: Troubleshoot and improve TCP
![Page 7: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/7.jpg)
Overview
What aspects of TCP are limiting Web access performance, and how can we overcome these limitations?
“Reducing Web Latency: The Virtue of Gentle Aggression” (SIGCOMM 13)
Identify performance bottlenecks and root causes
Mitigate root causes through protocol or structural changes
“Understanding TCP Flow Performance at Scale Through Behavioral Signatures” (in progress)
![Page 8: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/8.jpg)
Overview
A. Reducing Web Latency: The Virtue of Gentle Aggression
B. Understanding TCP Flow Performance at Scale Through Behavioral Signatures
![Page 9: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/9.jpg)
We improved Google’s response time by 23%
● Across billions of client requests, we improved the mean response time by 23%.
● We achieved this despite speeding up ONLY the 6% of transfers that experienced packet loss.
● The improvement is in the tail: We halved latency at the 99th percentile.
● For latency-sensitive services, faster transfers mean a better user experience.
![Page 10: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/10.jpg)
Ways to Reduce Latency
High loss, high delay
![Page 11: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/11.jpg)
Ways to Reduce Latency
Improve the proximity of services to the user
Leverage multi-stage connections
[Figure: multi-stage connection with segments labeled "Low loss, multiplexed" and "High loss, shorter delay"]
![Page 12: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/12.jpg)
Basic TCP Mechanisms
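[Figure: packet timelines for three mechanisms]
Slow Start: segment 1, then segments 2 - 3, then segments 4 - 7
Retransmission Timeout (RTO): segments 4 - 7, then segment 7 again
Fast Retransmit: segments 4 - 7, then segment 4 again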
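The slow start progression in the figure (1, then 2 - 3, then 4 - 7) can be reproduced with a short sketch. It assumes an initial window of one segment, no loss, no delayed ACKs, and a receiver window that never limits the sender; the function name and parameters are illustrative and not taken from the talk.

```python
def slow_start_rounds(total_segments, initial_window=1):
    """Return the (first, last) segment numbers sent in each round trip."""
    rounds = []
    next_segment = 1
    cwnd = initial_window
    while next_segment <= total_segments:
        last = min(next_segment + cwnd - 1, total_segments)
        rounds.append((next_segment, last))
        next_segment = last + 1
        # Each ACKed segment grows the window by one segment,
        # so the window doubles once per round trip.
        cwnd *= 2
    return rounds


print(slow_start_rounds(7))  # [(1, 1), (2, 3), (4, 7)]
```

A seven-segment response therefore needs three round trips under these assumptions, which is why a larger starting window or a shorter RTT both reduce transfer time.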
![Page 13: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/13.jpg)
Basic TCP Mechanisms
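[Figure: the same three timelines, annotated]
Slow Start: segments 1 - 7 in one round; larger window through multiplexing
Retransmission Timeout (RTO): segments 4 - 7, then segment 7 again; shorter RTTs → faster loss detection
Fast Retransmit: segments 4 - 7, then segment 4 again; shorter RTTs → faster loss detection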
![Page 14: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/14.jpg)
Evaluating TCP Performance
Low loss, multiplexed
High loss, shorter delay
Analyzed billions of flows carrying Web traffic between Google and clients
![Page 15: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/15.jpg)
Transfers With Loss Are Too Slow
Loss makes Web transfers 5 times slower
Delays caused by TCP loss detection and recovery
6% of transfers between Google and clients are lossy
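A rough back-of-the-envelope calculation shows why a small fraction of lossy transfers matters for the mean. The sketch below takes the two numbers from this slide and assumes, purely for illustration, that "5 times slower" is a ratio of mean latencies and that all loss-free transfers take one unit of time; the function name is hypothetical.

```python
LOSSY_FRACTION = 0.06  # from the slide: 6% of transfers are lossy
SLOWDOWN = 5.0         # from the slide; treated as a ratio of mean latencies (assumption)


def mean_latency(lossy_fraction, slowdown, base=1.0):
    """Mean latency when a fraction of transfers is slowed down by a constant factor."""
    return (1 - lossy_fraction) * base + lossy_fraction * slowdown * base


mean = mean_latency(LOSSY_FRACTION, SLOWDOWN)
lossy_share = LOSSY_FRACTION * SLOWDOWN / mean

print(f"mean latency: {mean:.2f}x the loss-free latency")
print(f"share of total latency spent in lossy transfers: {lossy_share:.1%}")
```

Under these simplifying assumptions, the 6% of lossy transfers account for roughly a quarter of all time spent waiting, so speeding up only those transfers can move the mean noticeably.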
![Page 16: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/16.jpg)
Many Expensive Retransmission Timeouts
77% of losses are recovered by retransmission timeouts
Retransmission timeouts can be 200 times larger than the RTT
Caused by high RTT variance or lack of samples
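To see how variance or missing samples inflate the timeout, here is a minimal sketch of the standard RTO computation from RFC 6298 (smoothed RTT plus four times the RTT variance, with a 1 second initial value and a 1 second recommended minimum). It ignores clock granularity, and the class name and the sample RTT values are illustrative assumptions; real stacks may use different minimums.

```python
class RtoEstimator:
    """RTO estimator following RFC 6298, without the clock granularity term."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4          # variance multiplier

    def __init__(self, min_rto=1.0, initial_rto=1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto
        self.rto = initial_rto  # used until the first RTT sample arrives

    def on_rtt_sample(self, rtt):
        if self.srtt is None:
            # First measurement: variance starts at half the sample.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Variance is updated with the old SRTT, then SRTT is updated.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.min_rto, self.srtt + self.K * self.rttvar)
        return self.rto


# Lack of samples: a fresh connection on a hypothetical 5 ms path
# still starts with a 1 s timeout.
fresh = RtoEstimator()
print(fresh.rto / 0.005)

# High variance: one slow sample pulls the timeout far above the typical RTT.
# The minimum is disabled here only to make the effect visible.
noisy = RtoEstimator(min_rto=0.0)
for sample in (0.05, 0.05, 0.25):
    print(noisy.on_rtt_sample(sample))
```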
![Page 17: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/17.jpg)
... Caused by Tail Packet Loss
(Single) tail packet drop is very common
Tail packets are twice as likely to be dropped as packets early in a burst
35% of lossy bursts observe only one packet loss
![Page 18: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/18.jpg)
Our Motivation and Goal
Motivation: Loss significantly slows down transfers, due to frequent recovery via slow RTOs, which are caused by tail loss.
Our goal: Approach the ideal of loss detection and recovery without delay, without making the protocol too aggressive.
![Page 19: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/19.jpg)
Setting
[Figure: a client reaches the frontend server over the public network; the frontend server reaches the backend server over a private network]
Public network: controlling server only. Prefer solutions that do not require client changes and are compatible with middleboxes.
Private network: controlling client, server, and network. Can incur more overhead since latency-sensitive traffic is a small portion of the traffic mix.
![Page 20: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/20.jpg)
Setting
[Figure: frontend server and backend server, with the public network and the private network]
Reactive: Trigger fast retransmit by retransmitting the tail packet early
Proactive: Avoid retransmissions through packet duplication
Corrective: Add redundancy to enable recovery without retransmission, or trigger fast retransmit
![Page 22: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/22.jpg)
Reactive
[Figure: segments 1 - 3 are sent; the sender waits until the RTO, then retransmits segment 1]
Receiver does not know about the loss and therefore cannot send signals back
![Page 23: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/23.jpg)
Reactive
[Figure: segments 1 - 3 are sent; the sender waits for two RTTs, sends a probe, and recovers via fast retransmit]
Transmit a new packet or retransmit the previous (tail) packet after two RTTs
Can trigger a selective acknowledgement indicating loss
Speeds up loss detection
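The probe rule on this slide can be sketched as two small decisions: when to fire the probe and what to send. The two-RTT wait and the choice between new data and the tail packet come from the slide; capping the wait at the RTO, and all function and parameter names, are assumptions made for this illustration.

```python
def probe_timeout(srtt, rto):
    """Seconds to wait after the last transmission before sending a probe.

    Waits two smoothed RTTs, but never longer than the regular RTO
    (the cap is an assumption of this sketch).
    """
    return min(2 * srtt, rto)


def choose_probe(unsent_segments, last_sent_segment):
    """Send new data if any is waiting, otherwise retransmit the tail segment."""
    if unsent_segments:
        return unsent_segments[0]
    return last_sent_segment


# Hypothetical connection: 50 ms smoothed RTT, 1 s RTO, nothing left to send.
print(probe_timeout(0.05, 1.0))
print(choose_probe([], last_sent_segment=3))
```

With these example values the probe goes out after about 100 ms instead of after the full 1 s timeout.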
![Page 24: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/24.jpg)
Reactive: Detecting Masked Losses
[Figure: segments 1 - 3 are sent; after two RTTs the probe retransmits segment 3]
Cannot ignore the case where a packet loss is recovered by the Reactive probe
Count ACKs and reduce the congestion window if only one ACK for the tail packet is received
![Page 25: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/25.jpg)
Reactive: Detecting Masked Losses
[Figure: two timelines; in each, segments 1 - 3 are sent and the probe retransmits segment 3 after two RTTs]
One ACK only (ACK 2, ACK 3): Loss → Reduce congestion window
Two ACKs (ACK 2, ACK 3, ACK 3): No loss
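The ACK-counting rule can be written as a small decision function. The rule itself (one ACK for the tail packet means the probe masked a loss, two ACKs mean no loss) is from the slides; halving the congestion window, the minimum window, and the function name are assumptions of this sketch.

```python
def on_probe_acks(acks_for_tail, cwnd, min_cwnd=1):
    """Decide whether a tail-retransmitting probe masked a loss.

    acks_for_tail: number of ACKs covering the tail segment that arrived
                   after the probe (original plus probe gives two).
    Returns (loss_detected, new_cwnd).
    """
    if acks_for_tail < 1:
        raise ValueError("no ACK for the tail segment yet; nothing to decide")
    if acks_for_tail >= 2:
        # Both the original and the probe arrived: nothing was lost.
        return False, cwnd
    # Only one copy arrived: the probe repaired a loss, so the sender
    # must still react to congestion (halving is an assumed reaction).
    return True, max(min_cwnd, cwnd // 2)


print(on_probe_acks(2, cwnd=10))
print(on_probe_acks(1, cwnd=10))
```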
![Page 28: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/28.jpg)
Proactive
[Figure: segments 1 - 3 are sent; the sender waits until the RTO, then retransmits segment 3]
![Page 29: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/29.jpg)
Proactive
Avoid almost all retransmissions through packet duplication
[Figure: segments 1, 2, and 3 are each followed by a duplicate: 1 (DUP), 2 (DUP), 3 (DUP)]
Duplicates are used if the original transmission was lost
Avoids loss detection and recovery
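A toy model of the duplication idea: the sender puts every segment on the wire twice, and the receiver delivers whichever copy arrives first. This is only an illustrative sketch at the level of (sequence number, payload) pairs; the function names are hypothetical and it does not model how duplicates are carried in TCP.

```python
def duplicate(segments):
    """Sender side: emit every (seq, payload) segment twice, back to back."""
    on_wire = []
    for segment in segments:
        on_wire.extend([segment, segment])
    return on_wire


def deliver(received):
    """Receiver side: yield each segment once, using whichever copy arrived."""
    seen = set()
    for seq, payload in received:
        if seq in seen:
            continue  # the other copy already arrived
        seen.add(seq)
        yield seq, payload


sent = duplicate([(1, "a"), (2, "b"), (3, "c")])
# Drop the original transmission of the tail segment (index 4); its duplicate survives.
arrived = sent[:4] + sent[5:]
print(list(deliver(arrived)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The receiver gets all three segments without the sender ever detecting the loss, at the cost of sending every byte twice.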
![Page 30: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/30.jpg)
A/B Experiment Setup
[Figure: frontend server and backend server; the experiment arm uses Reactive and Proactive, the control arm uses Default on both connections]
Experimented in a production environment serving billions of queries (millions of queries are sampled)
![Page 31: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/31.jpg)
Impact of Reactive and Proactive
15-day experiment, 2.6 million queries sampled:
Mean response time reduced by 23%
99th percentile response time reduced by 47%
Impact of Proactive: Retransmission rates on the backend connection dropped from 0.99% to 0.09%
Impact of Reactive: Almost 50% of retransmission timeouts on the frontend connection are converted to fast retransmits
![Page 32: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/32.jpg)
Corrective: The Middle Way
Reactive speeds up loss detection, but still requires recovery
Proactive avoids loss detection and recovery, but has 100% overhead
Corrective is the middle way between the two
![Page 33: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/33.jpg)
Corrective: Forward Error Correction in TCP
[Figure: segments 1 - 3 are sent; the sender waits until the RTO, then retransmits segment 1]
![Page 34: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/34.jpg)
Corrective: Forward Error Correction in TCP
[Figure: segments 1 - 3 are followed by an ENCODED segment]
Encodes previously transmitted segments in a few coded segments
XOR coding can recover a single packet loss at the receiver
Signaling of recovery status to the sender to enforce congestion control or fast retransmit
No loss detection required
Speeds up loss detection and recovery
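The single-loss recovery property of XOR coding can be demonstrated in a few lines. This sketch works on plain byte strings and assumes equal-length segments; it does not reproduce the TCP option format or the signaling described on the slide, and all function names are illustrative.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(segments):
    """Build one coded segment: the XOR of all segments (zero-padded to equal length)."""
    size = max(len(s) for s in segments)
    parity = bytes(size)
    for segment in segments:
        parity = xor_bytes(parity, segment.ljust(size, b"\0"))
    return parity


def recover(received, parity):
    """Rebuild the one missing segment (marked as None) from the others and the parity."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one missing segment")
    rebuilt = parity
    for segment in received:
        if segment is not None:
            rebuilt = xor_bytes(rebuilt, segment.ljust(len(parity), b"\0"))
    return missing[0], rebuilt


segments = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(segments)
# Segment 2 of 3 is lost in transit; the coded segment arrives.
print(recover([segments[0], None, segments[2]], parity))  # (1, b'BBBB')
```

If two or more segments are lost, the single parity segment is not enough, which is where the sender-side signaling and fast retransmit come in.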
![Page 35: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/35.jpg)
Evaluation: Corrective
Network emulator
Synthetic workloads (fixed-size single queries)
Web page downloads (complex multi-resource queries)
![Page 36: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/36.jpg)
Loading nytimes.com with Corrective
Tail latency reduced by more than 20%
But: performance is slightly worse on loss-free connections
![Page 37: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/37.jpg)
Dealing with Middleboxes
Protocol changes need to account for middlebox interference
We designed our modules for middlebox compatibility or graceful fallback to standard TCP
![Page 38: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/38.jpg)
Dealing with Middleboxes
Unknown option in data packet is stripped → Require option in all packets
ACK number is rewritten for unseen sequences → Resend lost segment to update middlebox state
Modified retransmission payload is rejected → Detect tampering through checksum
![Page 39: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/39.jpg)
Summary: Reducing Web Latency
● In a measurement study analyzing billions of flows in Google’s production environment, we found that performance deteriorates drastically when encountering packet loss
● Analysis of loss patterns motivated three designs to improve latency: Reactive, Proactive, and Corrective
● Reactive and Proactive improved Google’s mean response time by 23%
● Reactive and Corrective are IETF Internet Drafts; Reactive is implemented and enabled by default in Linux 3.10
![Page 40: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/40.jpg)
What aspects of TCP are limiting Web access performance, and how can we overcome these limitations?
![Page 41: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/41.jpg)
Overview
A. Reducing Web Latency: The Virtue of Gentle Aggression
B. Understanding TCP Flow Performance at Scale Through Behavioral Signatures
![Page 42: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/42.jpg)
Understanding TCP Flow Behavior
… can be difficult, because:
● TCP’s complexity has increased dramatically over the last couple of decades, e.g. through added features like:
  ○ Window scaling
  ○ PAWS
  ○ Segmentation offload
  ○ Prevention of bursty transmissions
  ○ Early congestion indicators
● Often working with packet captures only → have to reverse-engineer the protocol behavior solely based on packets observed on the wire
![Page 43: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/43.jpg)
Same Rtx Ratios, Different Impact
● Two scenarios can show the same retransmission ratio while their underlying causes and their impact on performance differ widely.
● Bufferbloat: Routers use large queue buffers to absorb traffic bursts, but a single flow can fill up the buffer, leading to excessive queuing delays → slow recovery in case of packet loss
● Reordering induced by packet-based load balancers: Multiple paths with different delays → out-of-order delivery causing spurious retransmissions, which require time to recover and impact throughput
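For the bufferbloat case, the worst-case queuing delay follows directly from the buffer size and the drain rate of the bottleneck link. The sketch below shows that arithmetic; the buffer size and link rate are hypothetical example values, not measurements from the talk.

```python
def queueing_delay_seconds(buffer_bytes, link_bits_per_second):
    """Time a packet waits behind a full buffer that drains at the link rate."""
    return buffer_bytes * 8 / link_bits_per_second


# Hypothetical example: a 256 KiB buffer in front of a 1 Mbit/s uplink.
delay = queueing_delay_seconds(256 * 1024, 1_000_000)
print(f"{delay:.2f} s of added delay when the buffer is full")
```

A retransmission queued behind such a buffer arrives only after the backlog has drained, which is why recovery is slow even though the retransmission ratio alone looks unremarkable.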
![Page 44: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/44.jpg)
Analyzing Performance Anomalies
"Your site is too slow!"
Investigate responsible servers and capture packet information
"What’s going on in these traces?"
Manual analysis of the packet traces
"Bufferbloat!"
![Page 45: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/45.jpg)
Analyzing Performance Anomalies: Automatic, at Scale
Sample servers capture packet information
"What’s going on in these traces?"
Automatic analysis of the packet traces
![Page 46: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/46.jpg)
Automatic Packet Analysis
Packet Collection
● Monitoring engine capturing all packets for a sample set of connections
Tagging
● Use behavioral signatures to tag packets, connections, and queries
Problem Identification
● Correlate tags with performance metrics
● Manual analysis through visualization and database queries
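The "correlate tags with performance metrics" step could look like the following sketch, which groups connections by tag and compares the mean response time per tag. The tag names, the record layout, and the sample values are invented for illustration; the actual pipeline is not described at this level of detail in the talk.

```python
from collections import defaultdict
from statistics import mean


def mean_metric_by_tag(connections):
    """Average a metric per tag.

    connections: iterable of (tags, response_time_ms) pairs,
                 where tags is a set of signature names.
    """
    by_tag = defaultdict(list)
    for tags, response_ms in connections:
        for tag in tags:
            by_tag[tag].append(response_ms)
    return {tag: mean(values) for tag, values in by_tag.items()}


# Hypothetical tagged connections.
sample = [
    ({"retransmission"}, 900),
    ({"retransmission", "bufferbloat"}, 1500),
    (set(), 200),
]
print(mean_metric_by_tag(sample))
```

Tags whose connections show much higher response times than the overall population are the candidates for the manual follow-up analysis.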
![Page 47: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/47.jpg)
Analysis Building Blocks: Behavioral Signatures
● Per packet
  ○ Basic signatures based on header information extracted from few packets
  ○ Examples: acknowledgements, packet loss
● Per flow
  ○ Aggregates data from many packets, revealing behavior observed over longer time periods
  ○ Examples: bufferbloat, traffic policing
● Per application entity (e.g. HTTP query)
  ○ Incorporates non-TCP information, e.g. payload content, application-layer feedback
  ○ Examples: query response time, video stalling
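As an example of a per-packet signature, the sketch below tags data packets as retransmissions using only sequence numbers and lengths from the TCP header. It assumes the capture is taken at the sender, covers one direction of one connection, and ignores sequence number wraparound; the tag names and function name are illustrative.

```python
def tag_retransmissions(packets):
    """Tag each data packet as 'new_data' or 'retransmission'.

    packets: list of (seq, length) pairs in capture order.
    A packet whose bytes all lie below the highest byte already sent
    carries no new data, so it is tagged as a retransmission.
    """
    highest_end = None
    tags = []
    for seq, length in packets:
        end = seq + length
        if highest_end is not None and end <= highest_end:
            tags.append("retransmission")
        else:
            tags.append("new_data")
            highest_end = end
    return tags


trace = [(1000, 1000), (2000, 1000), (3000, 1000), (2000, 1000)]
print(tag_retransmissions(trace))
# ['new_data', 'new_data', 'new_data', 'retransmission']
```

Per-flow signatures would then aggregate such per-packet tags, for example by counting retransmissions per connection.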
![Page 48: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/48.jpg)
Next Steps
● Collect packets at scale on web service front-ends for multiple weeks to record transient and persistent performance anomalies
● Identify root causes for traffic patterns associated with tail performance (e.g. tail query response time)
● Derive a set of protocol and/or network changes to address the performance problems and possibly quantify their impact
![Page 49: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/49.jpg)
Conclusion
● Good Web performance requires good TCP performance
● Need to understand the causes of poor tail performance to enable the design of new solutions
● Our approach:
  ○ Automate the analysis of packet traces at scale to find problems
  ○ Design solutions tailored to the architectures of modern Web services
  ○ Deployed loss recovery mechanisms have a large impact on Google performance
![Page 50: Understanding and Improving TCP for Web Performance](https://reader033.vdocument.in/reader033/viewer/2022041912/6255045f7972e83c6d2e62c9/html5/thumbnails/50.jpg)
Understanding and Improving TCP for Web Performance
Tobias Flach
http://nsl.cs.usc.edu/~tobiasflach
Nandita Dukkipati, Andreas Terzis, Barath Raghavan, Neal Cardwell, Yuchung Cheng, Ankur Jain, Shuai Hao,
Ethan Katz-Bassett, Ramesh Govindan