TCP End-To-End Congestion Control
Wanida Putthividhya
Dept. of Computer Science
Iowa State University
Jan 27th, 2002
(May 25th, 2001)
Contents :
- TCP Congestion Control Concepts
- TCP Flavors
TCP Congestion Control
- Obey a ‘packet conservation’ principle :
“ In equilibrium, a new packet is not put into the network until an old packet leaves ”
- Avoid ‘congestion collapses’ :
“ The severe drop in network throughput caused by congestion ”
- A collection of collaborating mechanisms :
Slow-Start
Accurate Retransmission Timeout Estimation
Congestion Avoidance
Fast Retransmit
Fast Recovery
Selective Acknowledgement
TCP Basics
- Congestion Window (cwnd) :
“ A TCP state variable that limits the amount of data a TCP can send ”
“ The window at the sender site controlled by congestion control and avoidance algorithms ”
- Advertised Window (Receiver Window) :
“ The available buffer size at the receiver site ”
- Sender’s maximum window (maxwin) :
“ min(cwnd, advertised window) ”
- Sender’s usable window :
“ maxwin - unacknowledged segments ”
- TCP maintains a Retransmission Timer for each packet, say x, which has been sent and not yet acknowledged.
If the ACK for packet x does not reach the sender before its timer expires, packet x is assumed to be lost and the sender retransmits it.
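The window definitions above can be sketched in a few lines. This is only an illustration: the function name and the choice to count everything in whole segments are assumptions, not something the slides specify.

```python
def usable_window(cwnd, advertised_window, unacked):
    """Number of segments the sender may still inject right now.

    All three arguments are counted in segments (an assumption made
    here for simplicity; real stacks usually count bytes).
    """
    maxwin = min(cwnd, advertised_window)   # sender's maximum window
    return max(0, maxwin - unacked)         # never negative


# The receiver window (6) is the limit here, and 4 segments are in flight.
print(usable_window(cwnd=8, advertised_window=6, unacked=4))
```

When cwnd is the smaller of the two windows, congestion control rather than the receiver is what limits the sender.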
Self-Clocking
- The ‘packet conservation’ property can be expressed as:
“ The sender can inject a new data packet into the network only when it receives an ‘ACK’ from the receiver ”
So, the protocol is self-clocking !
“ The sender uses ACKs as a ‘clock’ to strobe new packets into the network ”
- However, how is the clock started ?
The problem is :
“ An ACK is generated when the receiver receives a data packet correctly ” and
“ To make the system robust, a data packet is injected into the network only when an ACK triggers the sender to do so ”
- Answer:
“ A new algorithm called ‘Slow-Start’ was introduced to gradually increase the amount of data in transit ”
[Figure: self-clocking between sender and receiver across a bottleneck link, with packet spacings Pb, Pr and ACK spacings Ar, Ab, As]
Pb : the minimum packet spacing (the inter-packet interval) on the bottleneck link
Pr : the receiver’s network packet spacing [Pb = Pr]
Ar : the spacing between acks on the receiver’s network [if the processing time is the same for all packets, Pb = Pr = Ar]
Ab : the ack spacing on the bottleneck link
As : the ack spacing on the sender’s network [As = Pb]
Getting to Equilibrium: Slow-Start Algorithm
- When starting, initialize ‘cwnd’ to 1
When restarting after a loss, set ‘cwnd’ to 1
cwnd = 1
- Every time the sender sends data packets, it may send up to:
min ( cwnd, advertised window ) – # unacked packets
- Upon receiving an ACK for new data, increase the congestion window by one
cwnd = cwnd + 1
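[Figure: slow-start timeline. Rows are round-trip times (0R, 1R, 2R, 3R), columns are packet times. Packet 1 is sent in the first RTT, packets 2-3 in the second, packets 4-7 in the third, and packets 8-15 in the fourth.]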
- However, slow-start does not open the sender’s congestion window slowly:
“ Let W be the window size (packets) and RTT the round-trip time. It takes time RTT * log2(W) to open the congestion window from 1 to W ”
- Therefore, the window opens fast enough to have a negligible effect on performance
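The RTT * log2(W) claim can be checked with a small sketch. It assumes one ACK per segment, so the window doubles once per round trip; the function name is made up for this example.

```python
import math


def rtts_to_open(target_window):
    """Count the round trips slow-start needs to grow cwnd from 1 to target_window.

    Assumes every segment is ACKed individually, so each RTT adds one
    segment per outstanding segment, which doubles cwnd.
    """
    cwnd, rtts = 1, 0
    while cwnd < target_window:
        cwnd *= 2
        rtts += 1
    return rtts


for w in (16, 64, 100):
    # The loop count matches ceil(log2(W)).
    print(w, rtts_to_open(w), math.ceil(math.log2(w)))
```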
Conservation at equilibrium: round-trip timing
- Once data is flowing reliably, the problem that the sender injects a new packet before an old packet has exited must represent a failure of sender’s retransmission timer
- TCP estimates the retransmission timer for each packet in terms of the RTT ( wait at least one RTT before retransmitting ! )
- RTT estimate too short => unnecessary retransmissions
RTT estimate too long => low throughput
- What model should be used to estimate the RTT ?
“ The estimated RTT must adapt to the condition of the network, but neither too quickly nor too slowly ”
- Initial RTO estimator:
New RTT = α * old RTT + (1 - α) * M
where M : a round trip time measurement from the most recently acked data packet (Round Trip Sample)
α : a filter gain constant with suggested value of 0.9
RTO = β * New RTT
where β : accounts for RTT variation with suggested value of 2
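A minimal sketch of this estimator follows. The parameter names alpha (filter gain, 0.9) and beta (variation factor, 2) are labels chosen here for the two constants, and the millisecond values are invented for illustration.

```python
def update_rto(old_rtt, sample, alpha=0.9, beta=2.0):
    """One step of the initial RTO estimator.

    old_rtt: previous smoothed RTT
    sample:  M, the newest round trip sample
    Returns (new smoothed RTT, RTO).
    """
    new_rtt = alpha * old_rtt + (1 - alpha) * sample
    return new_rtt, beta * new_rtt


# A single 200 ms sample moves a 100 ms estimate only a tenth of the way.
rtt, rto = update_rto(100.0, 200.0)
print(round(rtt, 3), round(rto, 3))
```

With a gain of 0.9 the estimate reacts slowly, which is the "not too fast" side of the trade-off named above.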
- How to measure accurately Round Trip Samples?
[Figure: timelines between hosts A and B showing an original transmission, a retransmission and an ACK. SampleRTT can be measured either from the original transmission or from the retransmission.]
Acknowledgement Ambiguity phenomenon
Complication arises because TCP’s acknowledgement refers to data received, not to the instance of a specific datagram that carried the data
- Karn’s RTO estimator
Accounts for the Acknowledgement Ambiguity phenomenon
Combination of the initial RTO estimator and a timer back-off strategy.
As usual, to compute an initial timeout value, use the formula :
New RTT = α * old RTT + (1 - α) * M
RTO = β * New RTT
If the timer expires and causes a retransmission, TCP does not take an RTT sample for that segment, but keeps backing off the timeout on each retransmission, until it can successfully transfer a segment, by the formula :
New RTO = γ * old RTO
The suggested value for γ is 2
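The two rules of Karn’s estimator (ignore samples from retransmitted segments, back off the timeout on every retransmission) can be sketched as below. The class and method names are hypothetical, and alpha, beta, gamma are labels for the three constants with the suggested values 0.9, 2 and 2.

```python
class KarnTimer:
    """Sketch of Karn's rule on top of the initial RTO estimator."""

    def __init__(self, rtt, alpha=0.9, beta=2.0, gamma=2.0):
        self.rtt = rtt
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.rto = beta * rtt

    def on_timeout(self):
        # Back off: the timeout grows on each retransmission.
        self.rto *= self.gamma

    def on_ack(self, sample, was_retransmitted):
        if was_retransmitted:
            # Ambiguous ACK: take no RTT sample, keep the backed-off RTO.
            return
        # Unambiguous ACK: update the estimate and recompute the RTO.
        self.rtt = self.alpha * self.rtt + (1 - self.alpha) * sample
        self.rto = self.beta * self.rtt


timer = KarnTimer(rtt=100.0)
timer.on_timeout()
timer.on_timeout()
print(timer.rto)                       # backed off twice
timer.on_ack(50.0, was_retransmitted=True)
print(timer.rto)                       # unchanged by the ambiguous ACK
```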
- Jacobson’s RTO estimator
Key Observations:
At high load, there is a wide range of variation in delay
Queuing theory suggests that by using the formula
RTO = β * New RTT
and limiting β to the suggested value of 2, the RTO estimation can adapt to loads of at most 30 %
Solutions:
Estimate both the average round trip time and the variance, and use the estimated variance in place of the constant β
DIFF = SAMPLE - old RTT
Smoothed RTT = old RTT + δ * DIFF
DEV = old DEV + ρ * ( |DIFF| - old DEV )
Timeout = Smoothed RTT + η * DEV
where DEV : the estimated mean deviation
δ : a fraction between 0 and 1 that controls how quickly the new sample affects the weighted average (Smoothed RTT)
ρ : a fraction between 0 and 1 that controls how quickly the new sample affects the mean deviation
η : a factor that controls how much the deviation affects the RTO (suggested value of η is 4)
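The four update equations translate directly into code. In this sketch the names delta, rho and eta are labels for the three constants, and the defaults 0.125 and 0.25 for the two gains are illustrative assumptions; the slides only give the suggested value 4 for the deviation factor.

```python
def jacobson_update(srtt, dev, sample, delta=0.125, rho=0.25, eta=4.0):
    """One step of the mean-plus-deviation RTO estimator.

    srtt:   old smoothed RTT
    dev:    old mean deviation
    sample: newest round trip sample
    Returns (new smoothed RTT, new deviation, timeout).
    """
    diff = sample - srtt
    srtt = srtt + delta * diff
    dev = dev + rho * (abs(diff) - dev)
    return srtt, dev, srtt + eta * dev


# A sample far above the estimate widens the deviation term quickly,
# so the timeout grows faster than the smoothed RTT itself.
print(jacobson_update(100.0, 10.0, 140.0))
```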
Adapting to the path: Congestion Avoidance
- Use coarse grained timeout to indicate congestion in the network
- If loss occurs (timeout) when cwnd = W
The network can absorb up to W segments
Set cwnd to 0.5 * W (multiplicative decrease)
- Upon receiving an ACK, increase cwnd by 1/cwnd (additive increase)
Review:
Congestion control algorithms must obey the “ Packet Conservation Principle ”.
* to get to the equilibrium state with high utilization of the network BW, without hitting the network with a big burst,
USE the ‘SLOW-START’ algorithm
* to maintain the equilibrium state (not inject a new packet into the network until an old packet has been taken out),
USE an unambiguous situation to measure RTT (Karn’s algorithm) & USE an accurate model to calculate RTO (Jacobson’s model)
* to adapt to the network condition,
USE a mechanism to detect the occurrence of loss (coarse-grained timeout)
USE congestion avoidance to avoid exceeding the available BW
The combined slow-start with congestion avoidance algorithm
- Use 2 state variables :
cwnd : the congestion window at the sender site
ssthresh : the threshold used to switch between the two algorithms
- The sender always sends min(cwnd, advertised window) - # unacked packets
- If a packet is dropped, we lose self-clocking
- We need to implement both algorithms together to avoid losing packets as much as we can.
- The algorithm starts with slow-start; on a timeout,
ssthresh = cwnd/2
cwnd = 1
- Now, upon receiving an ACK
if (cwnd < ssthresh)
    cwnd += 1 ;        /* implement slow-start */
else
    cwnd += 1/cwnd ;   /* implement congestion avoidance */
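The combined rule can be exercised with a small runnable sketch. The class name and the initial ssthresh are assumptions made for this example, and cwnd is kept as a float so that the 1/cwnd increments accumulate.

```python
class SenderWindow:
    """cwnd/ssthresh bookkeeping for slow-start plus congestion avoidance."""

    def __init__(self, ssthresh=64.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow-start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 per RTT

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0   # remember half the window that failed
        self.cwnd = 1.0                   # restart with slow-start


w = SenderWindow(ssthresh=4.0)
for _ in range(4):
    w.on_ack()
print(w.cwnd)        # three slow-start steps, then one avoidance step
w.on_timeout()
print(w.cwnd, w.ssthresh)
```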
Slow-Start and Congestion Avoidance
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: PKT #0 is sent with cwnd (1) and answered by “ACK #0, wait for #1”; then #1-#2 (2), #3-#6 (4), #7-#14 (8).
- The receiver returns ACK #12 (14) followed by duplicate ACKs #12 while the sender continues with #15, #16, . . . #26.
- Timeout: ssthresh = 15/2 = 7 ( cwnd = 1 ), “start slow-start again”; Retx #13; the receiver answers “ACK #26, wait for #27”.
- #27-#28 are sent at (2), #29-#32 at (4); at (7) the sender “enters congestion avoidance” and sends #33-#39; then #40-#46 at (8) and #47 . . . at (8.125).
- Timeout: ssthresh = 8/2 = 4 ( cwnd = 1 ), “start slow-start again”; Retx #41; the receiver answers “ACK #47, wait for #48”; #48-#49 are sent at (2), and at (4) the sender “enters congestion avoidance” again.
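The congestion window for slow-start/congestion avoidance algorithm
[Figure: congestion window versus time. The window starts at 1 and grows to W1, where a timeout occurs; the level 0.5 W1 is marked for the next cycle. The window then grows to W2, where a second timeout occurs, with the level 0.5 W2 marked.]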
Impacts of timeout
- Timeout can cause sender to:
Slow-start
Retransmit a portion of window (possibly large)
- Employ duplicate ACKs to signal the sender
Fast Retransmit : use a number of duplicate ACKs to signal the sender about the packet loss (shortens the idle time spent waiting for the timeout)
Fast Recovery : advance the congestion window more aggressively to reach high utilization faster
Fast Retransmit
- Duplicate ACKs can be caused by:
Segment Dropped
Segment Re-ordering
- TCP receiver should send an immediate duplicate ACK when an out-of-order segment arrives
- TCP receiver should send an immediate ACK when an incoming segment fills in all or part of a gap in the sequence space.
- Assuming that segment re-ordering is infrequent,
the TCP sender uses the receipt of 3 duplicate ACKs as an indication that a segment has been lost
“3 duplicate ACKs” means 4 identical ACKs without the arrival of any other intervening ACK packets
Set ssthresh = 0.5 * current cwnd, cwnd = 1, and retransmit the dropped segment before timeout
Wait for a non-duplicate ACK and continue with slow-start
- Fast Retransmit removes the idle time the sender spends waiting for the coarse-grained timeout, since the sender can retransmit the dropped segment upon receiving the third duplicate ACK
- However, throughput still suffers because the sender has to enter slow-start every time a retransmission occurs
- Moreover, Fast Retransmit causes unnecessary retransmissions when multiple drops occur in a single window
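The “3 duplicate ACKs means 4 identical ACKs” rule can be sketched as a counter over the incoming ACK stream. The function name and the sample ACK numbers are invented for this example.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return the positions in the ACK stream where Fast Retransmit would fire.

    acks: ACK numbers in arrival order.
    A retransmission fires when the threshold-th duplicate of the same
    ACK arrives with no other ACK in between.
    """
    fired = []
    last_ack, dups = None, 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dups += 1
            if dups == threshold:
                fired.append(i)
        else:
            last_ack, dups = ack, 0
    return fired


# ACK 13 arrives once, then four duplicates follow; the third duplicate
# (position 6) triggers the retransmission, the fourth does not fire again.
print(fast_retransmit_points([10, 11, 12, 13, 13, 13, 13, 13, 14]))
```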
Fast Recovery
- Key Observation:
A duplicate ACK is caused by the receipt of a segment at the receiver site
In other words, each duplicate ACK corresponds to one segment having left the network
So, it is possible to use the duplicate ACKs to clock the sending of segments
- Solution:
If n duplicate ACKs arrive at the sender, advance cwnd by n
Fast Retransmit & Fast Recovery
- Upon receiving the third duplicate ACK,
Retransmit the segment inferred to be lost (Fast Retransmit)
Set ssthresh = 0.5 * current cwnd
Set cwnd = ssthresh + 3 (Fast Recovery)
- After that, upon receiving each further duplicate ACK, inflate the congestion window by one
- If the sender’s usable window allows, send a new data segment
- Upon receiving a non-duplicate ACK, exit Fast Recovery
Set cwnd = ssthresh (the value set in the first step) and continue with congestion avoidance
- Fast Recovery improves the throughput of the system considerably, since duplicate ACKs are used to clock the sending of new segments
- However, it suffers badly when multiple drops occur in a single window. Throughput drops dramatically, especially when there are 3 non-consecutive drops in a window
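The Fast Retransmit and Fast Recovery steps can be sketched as a small state machine. The class name, the retransmission counter and the starting window are assumptions for illustration, and the sketch tracks only the window arithmetic, not the segments themselves.

```python
class RenoSender:
    """Window bookkeeping for Fast Retransmit plus Fast Recovery."""

    def __init__(self, cwnd, ssthresh):
        self.cwnd = float(cwnd)
        self.ssthresh = float(ssthresh)
        self.dup_acks = 0
        self.in_fast_recovery = False
        self.retransmits = 0

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.in_fast_recovery:
            self.cwnd += 1.0                  # inflate: one segment left the network
        elif self.dup_acks == 3:
            self.retransmits += 1             # Fast Retransmit
            self.ssthresh = self.cwnd / 2.0
            self.cwnd = self.ssthresh + 3.0   # enter Fast Recovery
            self.in_fast_recovery = True

    def on_new_ack(self):
        self.dup_acks = 0
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh         # deflate and exit Fast Recovery
            self.in_fast_recovery = False
        elif self.cwnd < self.ssthresh:
            self.cwnd += 1.0                  # slow-start
        else:
            self.cwnd += 1.0 / self.cwnd      # congestion avoidance


s = RenoSender(cwnd=16, ssthresh=64)
for _ in range(5):
    s.on_dup_ack()
print(s.cwnd, s.ssthresh)   # inflated window during recovery
s.on_new_ack()
print(s.cwnd)               # deflated back to ssthresh
```

Because a non-duplicate ACK always ends recovery in this sketch, a second loss in the same window forces a second halving, which is the weakness described in the last bullet.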
Modified Fast Recovery (Conservative version)
- Key Observation:
Fast Recovery suffers from multiple drops since it has to enter Fast Recovery several times
- Solution:
Change the sender’s behavior during Fast Recovery when a partial ACK is received
A partial ACK is one that acknowledges some but not all of the segments that were outstanding at the start of the Fast Recovery period
In the original Fast Recovery, partial ACKs cause the TCP sender to exit Fast Recovery by deflating the congestion window back to the size of ssthresh
In the modified Fast Recovery, partial ACKs do not take the TCP sender out of Fast Recovery
Instead, partial ACKs received during Fast Recovery trigger the sender to retransmit the segment immediately following the acknowledged segment
The TCP sender remains in Fast Recovery until all of the data outstanding when Fast Recovery was initiated has been acknowledged
Selective Acknowledgement (SACK)
- TCP receiver provides more information about hole(s) in the sequence buffer to the sender
- The SACK option field contains a number of SACK blocks, where each SACK block reports a non-contiguous set of data that has been received and queued.
The 1st block is required to report the most recently received segment
The additional SACK blocks repeat the most recently reported SACK blocks
- The minimum number of SACK blocks in the SACK option field is two. It can have more than two blocks depending on the other option fields implemented in TCP.
- The simulation referenced by this presentation assumed three blocks in the SACK option field
- The SACK TCP sender enters Fast Recovery upon receiving the 3rd duplicate ACK of a certain segment. As in regular Fast Recovery, the sender cuts cwnd in half and retransmits the dropped segment
- During Fast Recovery, SACK maintains a variable, named ‘pipe’, representing the estimated number of segments outstanding in the path
- The sender also maintains a data structure, called ‘scoreboard’ , which remembers acknowledgements from previous SACK options
- The sender only sends new or retransmitted data when “pipe < cwnd”
- ‘pipe’ is incremented by one when the sender either sends a new segment or retransmits an old packet
- ‘pipe’ is decremented by one when the sender receives a dup ACK packet with a SACK option reporting that new data has been received at the receiver
- Upon receiving a partial ACK, ‘pipe’ is decremented by two
- The sender exits Fast Recovery when it receives a recovery acknowledgement acknowledging all data that was outstanding when it enters Fast Recovery
- When the sender is allowed to send a segment,
It retransmits the next segment inferred to be missing
If there are no such segments and the advertised window is sufficiently large, the sender sends a new packet
- When the retransmitted packet is itself dropped, the TCP sender detects drop with RTO, retransmits the dropped segment and then slow-starts.
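The ‘pipe’ accounting rules can be collected in a short sketch. The class and method names are hypothetical. The initial value pipe = cwnd - ndup follows the annotation used in the SACK diagrams of this presentation, and the integer halving of cwnd matches their “15/2 = 7”.

```python
class SackPipe:
    """'pipe' bookkeeping during SACK Fast Recovery (window arithmetic only)."""

    def __init__(self, cwnd, ndup=3):
        self.pipe = cwnd - ndup     # estimate of segments still in the path
        self.cwnd = cwnd // 2       # window is cut in half on entry

    def can_send(self):
        return self.pipe < self.cwnd

    def on_send(self):
        # A new segment or a retransmission enters the path.
        self.pipe += 1

    def on_dup_ack_with_new_sack(self):
        # The receiver reported newly received data: one segment left the path.
        self.pipe -= 1

    def on_partial_ack(self):
        # A partial ACK accounts for two segments leaving the path.
        self.pipe -= 2


p = SackPipe(cwnd=15)
print(p.pipe, p.cwnd, p.can_send())
for _ in range(6):
    p.on_dup_ack_with_new_sack()
print(p.pipe, p.can_send())
```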
TCP Flavors
- Tahoe, Reno, New-Reno, Vegas
- TCP Tahoe (distributed with 4.3 BSD Unix) includes:
Slow-start (exponential increase congestion window)
Congestion Avoidance (additive increase)
Fast Retransmit (use 3 dup ACKs)
- TCP Reno (1990) includes :
All mechanisms in Tahoe
Fast Recovery ( governing the transmission after retransmitting the lost segment )
Delayed Acknowledgement ( to avoid silly window syndrome )
- TCP New Reno :
Makes a small change in responding to partial ACKs during Fast Recovery
Tahoe: 1 drop
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14, ACK #7 - #13 (15).
- The sender continues with #15-#18 . . . #26-#28.
- 3 dup ACKs #13: “enter fast retransmit”, ssthresh = 15/2 = 7, cwnd = 1, “continue with slow-start”; the lost segment is retransmitted (labelled Retx #13 on the slide); duplicates continue up to the 14th dup ACK #13.
- ACK #28 arrives; #29-#30 are sent at (2), ACK #29 - #30 (4), #31-#34, ACK #31 - #34 (7).
- “enter congestion avoidance”: #35-#41, ACK #35 - #41 (8) . . .
Reno : 1 drop
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14, ACK #7 - #13 (15).
- The sender continues with #15-#18 . . . #26-#28.
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7, cwnd = 7.
- The 4th through 14th dup ACKs #13 inflate the window one step at a time, from (11) to (21); new segments #29-#30 and #31-#34 are sent as the window allows.
- ACK #28: “exit fast recovery”, ssthresh = 7, cwnd = 7, continue with congestion avoidance !
- #35 is sent; ACK #29 - #35 (8); #36-#43; ACK #36 - #43 (9) . . .
Tahoe: 2 drops
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14.
- 3 dup ACKs #6: “enter fast retransmit”, ssthresh = 8/2 = 4, cwnd = 1, continue with slow-start; Retx #7; duplicates continue up to the 6th dup ACK (labelled #13 on the slide).
- ACK #8 arrives; #9 (retx) and #10 are sent at (2).
- ACK #14 arrives; #15-#16 are sent; 1st dup ACK #14 (3); #17 is sent.
- ACK #15 - #17 (4.67): “enter congestion avoidance”.
Reno : 2 drops (causing “retransmission timeout”)
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14.
- 3 dup ACKs #6: “enter fast recovery”, ssthresh = 8/2 = 4, cwnd = 4; Retx #7.
- The 4th, 5th and 6th dup ACKs #6 inflate the window to (8), (9) and (10); #15 and #16 are sent.
- ACK #8: “exit fast recovery”, ssthresh = 4, cwnd = 4; the sender cannot send more data since the number of outstanding segments is 8.
- 1st and 2nd dup ACKs #8 arrive, which is not enough to trigger Fast Retransmit.
- . . . Timeout: Retx #9, “enter slow-start”, cwnd = 1; ACK #16 (2); #17-#18; ACK #17 - #18 (4).
Reno : 2 drops (causing “two successive Fast Recovery”)
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14, ACK #7 - #13 (15).
- The sender continues with #15-#18 . . . #26-#28.
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7, cwnd = 7; Retx #14.
- The 4th through 13th dup ACKs #13 inflate the window from (11) to (20); #29-#32 and #33 are sent.
- ACK #27: “exit fast recovery”, ssthresh = 7, cwnd = 7; #34 is sent.
- 3 dup ACKs #27: “enter fast recovery” again, ssthresh = 7/2 = 3, cwnd = 3; Retx #28; the 4th through 6th dup ACKs #27 inflate the window to (7), (8), (9); #35 and #36 are sent.
- ACK #34: “exit fast recovery”, ssthresh = 3, cwnd = 3, continue with congestion avoidance.
- ACK #35 through ACK #41 then clock out #37 through #45, with the window growing to (4) and then (5).
New Reno : 2 drops
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14, ACK #7 - #13 (15).
- The sender continues with #15-#18 . . . #26-#28.
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7, cwnd = 7; Retx #14.
- The 4th through 13th dup ACKs #13 inflate the window from (11) to (20); #29-#32 and #33 are sent.
- ACK #27: “receive a partial ACK; retransmit segment #28 immediately”; Retx #28 at (7).
- 5 dup ACKs #27 inflate the window from (8) to (12); #34, #35-#36 and #37-#39 are sent.
- ACK #33: “exit fast recovery”, ssthresh = 7, cwnd = 7, continue with congestion avoidance.
SACK TCP : 2 drops
[Diagram: sender/receiver packet exchange; single numbers in parentheses are cwnd, pairs are (cwnd, pipe)]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14, ACK #7 - #13 (15).
- The sender continues with #15-#18 . . . #26-#28.
- 3 dup ACKs #13: “enter Fast Recovery”, pipe = cwnd - ndup = 15 - 3 = 12, ssthresh = 15/2 = 7, cwnd = 7.
- The 4th through 13th dup ACKs #13 reduce pipe step by step: (7, 11), (7, 10), . . . (7, 2); Retx #14; the sender “can send five more segments”.
- #29-#33 are sent: (7, 3), (7, 4), (7, 5), (7, 6), (7, 7).
- ACK #27 (partial ACK) arrives; #34-#35 are sent: (7, 5), (7, 6), (7, 7).
- 5 dup ACKs #27: (7, 6), (7, 5), (7, 4), (7, 3), (7, 2); Retx #28 and #36-#39 are sent: (7, 7).
- 2 dup ACKs #27: (7, 6), (7, 5); #40-#41 are sent: (7, 7).
- ACK #35: “exit fast recovery”, ssthresh = 7, cwnd = 7, continue with congestion avoidance.
- ACK #36 through ACK #46 then clock out #42 through #54, with the window growing to (8).
Example: SACK 2 drops (#14 and #28)
ACK at sender | Gap | Receiver side: segment, ranges held
Receive ACK#7 | No Gap | 7, 0-6
ACK#8 | No Gap | 8, 0-7
ACK#9 | No Gap | 9, 0-8
ACK#10 | No Gap | 10, 0-9
ACK#11 | No Gap | 11, 0-10
ACK#12 | No Gap | 12, 0-11
ACK#13 | No Gap | 13, 0-12
1st dup ACK#13 | a hole at #14 | 15, 0-13
2nd dup ACK#13 | a hole at #14 | 16, 0-13 15
3rd dup ACK#13 | a hole at #14 | 17, 0-13 15-16
*** Enter Fast Recovery ! ssthresh = cwnd = 15/2 = 7, outstanding segments = #14 - #28, Retransmit #14
. . .
13th dup ACK#13 | a hole at #14 | 27, 0-13 15-26
ACK#27 | No Gap | 27, 0-27
*** The first partial ACK is caused by retransmitted segment #14; ‘pipe’ is decremented by two
1st dup ACK#27 | a hole at #28 | 29, 0-27
2nd dup ACK#27 | a hole at #28 | 30, 0-27 29
3rd dup ACK#27 | a hole at #28 | 31, 0-27 29-30
4th dup ACK#27 | a hole at #28 | 32, 0-27 29-31
. . .
7th dup ACK#27 | a hole at #28 | 35, 0-27 29-34
Retransmit #28
ACK#35 | No Gap | 35, 0-34
*** The recovery ACK is caused by retransmitted segment #28; it brings the TCP sender out of Fast Recovery
*** Exit Fast Recovery ! ssthresh = cwnd = 15/2 = 7, continue with congestion avoidance
Tahoe: 3 drops
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14, ACK #7 - #13 (15).
- The sender continues with #15-#18 . . . #26-#28.
- “enter fast retransmit”: ssthresh = 15/2 = 7, cwnd = 1, continue with slow-start; Retx #14; duplicates continue up to the 12th dup ACK #13.
- ACK #25 arrives; #26 (retx) and #27 are sent at (2).
- ACK #27 (3) and the 1st dup ACK #27 arrive; #28 (retx), #29 and #30 are sent.
- ACK #28, ACK #29 and ACK #30 raise the window to (4), (5), (6); #31-#36 are sent; ACK #31 - #36 (7).
- “enter congestion avoidance”: #37-#42 are sent.
Reno : 3 drops
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14, ACK #7 - #13 (15).
- The sender continues with #15-#18 . . . #26-#28.
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7, cwnd = 7; Retx #14.
- The 4th through 12th dup ACKs #13 inflate the window from (11) to (19); #29-#32 are sent.
- ACK #25: “exit fast recovery”, ssthresh = 7, cwnd = 7, continue with congestion avoidance.
- 3 dup ACKs #25: “enter fast recovery” again, ssthresh = 7/2 = 3, cwnd = 3; Retx #26; the 4th dup ACK #25 inflates the window to (7).
- ACK #27: “exit fast recovery”, ssthresh = 3, cwnd = 3, continue with congestion avoidance.
- . . . Timeout: “enter slow-start”, cwnd = 1; Retx #28.
- ACK #32 (2); #33-#34 are sent; ACK #33 - #34 (3), continue with congestion avoidance.
New Reno : 3 drops
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14, ACK #7 - #13 (15).
- The sender continues with #15-#18 . . . #26-#28.
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7, cwnd = 7; Retx #14.
- The 4th through 12th dup ACKs #13 inflate the window from (11) to (19); #29-#32 are sent.
- ACK #25: “receive a partial Acknowledgement”, retransmit #26 immediately; Retx #26 at (7).
- 4 dup ACKs #25 inflate the window from (8) to (11); #33 and #34-#36 are sent.
- ACK #27: “receive a partial Acknowledgement”, retransmit #28 immediately; Retx #28 at (7).
- 4 dup ACKs #27 inflate the window from (8) to (11); #37 and #38 are sent.
- ACK #36: “exit fast recovery”, ssthresh = 7, cwnd = 7; #39-#43 are sent.
SACK TCP : 3 drops
[Diagram: sender/receiver packet exchange; single numbers in parentheses are cwnd, pairs are (cwnd, pipe)]
- Slow-start: #0 (1), #1-#2 (2), ACK #1 - #2 (4), #3-#6, ACK #3 - #6 (8), #7-#14, ACK #7 - #13 (15).
- The sender continues with #15-#18 . . . #26-#28.
- 3 dup ACKs #13: “enter Fast Recovery”, pipe = cwnd - ndup = 15 - 3 = 12, ssthresh = 15/2 = 7, cwnd = 7.
- The 4th through 12th dup ACKs #13 reduce pipe step by step: (7, 11), (7, 10), . . . (7, 3); Retx #14.
- The sender realizes that #26 has been lost, and at this point it can send 4 segments: Retx #26 and #29-#31, giving (7, 4), (7, 5), (7, 6), (7, 7).
- ACK #25, the first partial ACK: (7, 5); #32-#33 are sent: (7, 7).
- 3 dup ACKs #25: (7, 6), (7, 5), (7, 4). These 3 dup ACKs contain information indicating holes at segments #26 and #28; Retx #28 and #34-#35 are sent: (7, 7).
- ACK #27, the second partial ACK: (7, 5); #36-#37 are sent: (7, 7).
- 2 dup ACKs #27: (7, 5); #38-#39 are sent.
- ACK #33: “exit Fast Recovery”, ssthresh = 7, cwnd = 7, continue with congestion avoidance.
- TCP Vegas (1995) implements 3 new techniques to increase throughput and decrease losses :
New retransmission mechanism
Results in a more timely decision to retransmit a dropped segment
Congestion avoidance mechanism
Gives TCP the ability to anticipate congestion, and adjust its transmission rate accordingly
Modified Slow-Start mechanism
Avoids packet losses while trying to find the available bandwidth during the initial use of slow-start
TCP Vegas New Retransmission Mechanism
- Vegas reads and records the system clock each time a segment is sent
- When an ACK arrives, Vegas reads the clock again and does the RTT calculation
RTT = ACK arriving time - Segment sending time
- Vegas then uses this more accurate RTT estimate to decide to retransmit in the following two situations :
When a duplicate ACK #n is received,
Vegas checks the difference between the current time and the sending time of segment #n+1. If it is greater than the timeout value, Vegas retransmits segment #n+1 without having to wait for 3 duplicate ACKs
When a non-duplicate ACK #n is received and it is the first or second one after a retransmission,
Vegas checks the difference between the current time and the sending time of segment #n+1. If it is greater than the timeout value, Vegas retransmits segment #n+1 without having to wait for 3 duplicate ACKs
Goals:
1. To be able to detect lost segments even though there may be no second or third duplicate ACK
2. To reduce the time to detect lost segments ( can retransmit before receiving the third duplicate ACK )
- In addition to being able to detect lost segments sooner than the original TCP Reno,
the congestion window in TCP Vegas is decreased only due to losses that happened at the current sending rate, and not due to losses that happened at an earlier, higher rate
- This concept is also implemented in TCP New-Reno, where partial ACKs do not bring the TCP sender out of Fast Recovery
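The two retransmission checks can be sketched with a table of per-segment send times. Everything here is illustrative: the class and method names are hypothetical, times are plain floats, and the sketch follows the slide’s convention that ACK #n means segment #n+1 is the next one expected.

```python
class VegasRetransmitCheck:
    """Timestamp-based retransmission decisions, window handling omitted."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.sent_at = {}            # segment number -> last send time
        self.acks_after_retx = 2     # 2 means "no recent retransmission"

    def on_send(self, seg, now):
        self.sent_at[seg] = now

    def _expired(self, seg, now):
        sent = self.sent_at.get(seg)
        return sent is not None and now - sent > self.timeout

    def _retransmit(self, seg, now):
        self.sent_at[seg] = now
        self.acks_after_retx = 0
        return seg

    def on_dup_ack(self, n, now):
        # Situation 1: any duplicate ACK triggers the timestamp check.
        if self._expired(n + 1, now):
            return self._retransmit(n + 1, now)
        return None

    def on_new_ack(self, n, now):
        # Situation 2: only the first or second ACK after a retransmission.
        eligible = self.acks_after_retx < 2
        self.acks_after_retx += 1
        if eligible and self._expired(n + 1, now):
            return self._retransmit(n + 1, now)
        return None


v = VegasRetransmitCheck(timeout=1.0)
v.on_send(13, 0.0)
v.on_send(14, 0.1)
v.on_send(15, 0.2)
print(v.on_dup_ack(12, now=0.5))   # too early, nothing retransmitted
print(v.on_dup_ack(12, now=1.5))   # segment 13 has waited longer than the timeout
print(v.on_new_ack(14, now=1.6))   # first ACK after the retransmission checks 15
```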
Vegas : retransmit mechanism (diagram)
[Diagram: sender/receiver packet exchange with 1 RTT intervals marked]
- The sender transmits #0, #1-#2, #3-#6 and #7-#14; ACK#7 through ACK#12 return and #15 is sent.
- 1st dup ACK#12 (ACK#13 is expected): this is the 1st dup ACK #12 (due to #14). Vegas checks the sending time of segment #13 and decides to retransmit it. The congestion window is also reduced by half: 14/2 = 7. Retx #13.
- ACK#14 (ACK#15 is expected): this is the 1st ACK after the retransmission. Vegas checks the timestamp of #15 and decides to retransmit it. The congestion window is not reduced by half, since the loss happened before the last window decrease ( we know because it is a partial ACK ). Such a loss does not imply that the network is congested for the current congestion window size, and therefore does not imply that the window should be decreased again.
TCP Vegas Congestion Avoidance Mechanism
Review of TCP Reno’s congestion detection and control mechanism :
- It uses the loss of segments as a signal of network congestion
- It is reactive, rather than proactive, since it cannot detect the incipient stage of congestion and prevent it (before losses occur)
- As a result, Reno needs to create losses to find the available bandwidth of the connection
Several proactive algorithms :
- Based on the fact that as the network approaches congestion, the queue size in intermediate nodes increases, which increases the RTT of each successive segment :
Wang and Crowcroft’s DUAL algorithm
Jain’s CARD (Congestion Avoidance using Round-Trip Delay)
- Based on the fact that as the network approaches congestion, the sending rate flattens :
Wang and Crowcroft’s Tri-S scheme
Vegas’s congestion avoidance actions :
- Generally, Vegas measures and controls the amount of extra data the connection has in transit
- Extra data means data that would not have been sent if the bandwidth used by the connection exactly matched the available bandwidth of the network
- Too much extra data : congestion
Too little extra data : cannot respond rapidly enough to transient increases in the available network bandwidth
- Vegas’s decisions are based on changes in the estimated amount of extra data in the network, not only on dropped segments
- BaseRTT means the RTT of a segment when the connection is not congested
In practice, Vegas sets BaseRTT to the minimum of all measured round trip times
- Assuming that the connection is not overflowing, the Expected throughput is given by :
Expected = WindowSize / BaseRTT
where WindowSize is the size of the current congestion window (assumed to be the number of bytes in transit)
- Once per round-trip time, the Actual sending rate is calculated :
Compute the RTT for a distinguished segment when its acknowledgement arrives, and divide the number of bytes transmitted by this sample RTT
- Compare Actual to Expected :
Diff = Expected - Actual
Since Expected >= Actual (from the definition), Diff is positive or zero
- Define two thresholds : α < β ( in terms of KB/s )
- The two thresholds represent the lower bound and the upper bound of extra data for a connection
- In practice, during congestion avoidance, we express the two thresholds in terms of buffers rather than extra bytes in transit
α is set to 1
β is set to 3
These values can be interpreted as : the TCP sender should try to use at least one extra buffer at the bottleneck router, but no more than three extra buffers
- Diff < α : increases the congestion window linearly during the next RTT
The closer the actual throughput gets to the expected throughput, the more the network is in danger of not utilizing the available bandwidth
- α <= Diff <= β : leaves the congestion window unchanged
- Diff > β : decreases the congestion window linearly during the next RTT
The farther away the actual throughput gets from the expected throughput, the more congestion there is in the network
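The threshold comparison can be sketched as one decision per round trip. Two things here are assumptions rather than statements from the slides: the window is counted in segments, and Diff is multiplied by BaseRTT to turn the rate difference into a number of extra buffers, so that it can be compared with thresholds of 1 and 3. The function and parameter names are made up.

```python
def vegas_adjust(cwnd, base_rtt, sample_rtt, alpha=1.0, beta=3.0):
    """Return the congestion window to use during the next RTT.

    cwnd:       current window in segments
    base_rtt:   minimum RTT seen so far (seconds)
    sample_rtt: RTT measured for the distinguished segment (seconds)
    """
    expected = cwnd / base_rtt             # rate if nothing were queued
    actual = cwnd / sample_rtt             # rate actually achieved
    extra = (expected - actual) * base_rtt # rate gap expressed as queued segments
    if extra < alpha:
        return cwnd + 1                    # too little extra data: grow linearly
    if extra > beta:
        return cwnd - 1                    # too much extra data: shrink linearly
    return cwnd                            # within [alpha, beta]: hold


print(vegas_adjust(10, base_rtt=0.1, sample_rtt=0.1))    # no queueing at all
print(vegas_adjust(10, base_rtt=0.1, sample_rtt=0.125))  # about 2 segments queued
print(vegas_adjust(10, base_rtt=0.1, sample_rtt=0.2))    # about 5 segments queued
```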
TCP Vegas Modified Slow-Start Mechanism
- Slow-Start in TCP Reno :
TCP is a “self-clocking” protocol
It uses ACKs as a “clock” to strobe new segments into the network
At the beginning of a connection or after a retransmit timeout, Slow-Start is used to gradually increase the amount of data in transit (the size of the congestion window -- cwnd)
The Slow-Start period ends when the exponentially increasing congestion window reaches the threshold window -- ssthresh
Once a retransmit timeout occurs, ssthresh is set to one half of the current cwnd
However, when the connection starts, there is no way to know what ssthresh should be initialized to
Too small an initial ssthresh : throughput suffers
Too large an initial ssthresh : losses occur
For Reno, the initial ssthresh is set to a very high value. The TCP sender stays blindly in the slow-start phase until a retransmit timeout occurs (timeout means segment losses)
At that time, the TCP sender has some idea about the available bandwidth of the connection
- Modified Slow-Start in TCP Vegas :
Find a connection’s available bandwidth without allowing losses during the initial slow-start
Incorporate the congestion detection mechanism into slow-start
Every other RTT, exponential growth is allowed
In between, the congestion window stays fixed and the comparison of the expected and actual rates is made
When “Expected - Actual == 1”, Vegas switches from Slow-Start to linear increase/decrease mode
Vegas : modified slow-start mechanism (diagram)
[Diagram: sender/receiver packet exchange; numbers in parentheses are cwnd]
- #0 is sent at (1).
- Exponential growth to (2): #1-#2 are sent.
- Comparison is made while the window stays fixed: #3-#4 are sent.
- Exponential growth to (4): #5-#8 are sent.
- Comparison is made . . .