3rd edition: chapter 3hbinsky/intro comp comm/3_transport_layer.pdf · passes to network layer rcv...
TRANSCRIPT
1
Transport Layer
By Ossi Mokryn and Hadar Binsky Based on slides fromthe Computer Networking
A Top Down Approach Featuring the Internet by Kurose and Ross also by Jennifer Rexford Princeton
And on data from beej‟s guide httpbeejusguidebgnet
Transport Layer
Connectionless and connection oriented communication
Sockets programming
UDP
TCP Reliable communication
Flow control
Congestion control
Timers
2Transport Layer
2
Transport services and protocols
provide logical communicationbetween app processes running on different hosts
transport protocols run in end systems
send side breaks app messages into segments passes to network layer
rcv side reassembles segments into messages passes to app layer
more than one transport protocol available to apps
Internet TCP and UDP
applicationtransportnetworkdata linkphysical
applicationtransportnetworkdata linkphysical
Transport Layer 3
Internet transport-layer protocols
reliable in-order delivery (TCP) congestion control
flow control
connection setup
unreliable unordered delivery UDP no-frills extension of
ldquobest-effortrdquo IP
services not available delay guarantees
bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
Transport Layer 4
3
Transport vs network layer
network layer logical communication between hosts
transport layer logical communication between processes relies on enhances
network layer services
Household analogy
12 kids sending letters to 12 kids
processes = kids
app messages = letters in envelopes
hosts = houses
transport protocol = Ann and Bill
network-layer protocol = postal service
Transport Layer 5
Multiplexingdemultiplexing
application
transport
network
link
physical
P1 application
transport
network
link
physical
application
transport
network
link
physical
P2P3 P4P1
host 1 host 2 host 3
= process= socket
delivering received segmentsto correct socket
Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)
Multiplexing at send host
Transport Layer 6
4
How demultiplexing works host receives IP datagrams
each datagram has source IP address destination IP address
each datagram carries 1 transport-layer segment
each segment has source destination port number
host uses IP addresses amp port numbers to direct segment to appropriate socket
source port dest port
32 bits
applicationdata
(message)
other header fields
TCPUDP segment format
Transport Layer 7
Connectionless demultiplexing
Create sockets with port numbers
UDP socket identified by two-tuple
(dest IP address dest port number)
When host receives UDP segment checks destination port
number in segment
directs UDP segment to socket with that port number
IP datagrams with different source IP addresses andor source port numbers directed to same socket
Transport Layer 8
5
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428)
ClientIPB
P2
clientIP A
P1P1P3
serverIP C
SP 6428
DP 9157
SP 9157
DP 6428
SP 6428
DP 5775
SP 5775
DP 6428
SP provides ldquoreturn addressrdquo
Transport Layer 9
UDP User Datagram Protocol [RFC 768]
Simplest Internet transport protocol
Each app Output produces exactly one UDP segment
ldquobest effortrdquo service UDP segments may be
lost
delivered out of order to app
connectionless
no handshaking between UDP sender receiver
each UDP segment handled independently of others
Why is there a UDP no connection
establishment (which can add delay)
simple no connection state at sender receiver
small segment header
no congestion control UDP can blast away as fast as desired
Transport Layer 10
6
UDP more
often used for streaming multimedia apps
loss tolerant
rate sensitive
other UDP uses DNS
SNMP
reliable transfer over UDP add reliability at application layer
application-specific error recovery
source port dest port
32 bits
Applicationdata
(message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header and dataMinimum value is 8
Bytes
Transport Layer 11
UDP checksum
Sender treat segment contents
as sequence of 16-bit integers
checksum addition (1‟s complement sum) of segment contents
sender puts checksum value into UDP checksum field
Receiver compute checksum of
received segment
check if computed checksum equals checksum field value
NO - error detected
YES - no error detected But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Transport Layer 12
7
Internet Checksum Example Note
When adding numbers a carryout from the most significant bit needs to be added to the result
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
Transport Layer 13
Sockets Programming over UDP ndash use socket slides now
8
Transmission Control Protocol
Principles of reliable communication
TCP basic notations 3 way handshake
TCP flow control congestion control
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 16
9
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 17
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 18
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
2
Transport services and protocols
provide logical communicationbetween app processes running on different hosts
transport protocols run in end systems
send side breaks app messages into segments passes to network layer
rcv side reassembles segments into messages passes to app layer
more than one transport protocol available to apps
Internet TCP and UDP
applicationtransportnetworkdata linkphysical
applicationtransportnetworkdata linkphysical
Transport Layer 3
Internet transport-layer protocols
reliable in-order delivery (TCP) congestion control
flow control
connection setup
unreliable unordered delivery UDP no-frills extension of
ldquobest-effortrdquo IP
services not available delay guarantees
bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
Transport Layer 4
3
Transport vs network layer
network layer logical communication between hosts
transport layer logical communication between processes relies on enhances
network layer services
Household analogy
12 kids sending letters to 12 kids
processes = kids
app messages = letters in envelopes
hosts = houses
transport protocol = Ann and Bill
network-layer protocol = postal service
Transport Layer 5
Multiplexingdemultiplexing
application
transport
network
link
physical
P1 application
transport
network
link
physical
application
transport
network
link
physical
P2P3 P4P1
host 1 host 2 host 3
= process= socket
delivering received segmentsto correct socket
Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)
Multiplexing at send host
Transport Layer 6
4
How demultiplexing works host receives IP datagrams
each datagram has source IP address destination IP address
each datagram carries 1 transport-layer segment
each segment has source destination port number
host uses IP addresses amp port numbers to direct segment to appropriate socket
source port dest port
32 bits
applicationdata
(message)
other header fields
TCPUDP segment format
Transport Layer 7
Connectionless demultiplexing
Create sockets with port numbers
UDP socket identified by two-tuple
(dest IP address dest port number)
When host receives UDP segment checks destination port
number in segment
directs UDP segment to socket with that port number
IP datagrams with different source IP addresses andor source port numbers directed to same socket
Transport Layer 8
5
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428)
ClientIPB
P2
clientIP A
P1P1P3
serverIP C
SP 6428
DP 9157
SP 9157
DP 6428
SP 6428
DP 5775
SP 5775
DP 6428
SP provides ldquoreturn addressrdquo
Transport Layer 9
UDP User Datagram Protocol [RFC 768]
Simplest Internet transport protocol
Each app Output produces exactly one UDP segment
ldquobest effortrdquo service UDP segments may be
lost
delivered out of order to app
connectionless
no handshaking between UDP sender receiver
each UDP segment handled independently of others
Why is there a UDP no connection
establishment (which can add delay)
simple no connection state at sender receiver
small segment header
no congestion control UDP can blast away as fast as desired
Transport Layer 10
6
UDP more
often used for streaming multimedia apps
loss tolerant
rate sensitive
other UDP uses DNS
SNMP
reliable transfer over UDP add reliability at application layer
application-specific error recovery
source port dest port
32 bits
Applicationdata
(message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header and dataMinimum value is 8
Bytes
Transport Layer 11
UDP checksum
Sender treat segment contents
as sequence of 16-bit integers
checksum addition (1‟s complement sum) of segment contents
sender puts checksum value into UDP checksum field
Receiver compute checksum of
received segment
check if computed checksum equals checksum field value
NO - error detected
YES - no error detected But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Transport Layer 12
7
Internet Checksum Example Note
When adding numbers a carryout from the most significant bit needs to be added to the result
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
Transport Layer 13
Sockets Programming over UDP ndash use socket slides now
8
Transmission Control Protocol
Principles of reliable communication
TCP basic notations 3 way handshake
TCP flow control congestion control
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 16
9
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 17
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 18
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
3
Transport vs network layer
network layer logical communication between hosts
transport layer logical communication between processes relies on enhances
network layer services
Household analogy
12 kids sending letters to 12 kids
processes = kids
app messages = letters in envelopes
hosts = houses
transport protocol = Ann and Bill
network-layer protocol = postal service
Transport Layer 5
Multiplexingdemultiplexing
application
transport
network
link
physical
P1 application
transport
network
link
physical
application
transport
network
link
physical
P2P3 P4P1
host 1 host 2 host 3
= process= socket
delivering received segmentsto correct socket
Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)
Multiplexing at send host
Transport Layer 6
4
How demultiplexing works host receives IP datagrams
each datagram has source IP address destination IP address
each datagram carries 1 transport-layer segment
each segment has source destination port number
host uses IP addresses amp port numbers to direct segment to appropriate socket
source port dest port
32 bits
applicationdata
(message)
other header fields
TCPUDP segment format
Transport Layer 7
Connectionless demultiplexing
Create sockets with port numbers
UDP socket identified by two-tuple
(dest IP address dest port number)
When host receives UDP segment checks destination port
number in segment
directs UDP segment to socket with that port number
IP datagrams with different source IP addresses andor source port numbers directed to same socket
Transport Layer 8
5
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428)
ClientIPB
P2
clientIP A
P1P1P3
serverIP C
SP 6428
DP 9157
SP 9157
DP 6428
SP 6428
DP 5775
SP 5775
DP 6428
SP provides ldquoreturn addressrdquo
Transport Layer 9
UDP User Datagram Protocol [RFC 768]
Simplest Internet transport protocol
Each app Output produces exactly one UDP segment
ldquobest effortrdquo service UDP segments may be
lost
delivered out of order to app
connectionless
no handshaking between UDP sender receiver
each UDP segment handled independently of others
Why is there a UDP no connection
establishment (which can add delay)
simple no connection state at sender receiver
small segment header
no congestion control UDP can blast away as fast as desired
Transport Layer 10
6
UDP more
often used for streaming multimedia apps
loss tolerant
rate sensitive
other UDP uses DNS
SNMP
reliable transfer over UDP add reliability at application layer
application-specific error recovery
source port dest port
32 bits
Applicationdata
(message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header and dataMinimum value is 8
Bytes
Transport Layer 11
UDP checksum
Sender treat segment contents
as sequence of 16-bit integers
checksum addition (1‟s complement sum) of segment contents
sender puts checksum value into UDP checksum field
Receiver compute checksum of
received segment
check if computed checksum equals checksum field value
NO - error detected
YES - no error detected But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Transport Layer 12
7
Internet Checksum Example Note
When adding numbers a carryout from the most significant bit needs to be added to the result
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
Transport Layer 13
Sockets Programming over UDP ndash use socket slides now
8
Transmission Control Protocol
Principles of reliable communication
TCP basic notations 3 way handshake
TCP flow control congestion control
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 16
9
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 17
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 18
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
4
How demultiplexing works host receives IP datagrams
each datagram has source IP address destination IP address
each datagram carries 1 transport-layer segment
each segment has source destination port number
host uses IP addresses amp port numbers to direct segment to appropriate socket
source port dest port
32 bits
applicationdata
(message)
other header fields
TCPUDP segment format
Transport Layer 7
Connectionless demultiplexing
Create sockets with port numbers
UDP socket identified by two-tuple
(dest IP address dest port number)
When host receives UDP segment checks destination port
number in segment
directs UDP segment to socket with that port number
IP datagrams with different source IP addresses andor source port numbers directed to same socket
Transport Layer 8
5
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428)
ClientIPB
P2
clientIP A
P1P1P3
serverIP C
SP 6428
DP 9157
SP 9157
DP 6428
SP 6428
DP 5775
SP 5775
DP 6428
SP provides ldquoreturn addressrdquo
Transport Layer 9
UDP User Datagram Protocol [RFC 768]
Simplest Internet transport protocol
Each app Output produces exactly one UDP segment
ldquobest effortrdquo service UDP segments may be
lost
delivered out of order to app
connectionless
no handshaking between UDP sender receiver
each UDP segment handled independently of others
Why is there a UDP no connection
establishment (which can add delay)
simple no connection state at sender receiver
small segment header
no congestion control UDP can blast away as fast as desired
Transport Layer 10
6
UDP more
often used for streaming multimedia apps
loss tolerant
rate sensitive
other UDP uses DNS
SNMP
reliable transfer over UDP add reliability at application layer
application-specific error recovery
source port dest port
32 bits
Applicationdata
(message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header and dataMinimum value is 8
Bytes
Transport Layer 11
UDP checksum
Sender treat segment contents
as sequence of 16-bit integers
checksum addition (1‟s complement sum) of segment contents
sender puts checksum value into UDP checksum field
Receiver compute checksum of
received segment
check if computed checksum equals checksum field value
NO - error detected
YES - no error detected But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Transport Layer 12
7
Internet Checksum Example Note
When adding numbers a carryout from the most significant bit needs to be added to the result
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
Transport Layer 13
Sockets Programming over UDP ndash use socket slides now
8
Transmission Control Protocol
Principles of reliable communication
TCP basic notations 3 way handshake
TCP flow control congestion control
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 16
9
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 17
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 18
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
5
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428)
ClientIPB
P2
clientIP A
P1P1P3
serverIP C
SP 6428
DP 9157
SP 9157
DP 6428
SP 6428
DP 5775
SP 5775
DP 6428
SP provides ldquoreturn addressrdquo
Transport Layer 9
UDP User Datagram Protocol [RFC 768]
Simplest Internet transport protocol
Each app Output produces exactly one UDP segment
ldquobest effortrdquo service UDP segments may be
lost
delivered out of order to app
connectionless
no handshaking between UDP sender receiver
each UDP segment handled independently of others
Why is there a UDP no connection
establishment (which can add delay)
simple no connection state at sender receiver
small segment header
no congestion control UDP can blast away as fast as desired
Transport Layer 10
6
UDP more
often used for streaming multimedia apps
loss tolerant
rate sensitive
other UDP uses DNS
SNMP
reliable transfer over UDP add reliability at application layer
application-specific error recovery
source port dest port
32 bits
Applicationdata
(message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header and dataMinimum value is 8
Bytes
Transport Layer 11
UDP checksum
Sender treat segment contents
as sequence of 16-bit integers
checksum addition (1‟s complement sum) of segment contents
sender puts checksum value into UDP checksum field
Receiver compute checksum of
received segment
check if computed checksum equals checksum field value
NO - error detected
YES - no error detected But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Transport Layer 12
7
Internet Checksum Example Note
When adding numbers a carryout from the most significant bit needs to be added to the result
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
Transport Layer 13
Sockets Programming over UDP ndash use socket slides now
8
Transmission Control Protocol
Principles of reliable communication
TCP basic notations 3 way handshake
TCP flow control congestion control
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 16
9
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 17
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 18
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
6
UDP more
often used for streaming multimedia apps
loss tolerant
rate sensitive
other UDP uses DNS
SNMP
reliable transfer over UDP add reliability at application layer
application-specific error recovery
source port dest port
32 bits
Applicationdata
(message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header and dataMinimum value is 8
Bytes
Transport Layer 11
UDP checksum
Sender treat segment contents
as sequence of 16-bit integers
checksum addition (1‟s complement sum) of segment contents
sender puts checksum value into UDP checksum field
Receiver compute checksum of
received segment
check if computed checksum equals checksum field value
NO - error detected
YES - no error detected But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Transport Layer 12
7
Internet Checksum Example Note
When adding numbers a carryout from the most significant bit needs to be added to the result
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
Transport Layer 13
Sockets Programming over UDP ndash use socket slides now
8
Transmission Control Protocol
Principles of reliable communication
TCP basic notations 3 way handshake
TCP flow control congestion control
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 16
9
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 17
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 18
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
7
Internet Checksum Example Note
When adding numbers a carryout from the most significant bit needs to be added to the result
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
Transport Layer 13
Sockets Programming over UDP ndash use socket slides now
8
Transmission Control Protocol
Principles of reliable communication
TCP basic notations 3 way handshake
TCP flow control congestion control
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 16
9
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 17
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 18
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
8
Transmission Control Protocol
Principles of reliable communication
TCP basic notations 3 way handshake
TCP flow control congestion control
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 16
9
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 17
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 18
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
9
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 17
Principles of Reliable data transfer
important in app transport link layers
top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 18
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
10
Reliable Data Transfer Stream stream jargon
Application Layer 19
A stream is a sequence of characters that flow into or out of a process
An input stream is attached to some input source for the process eg keyboard or socket
An output stream is attached to an output source eg monitor or socket
Reliable Communication
20Transport Layer
state1
state2
event causing state transitionactions taken on state transition
eventactions
Terminology of a State Machine
Note The state machine is for a certain host
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
11
Reliable Communication
First Model sender sends receiver receives
Is this enough
When will it work
When will it not work
Transport Layer 21
Sender sends one packet then waits for receiver response
stop and wait
Reliable Communication
22Transport Layer
State1
state2
Send Data
Data available
Stop and Wait ndash Sender side
Received AckIn State 1
Sender can send data
In State 2Sender can receive acknowledge packetsDiscussion
bull What does the receiver dobull What happens if data is available at state 2
Wait
for dataWait
for ack
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
12
channel with bit errors and losses
underlying channel may flip bits in packet checksum to detect bit errors
underlying channel can also lose packets the question how to recover from errors
acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK
timeout sender retransmits pkt if doesn‟t receive ack within timeout
new mechanisms in error detection
receiver feedback control msg (ACK) rcvr-gtsender
sender control timer to understand if to send again
Transport Layer 23
Reliable Communication
24Transport Layer
State1
state2a
Send Data
Data available
Stop and Wait with errorslosses - sender
Ack receivedIn State 1Sender waits
for data
Waitfor data
Waitfor ack packet or time out
state2b
timeout
Send
Dat
a
Discussion
bull What does the sender need to do for the retransmission
Process ack
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
13
This version has a fatal flaw
What happens if ACK corruptedlost
sender doesn‟t know what happened at receiver
can‟t just retransmit possible duplicate
Handling duplicates sender retransmits current
pkt if ACK garbled or didn‟t arrive
sender adds sequence number to each pkt
receiver discards (doesn‟t deliver up) duplicate pkt
receiver must specify seq of pkt being ACKed
Transport Layer 25
discussionSender
seq added to pkt
two seq ‟s (01) will suffice Why
must check if received ACK corrupted
twice as many states state must ldquorememberrdquo
whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiver
must check if received packet is duplicate state indicates whether 0 or
1 is expected pkt seq
receiver sends ACK for last pkt received OK receiver must explicitly
include seq of pkt being ACKed
note receiver can not know if its last ACK received OK at sender
Transport Layer 26
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
14
Stop amp wait in action
Transport Layer 27
Stop amp wait in action
Transport Layer 28
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
15
Performance of stop amp wait
Stop amp wait works but performance stinks
ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link
network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
Transport Layer 29
stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
U sender
= 008
30008 = 000027
microsec
onds
L R
RTT + L R =
Transport Layer 30
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
16
Pipelined protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased
buffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 31
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arrives
last packet bit arrives send ACK
ACK arrives send next
packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microsecon
ds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
Transport Layer 32
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
17
Pipelining Protocol
Transport Layer 33
Go-back-N big picture Sender can have up to N unacked packets in
pipeline Rcvr only sends cumulative acks
Doesn‟t ack packet if there‟s a gap
Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets
Transport Layer 3-34
Go-Back-NSender k-bit seq in pkt header
ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n) retransmit pkt n and all higher seq pkts in window
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
18
Go Back N
Transport Layer 35
ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt discard (don‟t buffer) -gt no receiver buffering
Re-ACK pkt with highest in-order seq
Receiver
Transport Layer 3-36
GBN inaction
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
19
Transport Control Protocol
Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management TCP congestion control
Transport Layer 37
TCP Overview RFCs 793 1122 1323 2018 2581
full duplex data bi-directional data flow
in same connection
MSS maximum segment size
connection-oriented handshaking (exchange
of control msgs) init‟s sender receiver state before data exchange
flow controlled sender will not
overwhelm receiver
point-to-point one sender one receiver
reliable in-order byte steam no ldquomessage boundariesrdquo
pipelined TCP congestion and flow
control set window size
send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
Transport Layer 38
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
20
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence number
acknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
Transport Layer 39
TCP reliable data transfer
TCP creates reliable service on top of IP‟s unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the first byte-stream of the first byte in the segment
Transport Layer 40
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
21
TCP sender eventsdata rcvd from app
Create segment with seq
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer
Ack rcvd
If acknowledges previously unacked segments update what is known to
be acked
start timer if there are outstanding segments
Transport Layer 41
TCP seq ‟s and ACKsSeq ‟s
byte stream ldquonumberrdquo of first byte in segment‟s data
ACKs
seq of next byte expected from other side
cumulative ACK
Q how receiver handles out-of-order segments
A TCP spec doesn‟t say - up to implementor
Host A Host B
Usertypes
bdquoC‟
host receivesACK
host ACKsreceipt of
bdquoC‟
time
Transport Layer 42
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
22
TCP retransmission scenarios
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t
Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
timeS
eq=
92
tim
eou
t
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
Transport Layer 43
TCP retransmission scenarios (more)
Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase= 120
Transport Layer 44
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
23
Interactive data flow Overhead for each
packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟
If the receiver waits a while it can piggyback the data packet
Delayed ack Wait up to 200ms for next segment If no next segment send ACK
Should sender use delayed acks too[Stevens figure 193]
Host A Host B
Usertypes
bdquoC‟
host ACKsreceipt
of echoedbdquoC‟
host ACKsreceipt of
bdquoC‟
timesimple telnet scenario
Transport Layer 45
Host echoesback bdquoC‟
Nagle Algorithm [RFC 896]
Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data
LANs usually not congested so it might be okay
Small packets termed tinygrams over congested WAN ndash bad news
New data can‟t be sent until outstanding data is acked
Small amounts of data are collected and sent in a single segment when ackarrives
Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent
[Stevens 194]
Transport Layer 46
Nagle‟s alg
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
24
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 200ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
Transport Layer 47
Fast Retransmit
Time-out period often relatively long long delay before
resending lost packet
Detect lost segments via duplicate ACKs Sender often sends
many segments back-to-back
If segment is lost there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend
segment before timer expires
Transport Layer 48
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
25
Host A
tim
eou
t
Host B
time
X
Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 50
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
26
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
longer than RTT but RTT varies
too short premature timeout
unnecessary retransmissions
too long slow reaction to segment loss
Q how to estimate RTT SampleRTT measured time from
segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary want estimated RTT ldquosmootherrdquo
average several recent measurements not just current SampleRTT
Transport Layer 51
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value = 0125
Transport Layer 52
[Retransmission example in Stevens 211]
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
27
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
Transport Layer 53
TCP Round Trip Time and Timeout
Setting the timeout EstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Transport Layer 54
Discuss [Stevens
212 214 216
217]
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
28
TCP Flow Control
receive side of TCP connection has a receive buffer
speed-matching service matching the send rate to the receiving app‟s drain rate
app process may be slow at reading from buffer
sender won‟t overflowreceiver‟s buffer by
transmitting too muchtoo fast
flow control
Transport Layer 55
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKeddata to RcvWindow guarantees receive
buffer doesn‟t overflow
Discuss [Stevens
201 204 205
206]
Transport Layer 56
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
29
Delayed Duplicates Problem
A user asks for a connection
Due to congestion the packet is caught in a traffic jam
The user asks again for the connection
Destination accepts 2nd connection request
User sends info to dest
Info gets caught in a traffic jam
User sends info again
Dest receives the info
Connection is closed by both parties
The original connection request and user info find their way to the destination
Transport Layer 57
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
initialize TCP variables
seq s
buffers flow control info (eg RcvWindow)
As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice
Possible solutions A transport address can be used only
once (impossible for client-server model)
Each connection is given an identifier by
the originator (requires indefinite history to
be saved by the transport layer)
TCP‟s Solution
Three way handshake
Step 1 client host sends TCP SYN segment to server
specifies initial seq
no data
Step 2 server host receives SYN replies with SYNACK segment
server allocates buffers
specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 58
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
30
TCP Connection Management (cont)
Connection Establishment
Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN
Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)
client server
Transport Layer 59
TCP Connection Management (cont)
Closing a connection
client closes socketclientSocketclose()
Step 1 client end system sends TCP FIN control
segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client server
close
close
closed
tim
ed w
ait
Transport Layer 60
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
31
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters ldquotimed waitrdquo -will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client server
closing
closing
closedti
med w
ait
closed
Transport Layer 61
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Transport Layer 62
[Tanenbaum 633]
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
32
Principles of Congestion Control
Congestion informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
different from flow control
manifestations
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem
Transport Layer 63
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS every RTT until loss detected
multiplicative decrease cut CongWin in half after loss
timecongestio
n w
indow
siz
e
Saw toothbehavior probing
for bandwidth
Transport Layer 64
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
33
TCP Congestion Control details
sender limits transmissionLastByteSent-LastByteAcked
min(CongWinRcvWindow)
Let‟s assume that RcvWindowis not a constraint (for simplicity)
Roughly
CongWin is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or3 duplicate acks
TCP sender reduces rate (CongWin) after loss event
three mechanisms AIMD
slow start
conservative after timeout events
rate =CongWin
RTTBytessec
Transport Layer 65
TCP Slow Start
When connection begins CongWin = 1 MSS Example MSS = 500
bytes amp RTT = 200 msec
initial rate = 20 kbps
available bandwidth may be gtgt MSSRTT desirable to quickly ramp
up to respectable rate
When connection begins increase rate exponentially fast until first loss event
Transport Layer 66
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
34
TCP Slow Start (more)
When connection begins increase rate exponentially until first loss event double CongWin every
RTT
done by incrementing CongWin for every ACK received
Summary initial rate is slow but ramps up exponentially fast
Host A
RT
T
Host B
time
Transport Layer 67
Refinement inferring loss
After 3 dup ACKs
CongWin is cut in half
window then grows linearly
Part of Fast Recovery
But after timeout event
CongWin instead set to 1 MSS
window then grows exponentially (This is SS)
to a threshold (ssthresh) then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer 68
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
35
Refinement
Q When should the exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
Implementation Variable Threshold (ssthresh)
At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 69
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 70
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
36
TCP sender congestion control
State Event TCP Sender Action Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to ldquoCongestion
Avoidancerdquo
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting
in increase of CongWin by
1 MSS every RTT
SS or CA Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin2
CongWin = Threshold
Set state to ldquoCongestion
Avoidancerdquo
Fast recovery
implementing multiplicative
decrease CongWin will not
drop below 1 MSS
SS or CA Timeout Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 71
TCP throughput
What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start
Let W be the window size when loss occurs
When window is W throughput is WRTT
Just after loss window drops to W2 throughput to W2RTT
Average throughout 75 WRTT
Transport Layer 72
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
37
TCP Futures TCP over ldquolong fat pipesrdquo
Example 1500 byte segments 100ms RTT want 10 Gbps throughput
Requires window size W = 83333 in-flight segments
Throughput in terms of loss rate
L = 210-10 Wow New versions of TCP for high-speed
LRTT
MSS221
Transport Layer 73
Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
capacity R
TCP connection 2
TCP Fairness
Transport Layer 74
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
38
Why is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases
multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer 75
Fairness (more)
Fairness and UDP Multimedia apps often
do not use TCP do not want rate
throttled by congestion control
Instead use UDP pump audiovideo at
constant rate tolerate packet loss
Research area TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts
Web browsers do this Example link of rate R
supporting 9 connections new app asks for 1 TCP gets
rate R10 new app asks for 11 TCPs
gets R2
Transport Layer 76
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
39
Chapter 3 Summary
principles behind transport layer services
multiplexing demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP
After that
leaving the network ldquoedgerdquo (application transport layers)
into the network ldquocorerdquo
Transport Layer 77
Next
Socket programming over TCP
Connection-oriented demux
TCP socket identified by 4-tuple source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets each socket identified by
its own 4-tuple
Web servers have different sockets for each connecting client non-persistent HTTP will
have different socket for each request
Transport Layer 78
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80
40
Connection-oriented demux (cont)
ClientIPB
P1
clientIP A
P1P2P4
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P5 P6 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 79
Connection-oriented demux Threaded Web Server
ClientIPB
P1
clientIP A
P1P2
serverIP C
SP 9157
DP 80
SP 9157
DP 80
P4 P3
D-IPC
S-IP A
D-IPC
S-IP B
SP 5775
DP 80
D-IPC
S-IP B
Transport Layer 80