Transport Layer

By Ossi Mokryn and Hadar Binsky. Based on slides from Computer Networking: A Top Down Approach Featuring the Internet (3rd edition, Chapter 3) by Kurose and Ross, also by Jennifer Rexford, Princeton, and on data from Beej's guide: http://beej.us/guide/bgnet

Connectionless and connection-oriented communication

Sockets programming

UDP

TCP: reliable communication

Flow control

Congestion control

Timers



Transport services and protocols

provide logical communication between app processes running on different hosts

transport protocols run in end systems

send side: breaks app messages into segments, passes to network layer

rcv side: reassembles segments into messages, passes to app layer

more than one transport protocol available to apps

Internet: TCP and UDP

[Figure: two end systems, each with the full stack: application, transport, network, data link, physical]

Internet transport-layer protocols

reliable, in-order delivery (TCP): congestion control, flow control, connection setup

unreliable, unordered delivery (UDP): no-frills extension of "best-effort" IP

services not available: delay guarantees, bandwidth guarantees

[Figure: the two end systems run all five layers (application, transport, network, data link, physical); the nodes between them run only network, data link, and physical]

Transport vs. network layer

network layer: logical communication between hosts

transport layer: logical communication between processes; relies on, and enhances, network layer services

Household analogy: 12 kids sending letters to 12 kids

processes = kids

app messages = letters in envelopes

hosts = houses

transport protocol = Ann and Bill

network-layer protocol = postal service

Multiplexing/demultiplexing

Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)

Demultiplexing at rcv host: delivering received segments to correct socket

[Figure: host 1, host 2, and host 3, each with an application / transport / network / link / physical stack; processes P1 to P4 attach to the transport layer through sockets]

How demultiplexing works

host receives IP datagrams

each datagram has source IP address, destination IP address

each datagram carries 1 transport-layer segment

each segment has source, destination port number

host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (32 bits wide):

source port #, dest port #

other header fields

application data (message)

Connectionless demultiplexing

Create sockets with port numbers

UDP socket identified by two-tuple: (dest IP address, dest port number)

When host receives UDP segment:

checks destination port number in segment

directs UDP segment to socket with that port number

IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket
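The two-tuple lookup described above can be sketched in a few lines of Python. This is only an illustrative model, not how an operating system implements it: the class name `UdpDemux`, its methods, and the use of plain lists as receive queues are all assumptions made for the example.

```python
class UdpDemux:
    """Toy model of connectionless demultiplexing (hypothetical names)."""

    def __init__(self):
        # (dest IP, dest port) -> receive queue of the bound socket
        self.sockets = {}

    def bind(self, ip, port):
        """Create a 'socket' identified only by its own IP and port."""
        queue = []
        self.sockets[(ip, port)] = queue
        return queue

    def deliver(self, src_ip, src_port, dst_ip, dst_port, payload):
        """Direct a segment using the destination two-tuple only."""
        queue = self.sockets.get((dst_ip, dst_port))
        if queue is None:
            return False  # no socket bound to that port
        # The source address is kept only as a "return address".
        queue.append(((src_ip, src_port), payload))
        return True


demux = UdpDemux()
server = demux.bind("C", 6428)
demux.deliver("A", 9157, "C", 6428, b"hello from A")
demux.deliver("B", 5775, "C", 6428, b"hello from B")
```

Both segments land in the same queue even though they come from different sources, which is the point of the last line of the slide.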

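Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428);

[Figure: two clients (IP A and IP B) and a server (IP C) whose socket is bound to port 6428. One client sends a segment with SP 9157, DP 6428 and the reply carries SP 6428, DP 9157; the other sends SP 5775, DP 6428 and the reply carries SP 6428, DP 5775.]

SP provides the "return address"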

UDP: User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app output produces exactly one UDP segment

"best effort" service: UDP segments may be lost, or delivered out of order to app

connectionless:

no handshaking between UDP sender, receiver

each UDP segment handled independently of others

Why is there a UDP?

no connection establishment (which can add delay)

simple: no connection state at sender, receiver

small segment header

no congestion control: UDP can blast away as fast as desired

UDP: more

often used for streaming multimedia apps: loss tolerant, rate sensitive

other UDP uses: DNS, SNMP

reliable transfer over UDP: add reliability at application layer; application-specific error recovery

UDP segment format (32 bits wide):

source port #, dest port #

length, checksum

application data (message)

Length: in bytes, of the UDP segment including header and data. Minimum value is 8 bytes.

UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:

treat segment contents as sequence of 16-bit integers

checksum: 1's complement of the 1's complement sum of segment contents

sender puts checksum value into UDP checksum field

Receiver:

compute checksum of received segment

check if computed checksum equals checksum field value:

NO - error detected

YES - no error detected. But maybe errors nonetheless? More later...

Internet checksum example

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result

Example: add two 16-bit integers

                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
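The wraparound rule can be checked with a short Python sketch. The function names are illustrative, and the two input words are the 16-bit integers from the example above (0xE666 and 0xD555 in hexadecimal).

```python
def ones_complement_sum16(words):
    """Add 16-bit integers, wrapping any carry-out back into the low bit."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(words):
    """The checksum is the bitwise complement of the one's-complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


a = 0b1110011001100110
b = 0b1101010101010101
wrapped_sum = ones_complement_sum16([a, b])
checksum = internet_checksum([a, b])

# Receiver-side check: summing the data together with the checksum
# gives all ones when no bit error is detected.
receiver_sum = ones_complement_sum16([a, b, checksum])
```

With these inputs `wrapped_sum` is 1011101110111100 and `checksum` is 0100010001000011, matching the "sum" and "checksum" rows of the example.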

Sockets programming over UDP: use socket slides now

Transmission Control Protocol

Principles of reliable communication

TCP: basic notations, 3-way handshake

TCP: flow control, congestion control

Principles of reliable data transfer

important in app., transport, link layers

top-10 list of important networking topics!

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)


Reliable data transfer: stream jargon

A stream is a sequence of characters that flow into or out of a process.

An input stream is attached to some input source for the process, e.g., keyboard or socket.

An output stream is attached to an output destination, e.g., monitor or socket.

Reliable communication: terminology of a state machine

[Figure: state 1 and state 2 joined by an arrow labeled event/actions. Above the line: the event causing the state transition. Below the line: the actions taken on the state transition.]

Note: the state machine is for a certain host

Reliable communication

First model: sender sends, receiver receives

Is this enough?

When will it work?

When will it not work?

Reliable communication: stop and wait, sender side

Stop and wait: sender sends one packet, then waits for receiver response

State 1 (wait for data): sender can send data. Transition "data available / send data" leads to state 2.

State 2 (wait for ack): sender can receive acknowledgement packets. Transition "received ack" leads back to state 1.

Discussion:

What does the receiver do?

What happens if data is available at state 2?

Channel with bit errors and losses

underlying channel may flip bits in packet: checksum to detect bit errors

underlying channel can also lose packets

the question: how to recover from errors?

acknowledgements (ACKs): receiver explicitly tells sender that pkt was received OK

timeout: sender retransmits pkt if it doesn't receive an ack within the timeout

new mechanisms:

error detection

receiver feedback: control msg (ACK) rcvr -> sender

sender control: timer to decide whether to send again

Reliable communication: stop and wait with errors/losses, sender

In state 1 the sender waits for data.

[State diagram, labels as they survive in the transcript. States: state 1 (wait for data), state 2a (wait for ack packet or timeout), state 2b. Transitions: data available / send data; timeout / send data; ack received / process ack.]

Discussion:

What does the sender need to do for the retransmission?

This version has a fatal flaw!

What happens if ACK corrupted/lost?

sender doesn't know what happened at receiver!

can't just retransmit: possible duplicate

Handling duplicates:

sender retransmits current pkt if ACK garbled or didn't arrive

sender adds sequence number to each pkt

receiver discards (doesn't deliver up) duplicate pkt

receiver must specify seq # of pkt being ACKed

Discussion

Sender:

seq # added to pkt

two seq. #'s (0, 1) will suffice. Why?

must check if received ACK corrupted

twice as many states: state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:

must check if received packet is duplicate: state indicates whether 0 or 1 is expected pkt seq #

receiver sends ACK for last pkt received OK; receiver must explicitly include seq # of pkt being ACKed

note: receiver cannot know if its last ACK was received OK at sender
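To see why two sequence numbers are enough, here is a small deterministic simulation of stop and wait with a 1-bit sequence number. It is a sketch under simplifying assumptions that are not in the slides: a timeout is modeled as "retry immediately", corruption is treated like loss, and the function name and the `lost_data` / `lost_acks` parameters (sets of transmission indices to drop) are invented for the example.

```python
def stop_and_wait(messages, lost_data=frozenset(), lost_acks=frozenset()):
    """Return (delivered messages, number of data transmissions)."""
    delivered = []
    transmissions = 0
    sender_seq = 0      # seq # of the sender's current packet (0 or 1)
    expected_seq = 0    # seq # the receiver expects next (0 or 1)
    for msg in messages:
        acked = False
        while not acked:
            attempt = transmissions
            transmissions += 1
            if attempt in lost_data:
                continue  # data lost: sender times out and retransmits
            if sender_seq == expected_seq:
                delivered.append(msg)   # new packet: deliver up
                expected_seq ^= 1
            # Otherwise it is a duplicate: discard it, but still ACK.
            ack_seq = expected_seq ^ 1  # seq # of last pkt received OK
            if attempt in lost_acks:
                continue  # ACK lost: sender times out and retransmits
            acked = ack_seq == sender_seq
        sender_seq ^= 1
    return delivered, transmissions


delivered, transmissions = stop_and_wait(
    ["a", "b", "c"], lost_data={1}, lost_acks={2}
)
```

In this run the second message is sent three times (once lost, once with its ACK lost, once as a duplicate), yet the receiver delivers each message exactly once.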

Stop & wait in action

[Two slides of timing diagrams; the figures did not survive in this transcript]

Performance of stop & wait

Stop & wait works, but performance stinks

ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:

$$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$$

U sender: utilization, the fraction of time the sender is busy sending (times below in milliseconds):

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

1 KB pkt every 30 msec -> 33 kB/sec throughput over 1 Gbps link

network protocol limits use of physical resources!

Stop-and-wait operation

[Timing diagram between sender and receiver:]

first packet bit transmitted, t = 0

last packet bit transmitted, t = L/R

first packet bit arrives

last packet bit arrives, send ACK

ACK arrives, send next packet, t = RTT + L/R

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts

range of sequence numbers must be increased

buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization

[Timing diagram between sender and receiver:]

first packet bit transmitted, t = 0

last bit transmitted, t = L/R

first packet bit arrives

last packet bit arrives, send ACK

last bit of 2nd packet arrives, send ACK

last bit of 3rd packet arrives, send ACK

ACK arrives, send next packet, t = RTT + L/R

$$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$

Increase utilization by a factor of 3!
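The utilization figures for stop and wait and for three pipelined packets can be reproduced with a few lines of Python. The function name and its `window` parameter are illustrative; the formula is the one on the slides, with `window` packets sent per RTT + L/R.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy: window * (L/R) / (RTT + L/R)."""
    d_trans = packet_bits / rate_bps
    return window * d_trans / (rtt_s + d_trans)


# Slide example: 1 Gbps link, 15 ms propagation delay (RTT = 30 ms),
# 8000-bit packets.
u_stop_and_wait = sender_utilization(8000, 1e9, 0.030)
u_pipelined = sender_utilization(8000, 1e9, 0.030, window=3)

# Effective throughput of stop and wait, in bytes per second.
throughput_bytes_per_s = (8000 / 8) / (0.030 + 8000 / 1e9)
```

The two utilizations come out near 0.00027 and 0.0008, and the throughput near 33 kB/sec, as stated on the slides.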

Pipelining protocol: Go-back-N, big picture

Sender can have up to N unacked packets in pipeline

Rcvr only sends cumulative acks: doesn't ack packet if there's a gap

Sender has timer for oldest unacked packet: if timer expires, retransmit all unacked packets

Go-Back-N: sender

k-bit seq # in pkt header

"window" of up to N consecutive unack'ed pkts allowed

ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n): retransmit pkt n and all higher seq # pkts in window

Go-Back-N: receiver

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #

may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt: discard (don't buffer) -> no receiver buffering!

Re-ACK pkt with highest in-order seq #

GBN in action

[Figure: timing diagram, not preserved in this transcript]
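A rough simulation can stand in for the missing "GBN in action" diagram. It is a simplified sketch, not the protocol as specified: it works in rounds (send everything the window allows, then process the ACKs), it models a timeout as "a round with no progress", it uses unbounded sequence numbers instead of k-bit ones, and only the first transmission of a packet can be lost. The function name and parameters are assumptions for illustration.

```python
def go_back_n(num_pkts, window, lose_first_tx=frozenset()):
    """Return the list of seq #s in the order the sender transmits them."""
    base = 0          # oldest unacked seq #
    next_seq = 0      # next seq # to send
    expected = 0      # receiver state: expectedseqnum
    sent_once = set()
    log = []
    while base < num_pkts:
        acks = []
        # Send as much as the window allows.
        while next_seq < min(base + window, num_pkts):
            log.append(next_seq)
            lost = next_seq in lose_first_tx and next_seq not in sent_once
            sent_once.add(next_seq)
            if not lost:
                if next_seq == expected:
                    expected += 1       # in order: accept
                # Out-of-order packets are discarded, not buffered.
                if expected > 0:
                    acks.append(expected - 1)  # (re-)ACK highest in-order
            next_seq += 1
        # Cumulative ACKs: ACK(n) covers everything up to and including n.
        new_base = max([a + 1 for a in acks], default=base)
        if new_base > base:
            base = new_base
        else:
            next_seq = base  # timeout: go back and resend the whole window
    return log


log = go_back_n(6, 4, lose_first_tx={2})
```

With a window of 4 and packet 2 lost once, the sender transmits 0 to 5, makes no further progress, and then resends 2, 3, 4, 5, even though 3, 4, and 5 had arrived intact the first time.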

Transmission Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management

TCP congestion control

TCP: overview (RFCs 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver

reliable, in-order byte stream: no "message boundaries"

pipelined: TCP congestion and flow control set window size

send & receive buffers

full duplex data: bi-directional data flow in same connection

MSS: maximum segment size

connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange

flow controlled: sender will not overwhelm receiver

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer on the other side, where the application reads data through its socket door]

TCP segment structure (32 bits wide)

source port #, dest port #

sequence number

acknowledgement number

head len, not used, flags U A P R S F, receive window

checksum, urg data pointer

options (variable length)

application data (variable length)

Field notes:

sequence and acknowledgement numbers: counting by bytes of data (not segments!)

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now (generally not used)

RST, SYN, FIN: connection estab (setup, teardown commands)

receive window: # bytes rcvr willing to accept

checksum: Internet checksum (as in UDP)

TCP reliable data transfer

TCP creates reliable service on top of IP's unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the byte-stream number of the first byte in the segment

TCP sender events

data rcvd from app:

create segment with seq #

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval: TimeOutInterval

timeout:

retransmit segment that caused timeout

restart timer

Ack rcvd:

if it acknowledges previously unacked segments, update what is known to be acked

start timer if there are outstanding segments

TCP seq. #'s and ACKs

Seq. #'s: byte stream "number" of first byte in segment's data

ACKs: seq # of next byte expected from other side; cumulative ACK

Q: how does the receiver handle out-of-order segments?

A: TCP spec doesn't say; up to implementor

[Figure: Host A / Host B timeline: user types 'C'; the other host ACKs receipt of 'C'; the first host receives the ACK]

TCP retransmission scenarios

[Figures: two Host A / Host B timelines, the lost ACK scenario and the premature timeout scenario. Labels that survive in the transcript: Seq=92, timeout, loss (X), SendBase = 100, SendBase = 120.]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario, a Host A / Host B timeline with a loss (X) and a timeout interval; SendBase = 120]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), for a total of 160 bytes for sending and receiving 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: wait up to 200 ms for next segment. If no next segment, send ACK

Should the sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline: user types 'C'; host ACKs receipt of 'C'; host echoes back 'C'; host ACKs receipt of echoed 'C']

Nagle algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggyback: 2/120 -> only 1.6% of the bits sent are data

LANs are usually not congested, so it might be okay

Small packets, termed tinygrams, over a congested WAN: bad news

Nagle's alg:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when the ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 19.4]

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP receiver action: delayed ACK. Wait up to 200 ms for next segment. If no next segment, send ACK.

Event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #. Gap detected.
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: arrival of segment that partially or completely fills gap.
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap.

Fast retransmit

Time-out period often relatively long: long delay before resending lost packet

Detect lost segments via duplicate ACKs:

Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost

fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline with a lost segment (X) and the timeout interval.]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    /* fast retransmit */
    }
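The ACK-handling pseudocode can be turned into a small runnable Python sketch. Several things here are assumptions made for the example and are not in the slides: the class and attribute names, the `next_seq` field used to decide whether any data is still unacknowledged, resetting the duplicate counter on a new ACK, and recording retransmissions in a list instead of sending anything.

```python
class FastRetransmitSender:
    """Toy model of the ACK-received event handler (hypothetical names)."""

    def __init__(self, send_base, next_seq):
        self.send_base = send_base  # oldest unacknowledged byte
        self.next_seq = next_seq    # next byte not yet sent (assumed field)
        self.dup_acks = 0
        self.timer_running = True
        self.retransmitted = []     # seq #s resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: slide the base forward.
            self.send_base = y
            self.dup_acks = 0
            # Keep the timer only if data is still outstanding.
            self.timer_running = self.send_base < self.next_seq
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)  # fast retransmit


sender = FastRetransmitSender(send_base=100, next_seq=180)
sender.on_ack(120)        # new ACK, SendBase becomes 120
for _ in range(4):
    sender.on_ack(120)    # duplicate ACKs; the third triggers the resend
```

The fourth duplicate does not trigger a second resend, because the test is for exactly three duplicates.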

TCP round trip time and timeout

Q: how to set TCP timeout value?

longer than RTT, but RTT varies

too short: premature timeout, unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions

SampleRTT will vary, want estimated RTT "smoother": average several recent measurements, not just current SampleRTT

TCP round trip time and timeout

$$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and EstimatedRTT]

TCP round trip time and timeout

Setting the timeout:

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate how much SampleRTT deviates from EstimatedRTT:

$$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$$

(typically, $\beta = 0.25$)

Then set timeout interval:

$$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
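The three formulas fit in a small Python class. Two choices below are assumptions, since the slides do not settle them: the initial values are supplied by the caller, and DevRTT is updated using the EstimatedRTT from before the current sample is folded in. The class and method names are illustrative.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout interval (sketch)."""

    def __init__(self, estimated_rtt, dev_rtt=0.0, alpha=0.125, beta=0.25):
        self.estimated_rtt = estimated_rtt  # assumed initial estimate
        self.dev_rtt = dev_rtt              # assumed initial deviation
        self.alpha = alpha
        self.beta = beta

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Deviation first, against the previous estimate (assumed order).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(estimated_rtt=100.0)  # milliseconds
timeout = estimator.update(200.0)              # one slow sample
```

A single 200 ms sample moves the estimate only from 100 ms to 112.5 ms, but the deviation term raises the timeout to 212.5 ms, which is the "safety margin" at work.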

TCP flow control

receive side of TCP connection has a receive buffer

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

TCP flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow: guarantees receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
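The window arithmetic is simple enough to check directly in Python. The function names and the byte counts in the example are made up for illustration; the formulas are the ones on the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises in its segments."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent, last_byte_acked, window):
    """How much more the sender may put in flight without overflowing."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, window - in_flight)


# Receiver: 4096-byte buffer, 3000 bytes received, app has read 1000.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)

# Sender: 1000 bytes already in flight.
budget = sendable_bytes(last_byte_sent=5000, last_byte_acked=4000,
                        window=window)
```

If the application stops reading, `window` shrinks toward zero as data arrives and the sender's budget shrinks with it.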

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination


TCP connection management

Recall: TCP sender, receiver establish "connection" before exchanging data segments

initialize TCP variables: seq. #s; buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, because of the delayed duplicates problem a simple 2-way handshake does not suffice

Possible solutions:

A transport address can be used only once (impossible for client-server model)

Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's solution: three way handshake

Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data

Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP connection management (cont)

Connection establishment:

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client / server timeline of the three segments]
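The sequence and acknowledgement numbers exchanged in the three steps can be written out as a small Python sketch. Segments are modeled as plain dictionaries, and the function name, field names, and ISN values are illustrative; the arithmetic wraps modulo 2^32 because TCP sequence numbers are 32 bits wide.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit


def three_way_handshake(client_isn, server_isn):
    """Return the three segments of connection establishment."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,      # client's ISN + 1
    }
    ack = {
        "flags": {"ACK"},
        "seq": (client_isn + 1) % SEQ_SPACE,      # the SYN used one seq #
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,  # server's ISN + 1
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

Because each side's ACK echoes the other's ISN plus one, a delayed duplicate SYN from an old connection is acknowledged with a number the original client will not recognize as belonging to the current attempt.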

TCP connection management (cont)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client / server timeline with labels close, close, timed wait, closed]

TCP connection management (cont)

Step 3: client receives FIN, replies with ACK. Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs

[Figure: client / server timeline with labels closing, closing, timed wait, closed, closed]

TCP connection management (cont)

TCP client lifecycle

TCP server lifecycle

[Figures: state diagrams, not preserved in this transcript] [Tanenbaum 6.33]

Principles of congestion control

Congestion, informally: "too many sources sending too much data too fast for network to handle"

different from flow control!

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem!

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size over time, with ticks at 8, 16, and 24 Kbytes. Saw tooth behavior: probing for bandwidth]

TCP congestion control: details

sender limits transmission:

$$LastByteSent - LastByteAcked \le \min(CongWin, RcvWindow)$$

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly:

$$rate = \frac{CongWin}{RTT}\ \text{bytes/sec}$$

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms: AIMD, slow start, conservative after timeout events

TCP slow start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec; initial rate = 20 kbps

available bandwidth may be >> MSS/RTT: desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

TCP slow start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline, one round of segments per RTT]

Refinement: inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly

part of Fast Recovery

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is SS) to a threshold (ssthresh), then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout

Implementation: variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP congestion control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: timeout
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
TCP sender action: increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
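The table translates almost line by line into a small state machine. The Python sketch below follows the rows as given; the class and method names, the caller-supplied initial Threshold, and resetting the duplicate counter on a new ACK or timeout are assumptions for the example, since the table does not state them.

```python
class CongestionControl:
    """Sender-side window updates from the table above (sketch)."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.threshold = threshold  # initial value is an assumption
        self.cong_win = mss         # connection starts at 1 MSS
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Additive increase: about 1 MSS per RTT in total.
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_dup_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # CongWin will not drop below 1 MSS.
            self.cong_win = max(self.threshold, self.mss)
            self.state = "CA"

    def on_timeout(self):
        self.dup_acks = 0
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"


cc = CongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()  # slow start: 2000, 3000, 4000, 5000 bytes
```

After the fourth ACK the window exceeds the threshold and the sender switches to congestion avoidance, where each further ACK adds only MSS * MSS / CongWin bytes.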

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT

TCP futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$$throughput = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

-> $L = 2 \cdot 10^{-10}$. Wow!

New versions of TCP for high-speed needed!
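Both numbers on this slide follow from the stated formula and can be checked with a short Python sketch. The function names are illustrative; MSS is converted to bits so that throughput comes out in bits per second.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain a throughput: T * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Throughput in terms of loss rate: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * loss_rate ** 0.5)


def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Solve the throughput formula for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


# Slide example: 1500-byte segments, 100 ms RTT, 10 Gbps target.
w = window_segments(10e9, 0.1, 1500)
loss = loss_rate_for(10e9, 0.1, 1500)
```

Here `w` is about 83,333 segments and `loss` is about 2.1e-10, that is, roughly one lost segment in five billion.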

TCP fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: Connection 1 throughput plotted against the other connection's throughput, both axes from 0 to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
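The convergence argument can be illustrated with an idealized Python simulation of two AIMD flows. The assumptions are strong and are not from the slides: both flows have the same RTT, both see every loss, time advances in synchronized steps, and the starting rates and capacity are arbitrary. The function name is invented for the example.

```python
def aimd_two_flows(x1, x2, capacity, steps, increase=1.0):
    """Return the two rates after `steps` synchronized AIMD rounds."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss: both flows cut their rate in half.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add the same increment.
            x1, x2 = x1 + increase, x2 + increase
    return x1, x2


# Very unequal start: 10 vs 80 units on a link of capacity 100.
r1, r2 = aimd_two_flows(10.0, 80.0, capacity=100.0, steps=1000)
```

Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it, so the rates drift toward the equal share line.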

Fairness (more)

Fairness and UDP:

Multimedia apps often do not use TCP: do not want rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections:

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets R/2!

Chapter 3: summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet: UDP, TCP

After that:

leaving the network "edge" (application, transport layers)

into the network "core"

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets: each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client: non-persistent HTTP will have a different socket for each request
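The four-tuple lookup can be sketched in Python in the same spirit as a port-only table for UDP. The class and method names are invented for the example, the receive queues are plain lists, and connection setup is reduced to registering the tuple; the addresses and ports are the ones used in the slides' figures.

```python
class TcpDemux:
    """Toy model of connection-oriented demultiplexing (hypothetical names)."""

    def __init__(self):
        # (src IP, src port, dst IP, dst port) -> per-connection queue
        self.connections = {}

    def open(self, src_ip, src_port, dst_ip, dst_port):
        """Register a connection socket for one 4-tuple."""
        queue = []
        self.connections[(src_ip, src_port, dst_ip, dst_port)] = queue
        return queue

    def deliver(self, src_ip, src_port, dst_ip, dst_port, payload):
        """Direct a segment using all four values."""
        key = (src_ip, src_port, dst_ip, dst_port)
        queue = self.connections.get(key)
        if queue is None:
            return False  # no connection with that 4-tuple
        queue.append(payload)
        return True


server = TcpDemux()
from_a = server.open("A", 9157, "C", 80)
from_b1 = server.open("B", 9157, "C", 80)
from_b2 = server.open("B", 5775, "C", 80)
server.deliver("A", 9157, "C", 80, b"GET /a")
server.deliver("B", 9157, "C", 80, b"GET /b1")
server.deliver("B", 5775, "C", 80, b"GET /b2")
```

All three segments are addressed to port 80 on the same server, yet each ends up in its own socket because the source address or source port differs.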

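Connection-oriented demux (cont)

[Figure: clients IP A and IP B and server IP C. Three segments arrive at the server, all with DP 80 and D-IP C: one with S-IP A and SP 9157, one with S-IP B and SP 9157, and one with S-IP B and SP 5775. Each is directed to a different socket, served by processes labeled P1 to P6 across the hosts.]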

Connection-oriented demux: threaded Web server

[Figure: same clients, server, and three segments as in the previous figure (DP 80, D-IP C; sources A:9157, B:9157, B:5775), now handled by a threaded Web server.]

Page 2: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

2

Transport services and protocols

provide logical communicationbetween app processes running on different hosts

transport protocols run in end systems

send side breaks app messages into segments passes to network layer

rcv side reassembles segments into messages passes to app layer

more than one transport protocol available to apps

Internet TCP and UDP

applicationtransportnetworkdata linkphysical

applicationtransportnetworkdata linkphysical

Transport Layer 3

Internet transport-layer protocols

reliable in-order delivery (TCP) congestion control

flow control

connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP

services not available delay guarantees

bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

Transport Layer 4

3

Transport vs network layer

network layer logical communication between hosts

transport layer logical communication between processes relies on enhances

network layer services

Household analogy

12 kids sending letters to 12 kids

processes = kids

app messages = letters in envelopes

hosts = houses

transport protocol = Ann and Bill

network-layer protocol = postal service

Transport Layer 5

Multiplexingdemultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Transport Layer 6

4

How demultiplexing works host receives IP datagrams

each datagram has source IP address destination IP address

each datagram carries 1 transport-layer segment

each segment has source destination port number

host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Transport Layer 7

Connectionless demultiplexing

Create sockets with port numbers

UDP socket identified by two-tuple

(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment

directs UDP segment to socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428);

[Figure: clients at IP A and IP B, server at IP C. Segments sent to the server carry (SP 9157, DP 6428) and (SP 5775, DP 6428); the server's replies carry (SP 6428, DP 9157) and (SP 6428, DP 5775).]

SP provides "return address"

UDP: User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app output produces exactly one UDP segment

"best effort" service: UDP segments may be:

lost

delivered out of order to app

connectionless:

no handshaking between UDP sender, receiver

each UDP segment handled independently of others

Why is there a UDP?

no connection establishment (which can add delay)

simple: no connection state at sender, receiver

small segment header

no congestion control: UDP can blast away as fast as desired

UDP: more

often used for streaming multimedia apps: loss tolerant, rate sensitive

other UDP uses: DNS, SNMP

reliable transfer over UDP: add reliability at application layer (application-specific error recovery)

UDP segment format (32 bits wide): source port #, dest port #; length, checksum; application data (message)

length: length, in bytes, of UDP segment, including header and data; minimum value is 8 bytes

UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:

treat segment contents as sequence of 16-bit integers

checksum: 1's complement of the 1's complement sum of segment contents

sender puts checksum value into UDP checksum field

Receiver:

compute checksum of received segment

check if computed checksum equals checksum field value:

NO: error detected

YES: no error detected. But maybe errors nonetheless? More later ...

Internet Checksum Example

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result

Example: add two 16-bit integers

                 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound     1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum              1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

checksum         0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
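The wraparound rule can be checked with a short Python sketch. The function names are illustrative choices, not taken from the slides, and the inputs are assumed to be already split into 16-bit words. The two operands of the example are 0xE666 and 0xD555 in hexadecimal.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # wraparound of the carry
    return total


def internet_checksum(words):
    """1's complement of the 1's complement sum of the 16-bit words."""
    return ~ones_complement_sum16(words) & 0xFFFF


checksum = internet_checksum([0xE666, 0xD555])
print(format(checksum, "016b"))
```

A receiver that adds all words including the checksum field should obtain sixteen 1 bits when no error is detected.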

Sockets programming over UDP: use socket slides now

Transmission Control Protocol

Principles of reliable communication

TCP: basic notations, 3-way handshake

TCP: flow control, congestion control

Principles of Reliable data transfer

important in app., transport, link layers

top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)


Reliable Data Transfer: stream jargon

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process, e.g., keyboard or socket

An output stream is attached to an output source, e.g., monitor or socket

Reliable Communication: terminology of a state machine

[State diagram: state1 -> state2; each arrow is labelled "event / actions", i.e. the event causing the state transition and the actions taken on the state transition]

Note: the state machine is for a certain host

Reliable Communication

First model: sender sends, receiver receives

Is this enough?

When will it work?

When will it not work?

Sender sends one packet, then waits for receiver response: stop and wait

Reliable Communication: Stop and Wait, sender side

[State diagram: State 1 (wait for data) and state 2 (wait for ack). Transitions: data available / send data; received ack]

In State 1: sender can send data

In State 2: sender can receive acknowledge packets

Discussion:

• What does the receiver do?

• What happens if data is available at state 2?

Channel with bit errors and losses

underlying channel may flip bits in packet: checksum to detect bit errors

underlying channel can also lose packets

the question: how to recover from errors?

acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK

timeout: sender retransmits pkt if it doesn't receive ACK within timeout

new mechanisms in:

error detection

receiver feedback: control msg (ACK) rcvr->sender

sender control: timer to decide whether to send again

Reliable Communication: Stop and Wait with errors/losses, sender

[State diagram with states State 1 (wait for data), state 2a (wait for ack packet or timeout), and state 2b. Transitions: data available / send data; timeout / send data; ack received / process ack. In State 1, sender waits for data]

Discussion:

• What does the sender need to do for the retransmission?

This version has a fatal flaw

What happens if ACK corrupted/lost?

sender doesn't know what happened at receiver

can't just retransmit: possible duplicate

Handling duplicates:

sender retransmits current pkt if ACK garbled or didn't arrive

sender adds sequence number to each pkt

receiver discards (doesn't deliver up) duplicate pkt

receiver must specify seq # of pkt being ACKed

Discussion

Sender:

seq # added to pkt

two seq. #'s (0,1) will suffice. Why?

must check if received ACK corrupted

twice as many states: state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:

must check if received packet is duplicate: state indicates whether 0 or 1 is expected pkt seq #

receiver sends ACK for last pkt received OK; receiver must explicitly include seq # of pkt being ACKed

note: receiver cannot know if its last ACK was received OK at sender
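The sender and receiver rules above can be exercised with a small simulation. This is a sketch under stated assumptions: the function `stop_and_wait` and its `lost_data` / `lost_acks` parameters are hypothetical test hooks that name which transmissions (counted from 0) lose their data packet or their ACK, and a loss is modelled simply as an immediate timeout and retransmission.

```python
def stop_and_wait(messages, lost_data=(), lost_acks=()):
    """Simulate stop-and-wait with 1-bit sequence numbers (illustrative only)."""
    delivered = []      # what the receiver hands to its application
    expected = 0        # receiver state: sequence number it expects next
    sent = 0            # total data transmissions, including retransmissions

    for i, msg in enumerate(messages):
        seq = i % 2     # sender alternates sequence numbers 0, 1, 0, 1, ...
        while True:
            attempt = sent
            sent += 1
            if attempt in lost_data:
                continue            # packet lost: sender times out, retransmits
            if seq == expected:     # new packet: deliver and flip expectation
                delivered.append(msg)
                expected ^= 1
            # otherwise the packet is a duplicate: discard it, but still ACK it
            if attempt in lost_acks:
                continue            # ACK lost: sender times out, retransmits
            break                   # ACK for seq arrived: move to next message
    return delivered, sent


result = stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={3})
```

In this run the second transmission loses its data and the fourth loses its ACK, so five transmissions are needed, yet each message is delivered exactly once because the receiver discards the duplicate.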

Stop & wait in action

[Two figure slides; the timeline diagrams are not recoverable from the extracted text]

Performance of stop & wait

Stop & wait works, but performance stinks

ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet

$$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$$

U sender: utilization, the fraction of time sender busy sending

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

1KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link

network protocol limits use of physical resources

Stop-and-wait operation

[Figure: sender/receiver timeline. First packet bit transmitted, t = 0; last packet bit transmitted, t = L/R; first packet bit arrives; last packet bit arrives, send ACK; ACK arrives, send next packet, t = RTT + L/R]

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts

range of sequence numbers must be increased

buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization

[Figure: sender/receiver timeline with three packets in flight. First packet bit transmitted, t = 0; last bit transmitted, t = L/R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, send next packet, t = RTT + L/R]

$$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$

Increase utilization by a factor of 3
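The two utilization figures can be reproduced with a few lines of Python. The function name and its `window` parameter (number of packets sent back-to-back per round) are illustrative assumptions; the numbers are the ones used on the slides (8000-bit packet, 1 Gbps link, 30 ms RTT).

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy: window * (L/R) / (RTT + L/R)."""
    d_trans = packet_bits / rate_bps              # L/R in seconds
    return min(1.0, window * d_trans / (rtt_s + d_trans))


u_stop_and_wait = sender_utilization(8000, 1e9, 0.030)
u_pipelined = sender_utilization(8000, 1e9, 0.030, window=3)
print(round(u_stop_and_wait, 5), round(u_pipelined, 4))
```

With a window of three packets the result is three times the stop-and-wait value, matching the factor stated above.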

Pipelining Protocol

Go-back-N: big picture

Sender can have up to N unacked packets in pipeline

Rcvr only sends cumulative acks: doesn't ack packet if there's a gap

Sender has timer for oldest unacked packet: if timer expires, retransmit all unacked packets

Go-Back-N: Sender

k-bit seq # in pkt header

"window" of up to N consecutive unack'ed pkts allowed

ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n): retransmit pkt n and all higher seq # pkts in window

Go-Back-N: Receiver

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #

may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt:

discard (don't buffer) -> no receiver buffering

re-ACK pkt with highest in-order seq #
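The receiver rules can be sketched as a tiny Python class. The class name `GbnReceiver` and its method are hypothetical; sequence numbers are assumed to start at 0 and, for simplicity, are not wrapped to k bits. The only state kept is `expected_seqnum`, as the slide says.

```python
class GbnReceiver:
    """Go-Back-N receiver sketch: cumulative ACKs, no buffering (illustrative)."""

    def __init__(self):
        self.expected_seqnum = 0
        self.delivered = []

    def on_packet(self, seq, data):
        """Handle one uncorrupted packet; return the ACK number to send."""
        if seq == self.expected_seqnum:
            self.delivered.append(data)      # in order: deliver to the app
            self.expected_seqnum += 1
        # Out-of-order packets are discarded, not buffered.
        last_in_order = self.expected_seqnum - 1
        # None means nothing has arrived in order yet, so there is nothing to ACK.
        return last_in_order if last_in_order >= 0 else None


rcv = GbnReceiver()
acks = [rcv.on_packet(seq, data)
        for seq, data in [(0, "a"), (1, "b"), (3, "d"), (4, "e"), (2, "c")]]
```

Packets 3 and 4 arrive while 2 is missing, so they are dropped and ACK 1 is repeated; these are the duplicate ACKs the sender may see.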

GBN in action

Transmission Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management

TCP congestion control

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver

reliable, in-order byte stream: no "message boundaries"

pipelined: TCP congestion and flow control set window size

send & receive buffers

full duplex data: bi-directional data flow in same connection

MSS: maximum segment size

connection-oriented: handshaking (exchange of control msgs) inits sender, receiver state before data exchange

flow controlled: sender will not overwhelm receiver

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data through its socket door]

TCP segment structure (32 bits wide)

source port #, dest port #

sequence number

acknowledgement number

head len, not used, flags U A P R S F, Receive window

checksum, Urg data pointer

Options (variable length)

application data (variable length)

Field notes:

sequence and acknowledgement numbers: counting by bytes of data (not segments)

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now (generally not used)

RST, SYN, FIN: connection estab (setup, teardown commands)

Receive window: # bytes rcvr willing to accept

checksum: Internet checksum (as in UDP)

TCP reliable data transfer

TCP creates reliable service on top of IP's unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the byte-stream number of the first byte in the segment

TCP sender events

data rcvd from app:

create segment with seq #

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval: TimeOutInterval

timeout:

retransmit segment that caused timeout

restart timer

ACK rcvd:

if it acknowledges previously unacked segments, update what is known to be acked

start timer if there are outstanding segments

TCP seq. #'s and ACKs

Seq. #'s: byte stream "number" of first byte in segment's data

ACKs: seq # of next byte expected from other side; cumulative ACK

Q: how does receiver handle out-of-order segments?

A: TCP spec doesn't say; up to implementor

[Figure: Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host receives ACK]

TCP retransmission scenarios

[Figure: two Host A / Host B timelines. Lost ACK scenario: an ACK is lost (X) and the segment with Seq=92 is resent after the timeout; SendBase = 100. Premature timeout: the timer for the Seq=92 segment expires early; SendBase = 100, then SendBase = 120]

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline. Cumulative ACK scenario: one ACK is lost (X); SendBase = 120]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), to a total of 160 bytes for sending and receiving 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: wait up to 200 ms for next segment. If no next segment, send ACK

Should sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C' and echoes back 'C'; host ACKs receipt of echoed 'C']

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggyback, 2:120 -> only 1.6% of the bits sent are data

LANs usually not congested, so it might be okay

Small packets, termed tinygrams, over congested WAN: bad news

Nagle's alg:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 19.4]

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 200 ms for next segment; if no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

Event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long: long delay before resending lost packet

Detect lost segments via duplicate ACKs:

Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data, it supposes that segment after ACKed data was lost:

fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline with a lost segment (X) that is resent before the timeout expires]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
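A runnable Python rendering of the same event handler may help when experimenting. The class name, the `retransmitted` list, and the omission of the timer are simplifying assumptions of this sketch; the branch structure follows the pseudocode above.

```python
class FastRetransmitSender:
    """ACK handler sketch for fast retransmit (timer handling omitted)."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_counts = {}        # ACK value y -> duplicate ACKs seen for y
        self.retransmitted = []     # sequence numbers resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase, forget old duplicates.
            self.send_base = y
            self.dup_counts.clear()
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_counts[y] = self.dup_counts.get(y, 0) + 1
            if self.dup_counts[y] == 3:
                self.retransmitted.append(y)    # fast retransmit


sender = FastRetransmitSender(send_base=100)
for ack in (120, 120, 120, 120, 120):
    sender.on_ack(ack)
```

The first ACK of 120 advances SendBase; the third duplicate after it triggers exactly one retransmission of the segment starting at byte 120, and the fourth duplicate does not trigger another.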

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT, but RTT varies

too short: premature timeout, unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions

SampleRTT will vary; want estimated RTT "smoother": average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Plot: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and Estimated RTT]

TCP Round Trip Time and Timeout

Setting the timeout:

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first, estimate how much SampleRTT deviates from EstimatedRTT:

$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$$

(typically, $\beta = 0.25$)

Then set timeout interval:

$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
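The three update rules can be combined in one small Python function. The function name and the choice to update EstimatedRTT before DevRTT are assumptions of this sketch (the slides do not fix an order), and all times are taken to be in milliseconds.

```python
ALPHA = 0.125   # weight of the newest sample in EstimatedRTT
BETA = 0.25     # weight of the newest deviation in DevRTT


def update_rtt(estimated_rtt, dev_rtt, sample_rtt):
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one new SampleRTT."""
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


est, dev, timeout = update_rtt(estimated_rtt=100.0, dev_rtt=10.0, sample_rtt=140.0)
print(est, dev, timeout)
```

A single slow sample of 140 ms moves the estimate only to 105 ms, but it widens the safety margin noticeably, which is the intended behaviour when RTT varies.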

TCP Flow Control

receive side of TCP connection has a receive buffer

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow: guarantees receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
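The bookkeeping on both sides is simple enough to write down directly. The function names and the sample byte counts below are illustrative assumptions; the formulas are the ones on the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_allowance(last_byte_sent, last_byte_acked, advertised_window):
    """Bytes the sender may still transmit without exceeding RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised_window - unacked)


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
allowance = sender_allowance(last_byte_sent=5000, last_byte_acked=4000,
                             advertised_window=window)
print(window, allowance)
```

Here 2000 bytes are waiting to be read, so 2096 bytes of room are advertised, and a sender with 1000 unACKed bytes may send 1096 more.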


Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments

initialize TCP variables: seq. #s; buffers, flow control info (e.g., RcvWindow)

As seen in the previous slide, due to the delayed duplicates problem, a simple 2-way handshake does not suffice

Possible solutions:

A transport address can be used only once (impossible for client-server model)

Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's solution: three-way handshake

Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data

Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont)

Connection establishment:

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client/server timeline of the three segments]
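The numbers carried by the three segments can be listed with a short Python sketch. The function name and the dictionary layout are hypothetical; the only fact added beyond the slide is that TCP sequence numbers are 32-bit values, so ISN+1 wraps around modulo 2^32.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits wide


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYN+ACK, and ACK segments as small dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {"flags": "SYN+ACK", "seq": server_isn,
               "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"flags": "ACK", "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```

Each side acknowledges the other's ISN plus one, which is how a stale SYN from an earlier connection attempt can be recognised: its sequence number will not match what the peer currently expects.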

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

[Figure: client/server timeline: client close, server close, client timed wait, closed]

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK. Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed

Note: with small modification, can handle simultaneous FINs

[Figure: client/server timeline: closing, closing, client timed wait, closed, closed]

TCP Connection Management (cont)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams] [Tanenbaum 6.33]

Principles of Congestion Control

Congestion:

informally: "too many sources sending too much data too fast for network to handle"

different from flow control

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size over time (y-axis ticks at 8, 16, 24 Kbytes). Sawtooth behavior: probing for bandwidth]

TCP Congestion Control: details

sender limits transmission:

$$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly:

$$\text{rate} = \frac{\text{CongWin}}{\text{RTT}}\ \text{bytes/sec}$$

CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms: AIMD; slow start; conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline, one RTT per round]

Refinement: inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is SS) to a threshold (ssthresh), then grows linearly

Philosophy:

3 dup ACKs indicates network capable of delivering some segments

timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout

Implementation:

Variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly

When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold

When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
TCP sender action: increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
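The table translates almost line by line into a small state machine. The sketch below is a simplified model, not a TCP implementation: the class and method names are hypothetical, window sizes are in bytes, and the initial Threshold is a constructor argument chosen by the caller.

```python
class TcpCongestionControl:
    """State machine sketch of the sender actions in the table (illustrative)."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss          # connection starts with CongWin = 1 MSS
        self.threshold = threshold
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # Multiplicative decrease, never below 1 MSS.
            self.cong_win = max(self.threshold, self.mss)
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


cc = TcpCongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()
print(cc.state, cc.cong_win)
```

After four new ACKs the window has grown from 1000 to 5000 bytes and crossed the threshold, so the sender has moved from slow start to congestion avoidance.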

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start

Let W be the window size when loss occurs

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate:

$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

-> $L = 2 \cdot 10^{-10}$. Wow

New versions of TCP for high-speed
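Both numbers in the example follow from the formulas by plain arithmetic, as the Python sketch below shows. The function names are illustrative; the loss rate is obtained by solving the throughput formula for L.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target throughput: W = T * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(int(w), loss)
```

The loss rate comes out at roughly 2.1e-10, i.e. about one lost segment in five billion, which is why the slide points to new TCP versions for high-speed links.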

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: Connection 1 throughput (up to R) plotted against the other connection's throughput (up to R), with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

Fairness (more)

Fairness and UDP:

Multimedia apps often do not use TCP: do not want rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections:

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections:

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets R/2

Chapter 3: Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet: UDP, TCP

After that:

leaving the network "edge" (application, transport layers)

into the network "core"

Next: Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets: each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client: non-persistent HTTP will have different socket for each request
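In contrast to the UDP case, a TCP lookup uses all four values. The Python sketch below is a toy model with hypothetical names (`TcpDemux`, the socket labels, and the host labels "A", "B", "C"); it shows that two clients using the same source port still reach different sockets on the server.

```python
class TcpDemux:
    """Toy model of connection-oriented demultiplexing (hypothetical, not a real API)."""

    def __init__(self):
        # (source IP, source port, dest IP, dest port) -> socket label
        self.connections = {}

    def add_connection(self, src_ip, src_port, dst_ip, dst_port, socket_label):
        self.connections[(src_ip, src_port, dst_ip, dst_port)] = socket_label

    def lookup(self, src_ip, src_port, dst_ip, dst_port):
        # All four values are needed to pick the socket.
        return self.connections.get((src_ip, src_port, dst_ip, dst_port))


tcp = TcpDemux()
tcp.add_connection("A", 9157, "C", 80, "socket-1")
tcp.add_connection("B", 9157, "C", 80, "socket-2")
tcp.add_connection("B", 5775, "C", 80, "socket-3")
print(tcp.lookup("B", 9157, "C", 80))
```

All three segments are addressed to port 80 on the same server, yet each maps to its own socket because the source IP address or source port differs.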

Connection-oriented demux (cont)

[Figure: clients at IP A and IP B, server at IP C. Segments to the server: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80), (S-IP B, SP 5775, D-IP C, DP 80)]

Connection-oriented demux: Threaded Web Server

[Figure: the same three segments arriving at a threaded web server at IP C]

Page 3: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

3

Transport vs network layer

network layer logical communication between hosts

transport layer logical communication between processes relies on enhances

network layer services

Household analogy

12 kids sending letters to 12 kids

processes = kids

app messages = letters in envelopes

hosts = houses

transport protocol = Ann and Bill

network-layer protocol = postal service

Transport Layer 5

Multiplexingdemultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Transport Layer 6

4

How demultiplexing works host receives IP datagrams

each datagram has source IP address destination IP address

each datagram carries 1 transport-layer segment

each segment has source destination port number

host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Transport Layer 7

Connectionless demultiplexing

Create sockets with port numbers

UDP socket identified by two-tuple

(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment

directs UDP segment to socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Transport Layer 8

5

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

clientIP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be

lost

delivered out of order to app

connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header

no congestion control UDP can blast away as fast as desired

Transport Layer 10

6

UDP more

often used for streaming multimedia apps

loss tolerant

rate sensitive

other UDP uses DNS

SNMP

reliable transfer over UDP add reliability at application layer

application-specific error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1‟s complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment

check if computed checksum equals checksum field value

NO - error detected

YES - no error detected But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

7

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

8

Transmission Control Protocol

Principles of reliable communication

TCP basic notations 3 way handshake

TCP flow control congestion control

Principles of Reliable data transfer

important in app transport link layers

top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

9

Principles of Reliable data transfer

important in app transport link layers

top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer

important in app transport link layers

top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

10

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Note The state machine is for a certain host

11

Reliable Communication

First Model sender sends receiver receives

Is this enough

When will it work

When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2Sender can receive acknowledge packetsDiscussion

bull What does the receiver dobull What happens if data is available at state 2

Wait

for dataWait

for ack

12

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesn‟t receive ack within timeout

new mechanisms in error detection

receiver feedback control msg (ACK) rcvr-gtsender

sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send Data

Data available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1Sender waits

for data

Waitfor data

Waitfor ack packet or time out

state2b

timeout

Send

Dat

a

Discussion

bull What does the sender need to do for the retransmission

Process ack

13

This version has a fatal flaw!

What happens if the ACK is corrupted or lost?

• sender doesn't know what happened at receiver
• can't just retransmit: possible duplicate

Handling duplicates:

• sender retransmits current pkt if ACK garbled or didn't arrive
• sender adds sequence number to each pkt
• receiver discards (doesn't deliver up) duplicate pkt
• receiver must specify seq # of pkt being ACKed

Discussion

Sender:

• seq # added to pkt
• two seq #'s (0, 1) will suffice. Why?
• must check if received ACK is corrupted
• twice as many states: state must "remember" whether "current" pkt has seq # 0 or 1

Receiver:

• must check if received packet is a duplicate: state indicates whether 0 or 1 is the expected pkt seq #
• receiver sends ACK for last pkt received OK; receiver must explicitly include seq # of pkt being ACKed
• note: receiver cannot know if its last ACK was received OK at the sender
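The sender and receiver rules above (retransmit on timeout, a 1-bit sequence number, duplicates discarded but still ACKed) can be tried out in a small simulation. This is only a sketch under stated assumptions: the function name `stop_and_wait`, the loss probability, and the random seed are illustrative, a lost packet or lost ACK stands in for a timeout, and corruption is not modelled.

```python
import random


def stop_and_wait(messages, loss_prob=0.3, seed=1):
    """Simulate stop-and-wait with a 1-bit sequence number over a lossy channel.

    Returns (delivered, transmissions): the data passed up by the receiver
    and the number of data packets the sender put on the channel.
    """
    rng = random.Random(seed)   # assumption: fixed seed for repeatability
    delivered = []              # what the receiver delivers to the application
    expected = 0                # receiver state: seq # it expects next
    transmissions = 0

    for i, data in enumerate(messages):
        seq = i % 2             # sender state: seq # of the current packet
        while True:
            transmissions += 1
            if rng.random() < loss_prob:
                continue        # data packet lost: timeout, retransmit
            if seq == expected:
                delivered.append(data)      # new packet: deliver it
                expected = 1 - expected
            # else: duplicate, discard it but still ACK it
            if rng.random() < loss_prob:
                continue        # ACK lost: timeout, retransmit (a duplicate)
            break               # ACK for the current seq # arrived
    return delivered, transmissions


delivered, sent = stop_and_wait(["a", "b", "c", "d"], loss_prob=0.4, seed=7)
print(delivered, sent)
```

Whatever the loss pattern, the receiver delivers each message exactly once and in order; only the number of transmissions changes.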

Stop & wait in action

[Two figure-only slides; the diagrams are not recoverable from the extraction.]

Performance of stop & wait

Stop & wait works, but performance stinks

Example: 1 Gbps link, 15 ms propagation delay, 8000-bit packet

$$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$$

U_sender: utilization, the fraction of time the sender is busy sending

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

(times in the fraction are in milliseconds)

• 1 KB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link
• network protocol limits use of physical resources!

Stop-and-wait operation

[Figure: sender / receiver timeline]

• t = 0: first packet bit transmitted
• t = L/R: last packet bit transmitted
• first packet bit arrives
• last packet bit arrives, send ACK
• t = RTT + L/R: ACK arrives, send next packet

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

Pipelined protocols

Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged pkts

• range of sequence numbers must be increased
• buffering at sender and/or receiver

Two generic forms of pipelined protocols: Go-Back-N, selective repeat

Pipelining: increased utilization

[Figure: sender / receiver timeline]

• t = 0: first packet bit transmitted
• t = L/R: last bit transmitted
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• t = RTT + L/R: ACK arrives, send next packet

$$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$

Increase utilization by a factor of 3!
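The utilization figures for stop-and-wait and for a 3-packet pipeline can be reproduced numerically. The sketch below assumes the example's values (1 Gbps link, 8000-bit packets, RTT = 2 x 15 ms = 30 ms); the function name `sender_utilization` and its `window` parameter are illustrative, not part of the slides.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy sending.

    window = 1 models stop-and-wait; window = n models n packets
    sent back-to-back before waiting for the first ACK.
    """
    d_trans = packet_bits / rate_bps            # L / R, in seconds
    return min(1.0, window * d_trans / (rtt_s + d_trans))


u_stop_and_wait = sender_utilization(8000, 1e9, 0.030)
u_pipelined = sender_utilization(8000, 1e9, 0.030, window=3)
print(f"stop-and-wait: {u_stop_and_wait:.5f}")
print(f"3 in flight:   {u_pipelined:.5f}")
```

The `min(1.0, ...)` cap reflects that utilization cannot exceed 100% once the window is large enough to fill the pipe.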

17

Pipelining Protocol

Transport Layer 33

Go-Back-N: big picture

• Sender can have up to N unACKed packets in pipeline
• Rcvr only sends cumulative ACKs
    • doesn't ACK a packet if there's a gap
• Sender has timer for oldest unACKed packet
    • if timer expires, retransmit all unACKed packets

Go-Back-N: Sender

• k-bit seq # in pkt header
• "window" of up to N consecutive unACKed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
    • may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window

Go-Back-N: Receiver

• ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    • may generate duplicate ACKs
    • need only remember expectedseqnum
• out-of-order pkt:
    • discard (don't buffer) -> no receiver buffering!
    • re-ACK pkt with highest in-order seq #
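The Go-Back-N receiver needs only one piece of state, expectedseqnum. A minimal sketch of that behaviour follows; the class name `GbnReceiver` is illustrative, sequence numbers are assumed to start at 0, and wraparound of the k-bit sequence space is ignored for simplicity.

```python
class GbnReceiver:
    """Go-Back-N receiver: cumulative ACKs, no buffering of out-of-order pkts."""

    def __init__(self):
        self.expected_seqnum = 0    # the only state the receiver keeps
        self.delivered = []         # data passed up to the application

    def receive(self, seq, data):
        """Handle one uncorrupted packet; return the ACK number sent back.

        Returns None if no in-order packet has been received yet.
        """
        if seq == self.expected_seqnum:
            self.delivered.append(data)     # in order: deliver
            self.expected_seqnum += 1
        # out of order: discard, and re-ACK the highest in-order seq #
        highest_in_order = self.expected_seqnum - 1
        return highest_in_order if highest_in_order >= 0 else None


rcvr = GbnReceiver()
arrivals = [(0, "a"), (2, "c"), (3, "d"), (1, "b"), (2, "c")]
print([rcvr.receive(seq, data) for seq, data in arrivals])
```

In this run packets 2 and 3 arrive before packet 1, so they are discarded and trigger duplicate ACKs for packet 0; the sender must resend them after packet 1.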

GBN in action

[Figure-only slide; the diagram is not recoverable from the extraction.]

Transmission Control Protocol

• Enhanced GBN protocol
• Segment structure
• Reliable data transfer and data transfer issues
• Flow control
• Connection management
• TCP congestion control

TCP: Overview [RFCs 793, 1122, 1323, 2018, 2581]

• point-to-point: one sender, one receiver
• reliable, in-order byte stream: no "message boundaries"
• pipelined: TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
    • bi-directional data flow in same connection
    • MSS: maximum segment size
• connection-oriented: handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
• flow controlled: sender will not overwhelm receiver

[Figure: the application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer, from which the receiving application reads data through its socket door.]

TCP segment structure

Header rows, each 32 bits wide:

• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flag bits U A P R S F, Receive window
• checksum, Urg data pointer
• Options (variable length)
• application data (variable length)

Field notes:

• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)

TCP reliable data transfer

TCP creates reliable service on top of IP‟s unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the byte-stream number of the first byte in the segment

Transport Layer 40

21

TCP sender events

data rcvd from app:

• create segment with seq #
• start timer if not already running (think of timer as for oldest unACKed segment)
• expiration interval: TimeOutInterval

timeout:

• retransmit segment that caused timeout
• restart timer

ACK rcvd:

• if it acknowledges previously unACKed segments:
    • update what is known to be ACKed
    • start timer if there are outstanding segments

TCP seq. #'s and ACKs

Seq. #'s:

• byte stream "number" of first byte in segment's data

ACKs:

• seq # of next byte expected from other side
• cumulative ACK

Q: how does the receiver handle out-of-order segments?

A: TCP spec doesn't say; up to implementor

[Figure: Host A / Host B timeline. Labels: user types 'C'; host receives ACK; host ACKs receipt of 'C'.]

TCP retransmission scenarios

[Figures: Host A / Host B timelines for the lost ACK scenario and the premature timeout scenario. Recoverable labels: Seq=92, timeout, loss (X), SendBase = 100, SendBase = 120.]

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline for the cumulative ACK scenario. Recoverable labels: loss (X), timeout, SendBase = 120.]

Interactive data flow

• Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), for a total of 160 bytes for sending and receiving 'C'
• If the receiver waits a while, it can piggyback the ACK on the data packet
• Delayed ACK: wait up to 200 ms for next segment. If no next segment, send ACK
• Should the sender use delayed ACKs too?

[Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host echoes back 'C'; host ACKs receipt of echoed 'C'.]

Nagle Algorithm [RFC 896]

• Quantifying overhead: how many control bytes per data byte? With piggybacking, 2/120 -> only about 1.6% of the bits sent are data
• LANs are usually not congested, so it might be okay
• Small packets, termed tinygrams, over a congested WAN: bad news

Nagle's algorithm:

• New data can't be sent until outstanding data is ACKed
• Small amounts of data are collected and sent in a single segment when the ACK arrives
• Self-clocking: the faster the ACK comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 19.4]

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action

• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 200 ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

• Time-out period is often relatively long:
    • long delay before resending lost packet
• Detect lost segments via duplicate ACKs
    • Sender often sends many segments back-to-back
    • If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
    • fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline with a lost segment (X) and Host A's timeout interval.]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    /* fast retransmit */
    }
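The ACK-handling rule of fast retransmit can be written as a small runnable handler. This is a sketch, not a TCP implementation: the class name `FastRetransmitSender`, the `next_seq_num` field (first byte not yet sent), and the list that records retransmissions are assumptions made for illustration, and the timer is reduced to a boolean.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs, as in the fast retransmit rule."""

    def __init__(self, send_base, next_seq_num):
        self.send_base = send_base          # oldest unACKed byte
        self.next_seq_num = next_seq_num    # assumption: first byte not yet sent
        self.dup_acks = 0                   # dup ACKs seen for send_base
        self.timer_running = send_base < next_seq_num
        self.retransmitted = []             # seq #s resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: slide SendBase forward
            self.send_base = y
            self.dup_acks = 0
            # (Re)start the timer only if segments are still outstanding
            self.timer_running = self.send_base < self.next_seq_num
        else:
            # A duplicate ACK for an already ACKed segment
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)    # fast retransmit


sender = FastRetransmitSender(send_base=92, next_seq_num=140)
for ack in (100, 100, 100, 100):    # one new ACK, then three duplicates
    sender.on_ack(ack)
print(sender.send_base, sender.retransmitted)
```

Only the third duplicate triggers a resend; further duplicates for the same y are counted but do not retransmit again.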

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

• longer than RTT, but RTT varies
• too short: premature timeout
    • unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?

• SampleRTT: measured time from segment transmission until ACK receipt
    • ignore retransmissions
• SampleRTT will vary; want estimated RTT "smoother"
    • average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$$

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT in milliseconds (axis from 100 to 350) plotted against time in seconds (axis from 1 to 106).]

TCP Round Trip Time and Timeout

Setting the timeout

• EstimatedRTT plus "safety margin"
    • large variation in EstimatedRTT -> larger safety margin
• first, estimate how much SampleRTT deviates from EstimatedRTT:

$$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$$

(typically, $\beta = 0.25$)

• then set the timeout interval:

$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
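The EstimatedRTT, DevRTT, and TimeoutInterval formulas can be combined into one small estimator. The sketch below makes two assumptions that the slides do not state: DevRTT starts at 0, and DevRTT is updated before EstimatedRTT (so the deviation is measured against the previous estimate). The class name `RttEstimator` is illustrative.

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0      # assumption: no deviation known yet

    def update(self, sample_rtt):
        """Fold in one SampleRTT (not from a retransmitted segment)."""
        # Assumption: deviation is measured against the previous estimate
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)      # milliseconds
for sample in (120.0, 95.0, 110.0):
    est.update(sample)
    print(f"EstimatedRTT={est.estimated_rtt:.2f}  "
          f"DevRTT={est.dev_rtt:.2f}  Timeout={est.timeout_interval:.2f}")
```

A larger spread in the samples raises DevRTT and therefore widens the safety margin, while a steady RTT lets the timeout settle close to EstimatedRTT.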

28

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer:

$$\text{RcvWindow} = \text{RcvBuffer} - [\text{LastByteRcvd} - \text{LastByteRead}]$$

• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
    • guarantees receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
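The two halves of flow control, the receiver computing RcvWindow and the sender keeping unACKed data within it, fit in a few lines. The function names and the byte counts in the usage lines are illustrative assumptions.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room in the receive buffer, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(last_byte_sent, last_byte_acked, n_bytes, window):
    """Sender side: would sending n_bytes keep unACKed data within window?"""
    unacked = last_byte_sent - last_byte_acked
    return unacked + n_bytes <= window


# Receiver: 4096-byte buffer, 3000 bytes received, application has read 1000
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)

# Sender: 1000 bytes already in flight
print(may_send(5000, 4000, 1000, window))
print(may_send(5000, 4000, 1200, window))
```

As the application reads more data, LastByteRead advances and the advertised window grows again.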

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments

• initialize TCP variables:
    • seq #s
    • buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, due to the delayed duplicates problem a simple 2-way handshake does not suffice

Possible solutions:

• A transport address can be used only once (impossible for client-server model)
• Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's solution: three-way handshake

Step 1: client host sends TCP SYN segment to server

• specifies initial seq #
• no data

Step 2: server host receives SYN, replies with SYNACK segment

• server allocates buffers
• specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

30

TCP Connection Management (cont)

Connection Establishment

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (ACK number = client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (ACK number = server's ISN+1)
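The sequence and acknowledgement numbers exchanged in the three steps can be traced with a short sketch. The dictionary layout and the function name `three_way_handshake` are illustrative assumptions; real segments carry these values in the TCP header fields.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as plain dictionaries."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,              # server's own ISN
        "ack": syn["seq"] + 1,          # client's ISN + 1
    }
    ack = {
        "flags": {"ACK"},
        "seq": client_isn + 1,
        "ack": syn_ack["seq"] + 1,      # server's ISN + 1
    }
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(sorted(segment["flags"]), segment.get("seq"), segment.get("ack"))
```

Each side acknowledges the other's ISN plus one, which shows that the SYN itself consumes one sequence number.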

client server

Transport Layer 59

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client / server timeline showing close (client), close (server), timed wait, and closed.]

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK.

• Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client / server timeline showing closing (client), closing (server), timed wait, and closed.]

TCP Connection Management (cont)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams.]

[Tanenbaum 6.33]

Principles of Congestion Control

Congestion:

• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
    • lost packets (buffer overflow at routers)
    • long delays (queueing in router buffers)
• a top-10 problem!

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

• additive increase: increase CongWin by 1 MSS every RTT until loss detected
• multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size versus time, with window axis marks at 8, 16, and 24 Kbytes. Sawtooth behavior: probing for bandwidth.]

TCP Congestion Control: details

• sender limits transmission:

$$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$

• Let's assume that RcvWindow is not a constraint (for simplicity)
• Roughly,

$$\text{rate} = \frac{\text{CongWin}}{RTT}\ \text{bytes/sec}$$

• CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event

Three mechanisms:

• AIMD
• slow start
• conservative after timeout events

TCP Slow Start

• When connection begins, CongWin = 1 MSS
    • Example: MSS = 500 bytes & RTT = 200 msec
    • initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
    • desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
    • double CongWin every RTT
    • done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline, one RTT per round.]

Refinement: inferring loss

• After 3 dup ACKs:
    • CongWin is cut in half
    • window then grows linearly
    • part of Fast Recovery
• But after timeout event:
    • CongWin instead set to 1 MSS
    • window then grows exponentially (this is slow start) to a threshold (ssthresh), then grows linearly

Philosophy:

• 3 dup ACKs indicate the network is capable of delivering some segments
• timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout.

Implementation:

• Variable Threshold (ssthresh)
• At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
TCP sender action: increment duplicate ACK count for segment being ACKed
Commentary: CongWin and Threshold not changed
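The state/event/action table maps directly onto a small state machine. The sketch below measures CongWin in units of MSS; the class name `CongestionControl`, the method names, and the initial Threshold value are assumptions chosen for illustration, not values given in the slides.

```python
MSS = 1.0   # CongWin and Threshold are expressed in units of MSS


class CongestionControl:
    """Sender-side CongWin updates for slow start and congestion avoidance."""

    def __init__(self, threshold=8.0):      # assumption: initial Threshold
        self.cong_win = MSS
        self.threshold = threshold
        self.state = "SS"                   # "SS" or "CA"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS            # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)    # +1 MSS per RTT

    def on_duplicate_ack(self):
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)    # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


cc = CongestionControl(threshold=4.0)
for _ in range(5):
    cc.on_new_ack()
print(cc.state, cc.cong_win)
```

Feeding the object a sequence of new ACKs, duplicate ACKs, and timeouts reproduces the sawtooth pattern described for AIMD.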

Transport Layer 71

TCP throughput

• What's the average throughput of TCP as a function of window size and RTT?
    • Ignore slow start
• Let W be the window size when loss occurs
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average throughput: 0.75 W/RTT

TCP Futures: TCP over "long, fat pipes"

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

• -> L = 2 · 10^-10. Wow!
• New versions of TCP for high-speed
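The window size and loss rate in this example can be checked by plugging the numbers into the throughput relation and solving for L. The function names below are illustrative; the inputs are the example's 1500-byte segments, 100 ms RTT, and 10 Gbps target.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(f"W = {w:.0f} segments, L = {loss:.2e}")
```

The result is on the order of 2 · 10^-10, roughly one loss per five billion segments, which is why the example motivates new TCP versions for high-speed paths.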

Transport Layer 73

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

Why is TCP fair?

Two competing sessions:

• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 1 throughput (axis from 0 to R) plotted against the second connection's throughput (axis from 0 to R), with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

Fairness (more)

Fairness and UDP

• Multimedia apps often do not use TCP
    • do not want rate throttled by congestion control
• Instead use UDP:
    • pump audio/video at constant rate, tolerate packet loss
• Research area: TCP friendly

Fairness and parallel TCP connections

• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
    • new app asks for 1 TCP, gets rate R/10
    • new app asks for 11 TCPs, gets R/2!

Chapter 3 Summary

• principles behind transport layer services:
    • multiplexing, demultiplexing
    • reliable data transfer
    • flow control
    • congestion control
• instantiation and implementation in the Internet
    • UDP
    • TCP

After that:

• leaving the network "edge" (application, transport layers)
• into the network "core"

Next: Socket programming over TCP

Connection-oriented demux

• TCP socket identified by 4-tuple:
    • source IP address
    • source port number
    • dest IP address
    • dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
    • each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
    • non-persistent HTTP will have a different socket for each request
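Demultiplexing on the full 4-tuple amounts to a table lookup. The sketch below uses the letters A, B, and C as stand-in IP addresses and made-up socket labels; the function names and the dictionary-based table are illustrative assumptions, not how an operating system stores its sockets.

```python
# Table of established TCP connections, keyed by the 4-tuple
# (source IP, source port, dest IP, dest port) seen by the receiving host.
sockets = {}


def register(src_ip, src_port, dst_ip, dst_port, socket_name):
    sockets[(src_ip, src_port, dst_ip, dst_port)] = socket_name


def demux(segment):
    """Return the socket for an arriving segment, or None if there is none."""
    key = (segment["src_ip"], segment["src_port"],
           segment["dst_ip"], segment["dst_port"])
    return sockets.get(key)


# Three clients' connections to the web server at IP "C", port 80
register("A", 9157, "C", 80, "socket-1")
register("B", 9157, "C", 80, "socket-2")
register("B", 5775, "C", 80, "socket-3")

arriving = {"src_ip": "B", "src_port": 9157, "dst_ip": "C", "dst_port": 80}
print(demux(arriving))
```

Segments from A and B that share source port 9157 and destination port 80 still reach different sockets, because the source IP address is part of the key.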

Transport Layer 78

40

Connection-oriented demux (cont)

[Figure: clients with IP A and IP B and a server with IP C; processes labeled P1 to P6. Segments shown: SP 9157, DP 80, S-IP A, D-IP C; SP 9157, DP 80, S-IP B, D-IP C; SP 5775, DP 80, S-IP B, D-IP C.]

Connection-oriented demux: Threaded Web Server

[Figure: clients with IP A and IP B and a server with IP C; processes labeled P1 to P4. Segments shown: SP 9157, DP 80, S-IP A, D-IP C; SP 9157, DP 80, S-IP B, D-IP C; SP 5775, DP 80, S-IP B, D-IP C.]


How demultiplexing works

• host receives IP datagrams
    • each datagram has source IP address, destination IP address
    • each datagram carries 1 transport-layer segment
    • each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket

[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #; other header fields; application data (message).]

Connectionless demultiplexing

• Create sockets with port numbers
• UDP socket identified by two-tuple: (dest IP address, dest port number)
• When host receives UDP segment:
    • checks destination port number in segment
    • directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428);

[Figure: client IP A, client IP B, and server IP C, with processes labeled P1, P2, P3. Segments to the server: SP 9157, DP 6428 and SP 5775, DP 6428. Replies from the server: SP 6428, DP 9157 and SP 6428, DP 5775.]

SP provides "return address"

UDP: User Datagram Protocol [RFC 768]

• Simplest Internet transport protocol
• Each app output produces exactly one UDP segment
• "best effort" service; UDP segments may be:
    • lost
    • delivered out of order to app
• connectionless:
    • no handshaking between UDP sender, receiver
    • each UDP segment handled independently of others

Why is there a UDP?

• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired

UDP: more

• often used for streaming multimedia apps
    • loss tolerant
    • rate sensitive
• other UDP uses:
    • DNS
    • SNMP
• reliable transfer over UDP: add reliability at application layer
    • application-specific error recovery!

[Figure: UDP segment format, 32 bits wide: source port #, dest port #; length, checksum; application data (message).]

Length: in bytes, of the UDP segment, including header and data. Minimum value is 8 bytes.

UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:

• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field

Receiver:

• compute checksum of received segment
• check if computed checksum equals checksum field value:
    • NO: error detected
    • YES: no error detected. But maybe errors nonetheless? More later ...

Internet Checksum Example

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result

Example: add two 16-bit integers

                    1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                    1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

    wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

    sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
    checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
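The wraparound rule is easy to check in code using the two 16-bit words of this example, 1110011001100110 and 1101010101010101. This is a minimal sketch over a list of 16-bit integers; the function names are illustrative, and a real UDP checksum also covers a pseudo-header, which is left out here.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)    # wraparound carry
    return total


def internet_checksum(words):
    """Checksum = bitwise complement of the 1's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


words = [0b1110011001100110, 0b1101010101010101]
print(f"sum      = {ones_complement_sum16(words):016b}")
print(f"checksum = {internet_checksum(words):016b}")

# Receiver-side check: adding the checksum to the data gives all 1 bits
print(ones_complement_sum16(words + [internet_checksum(words)]) == 0xFFFF)
```

A single flipped bit in any word changes the sum, so the receiver's all-ones check fails; errors that cancel each other out can still go undetected.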

Transport Layer 13

Sockets Programming over UDP - use socket slides now

Transmission Control Protocol

• Principles of reliable communication
• TCP basic notations, 3-way handshake
• TCP flow control, congestion control

Principles of Reliable data transfer

• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)


Reliable Data Transfer: stream jargon

• A stream is a sequence of characters that flow into or out of a process
• An input stream is attached to some input source for the process, e.g., keyboard or socket
• An output stream is attached to an output source, e.g., monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Note The state machine is for a certain host

11

Reliable Communication

First Model sender sends receiver receives

Is this enough

When will it work

When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2Sender can receive acknowledge packetsDiscussion

bull What does the receiver dobull What happens if data is available at state 2

Wait

for dataWait

for ack

12

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesn‟t receive ack within timeout

new mechanisms in error detection

receiver feedback control msg (ACK) rcvr-gtsender

sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send Data

Data available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1Sender waits

for data

Waitfor data

Waitfor ack packet or time out

state2b

timeout

Send

Dat

a

Discussion

bull What does the sender need to do for the retransmission

Process ack

13

This version has a fatal flaw

What happens if ACK corruptedlost

sender doesn‟t know what happened at receiver

can‟t just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACK garbled or didn‟t arrive

sender adds sequence number to each pkt

receiver discards (doesn‟t deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender

seq added to pkt

two seq ‟s (01) will suffice Why

must check if received ACK corrupted

twice as many states state must ldquorememberrdquo

whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver

must check if received packet is duplicate state indicates whether 0 or

1 is expected pkt seq

receiver sends ACK for last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

14

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

15

Performance of stop amp wait

Stop amp wait works but performance stinks

ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

Transport Layer 30

16

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased

buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microsecon

ds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

17

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesn‟t ack packet if there‟s a gap

Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header

ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n) retransmit pkt n and all higher seq pkts in window

18

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt discard (don‟t buffer) -gt no receiver buffering

Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

19

Transport Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection

MSS maximum segment size

connection-oriented handshaking (exchange

of control msgs) init‟s sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundariesrdquo

pipelined TCP congestion and flow

control set window size

send amp receive buffers

socket

door

TCP

send buffer

TCP

receive buffer

socket

door

segment

application

writes dataapplication

reads data

Transport Layer 38

20

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP‟s unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

21

TCP sender eventsdata rcvd from app

Create segment with seq

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer

Ack rcvd

If acknowledges previously unacked segments update what is known to

be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq ‟s and ACKsSeq ‟s

byte stream ldquonumberrdquo of first byte in segment‟s data

ACKs

seq of next byte expected from other side

cumulative ACK

Q how receiver handles out-of-order segments

A TCP spec doesn‟t say - up to implementor

Host A Host B

Usertypes

bdquoC‟

host receivesACK

host ACKsreceipt of

bdquoC‟

time

Transport Layer 42

22

TCP retransmission scenarios

[Figure: two Host A / Host B timelines, titled "lost ACK scenario" and "premature timeout". Recoverable labels: Seq=92, timeout, loss (X), SendBase = 100, SendBase = 120.]

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline titled "Cumulative ACK scenario". Recoverable labels: loss (X), timeout, SendBase = 120.]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), to a total of 160 bytes for sending and receiving 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: wait up to 200 ms for next segment. If no next segment, send ACK

Should sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C', echoes back 'C'; host ACKs receipt of echoed 'C'.]

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggybacking, 2:120 -> only 1.6% of the bits sent are data

LANs are usually not congested, so it might be okay

Small packets, termed tinygrams, over a congested WAN: bad news

Nagle's alg.:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when the ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 19.4]
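The overhead figures quoted for the one-character telnet exchange can be checked with a few lines of arithmetic. This is a sketch under the slide's assumption of 40 header bytes per packet (20 TCP + 20 IP, no options); the helper names are illustrative.

```python
HEADER_BYTES = 40  # 20-byte TCP header + 20-byte IP header, no options


def header_overhead(payloads):
    """Total header bytes for a list of packets (one payload size per packet)."""
    return HEADER_BYTES * len(payloads)


def data_fraction(payloads):
    """Fraction of all bytes on the wire that are application data."""
    data = sum(payloads)
    return data / (data + header_overhead(payloads))


# Without piggybacking: 'C', its ACK, the echoed 'C', and the ACK of the echo.
no_piggyback = [1, 0, 1, 0]
# With piggybacking: 'C', the echo carrying the ACK, and the final ACK.
piggyback = [1, 1, 0]

overhead_plain = header_overhead(no_piggyback)   # 160 bytes
overhead_piggy = header_overhead(piggyback)      # 120 bytes
fraction_piggy = data_fraction(piggyback)        # about 0.016
```

Two data bytes against 120 header bytes is the 2:120 ratio on the slide, which is where the roughly 1.6% figure comes from.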

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 200 ms for next segment. If no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #. Gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

Event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period is often relatively long: long delay before resending lost packet

Detect lost segments via duplicate ACKs

Sender often sends many segments back-to-back

If a segment is lost, there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost

fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline showing a lost segment (X) and the timeout interval.]

Fast retransmit algorithm

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {   /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
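The ACK-handling pseudocode can be turned into a small runnable sketch. The class name, the `retransmitted` list (standing in for actually resending a segment) and the boolean `timer_running` are assumptions for illustration; resetting the duplicate counter whenever SendBase advances is also an assumption, since the slide only says the count is kept per value of y.

```python
class AckHandler:
    """Sender-side ACK processing with fast retransmit (simplified)."""

    def __init__(self, send_base, next_seq):
        self.send_base = send_base      # oldest unacknowledged byte
        self.next_seq = next_seq        # next byte that would be sent
        self.dup_acks = 0               # duplicate ACKs seen for send_base
        self.timer_running = next_seq > send_base
        self.retransmitted = []         # seq #s that fast retransmit resent

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: slide the window forward.
            self.send_base = y
            self.dup_acks = 0
            # Keep the timer only if something is still unacknowledged.
            self.timer_running = self.send_base < self.next_seq
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit


sender = AckHandler(send_base=100, next_seq=160)
sender.on_ack(120)            # new ACK: SendBase moves to 120
for _ in range(3):
    sender.on_ack(120)        # three duplicates trigger a resend of byte 120
```

After the third duplicate, the segment starting at byte 120 is resent without waiting for the timer to expire.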

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT, but RTT varies

too short: premature timeout, unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions

SampleRTT will vary; want estimated RTT "smoother"

average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and EstimatedRTT.]

TCP Round Trip Time and Timeout

Setting the timeout: EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first, estimate how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
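The RTT and timeout formulas from these slides fit in a short estimator class. This is a sketch, not a reference implementation: the class and attribute names are illustrative, the initial deviation of half the first sample and the choice to update DevRTT before EstimatedRTT follow RFC 6298 rather than anything stated on the slides, and times are in milliseconds only by convention.

```python
class RttEstimator:
    """Exponentially weighted RTT estimate plus timeout interval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumption: initial value as in RFC 6298

    def update(self, sample_rtt):
        # Deviation is measured against the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)
est.update(200.0)   # one unusually slow sample
```

A single 200 ms sample moves the estimate only an eighth of the way from 100 ms, while the deviation term widens the safety margin noticeably.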

TCP Flow Control

receive side of TCP connection has a receive buffer

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow; this guarantees the receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
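The receive-window bookkeeping above is simple enough to express directly. The function names are illustrative, and the byte counts in the example are made up.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, window, nbytes):
    """True if sending nbytes more keeps unACKed data within the window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


# Receiver: 4096-byte buffer, 3000 bytes received, application has read 1000.
window = rcv_window(4096, 3000, 1000)

# Sender: 1000 bytes already in flight.
ok_small = sender_may_send(5000, 4000, window, 1000)
ok_large = sender_may_send(5000, 4000, window, 1200)
```

With 2000 bytes buffered but unread, only 2096 bytes of spare room are advertised, so the sender may add 1000 bytes to its 1000 in flight but not 1200.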

Delayed Duplicates Problem

A user asks for a connection.

Due to congestion, the packet is caught in a traffic jam.

The user asks again for the connection.

Destination accepts the 2nd connection request.

User sends info to dest.

Info gets caught in a traffic jam.

User sends info again.

Dest receives the info.

Connection is closed by both parties.

The original connection request and user info find their way to the destination.

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments

initialize TCP variables: seq. #s; buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, due to the delayed duplicates problem, a simple 2-way handshake does not suffice

Possible solutions:

A transport address can be used only once (impossible for client-server model)

Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's solution: three-way handshake

Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data

Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont.)

Connection establishment

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client / server timeline of the three segments]
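The numbering in the three steps can be traced with a tiny model of the exchanged segments. This is only a sketch: segments are plain dictionaries, the function name is illustrative, and the ISN values in the example are arbitrary. Sequence numbers are 32 bits wide, so the additions wrap modulo 2^32.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYN+ACK and ACK segments as dictionaries."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_MOD,       # client's ISN + 1
    }
    ack = {
        "flags": {"ACK"},
        "seq": (client_isn + 1) % SEQ_MOD,
        "ack": (syn_ack["seq"] + 1) % SEQ_MOD,   # server's ISN + 1
    }
    return [syn, syn_ack, ack]


syn, syn_ack, ack = three_way_handshake(client_isn=1000, server_isn=5000)
```

Each side acknowledges the other's ISN plus one because the SYN itself consumes one sequence number.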

TCP Connection Management (cont.)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

Step 3: client receives FIN, replies with ACK. Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed

Note: with small modification, can handle simultaneous FINs

[Figure: client / server timeline. Recoverable labels: close, closing, timed wait, closed.]

TCP Connection Management (cont.)

[Figure: state diagrams of the TCP client lifecycle and the TCP server lifecycle] [Tanenbaum 6.33]

Principles of Congestion Control

Congestion: informally, "too many sources sending too much data too fast for network to handle"

different from flow control!

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem!

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size over time, y-axis ticks at 8, 16 and 24 Kbytes. Sawtooth behavior: probing for bandwidth.]
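The sawtooth can be reproduced with a toy simulation. This is a sketch under strong assumptions that are not on the slide: the window is measured in whole MSS units, one loop iteration stands for one RTT, and a loss is declared whenever the window exceeds a fixed capacity. The function name and parameters are illustrative.

```python
def aimd_trace(capacity_mss, rounds, start=1):
    """Return the congestion window (in MSS) at the start of each RTT."""
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity_mss:
            # Loss detected: multiplicative decrease, never below 1 MSS.
            cwnd = max(1, cwnd // 2)
        else:
            # No loss this RTT: additive increase by 1 MSS.
            cwnd += 1
    return trace


trace = aimd_trace(capacity_mss=8, rounds=12, start=6)
```

The window climbs by one segment per round until it overshoots the capacity, is halved, and then starts climbing again.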

TCP Congestion Control: details

sender limits transmission:

$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly,

$\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}\ \text{bytes/sec}$

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline with one RTT marked]
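Both slow-start claims, the 20 kbps initial rate and the doubling per RTT, can be checked numerically. This is a sketch that assumes no loss, one ACK per segment, and a window counted in whole segments; the function names are illustrative.

```python
def initial_rate_bps(mss_bytes, rtt_seconds):
    """Rate when exactly one MSS is sent per RTT."""
    return mss_bytes * 8 / rtt_seconds


def slow_start_windows(num_rtts):
    """CongWin (in MSS) at the start of each RTT, with no loss."""
    cong_win = 1
    windows = []
    for _ in range(num_rtts):
        windows.append(cong_win)
        acks = cong_win          # every segment in the window is ACKed
        cong_win += acks         # +1 MSS per ACK doubles the window per RTT
    return windows


rate = initial_rate_bps(500, 0.2)
windows = slow_start_windows(5)
```

Incrementing by one MSS per ACK looks additive, but because the number of ACKs per RTT equals the window size, the window doubles each round.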

Refinement: inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly (part of Fast Recovery)

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is slow start, SS) to a threshold (ssthresh), then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout

Implementation: variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
TCP sender action: increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
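The state table translates almost line for line into a small class. This is a sketch of the table as written, not of any real TCP stack: the class and method names are illustrative, the window is kept in bytes, and clearing the duplicate-ACK counter on a new ACK or timeout is an assumption the table does not spell out.

```python
class TcpCongestionControl:
    """Sender congestion-control state machine following the table above."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss          # slow start begins at 1 MSS
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0            # assumption: counter restarts on new data
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_dup_ack(self):
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


cc = TcpCongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()                  # 2000, 3000, 4000, 5000 -> switches to CA
```

In congestion avoidance each ACK adds only MSS*(MSS/CongWin) bytes, so a full window of ACKs adds roughly one MSS per RTT.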

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start

Let W be the window size when loss occurs

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

-> $L = 2 \cdot 10^{-10}$. Wow!

New versions of TCP for high-speed
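The numbers in the long-fat-pipe example follow from the two formulas on these slides and can be recomputed directly. The function names are illustrative; the inputs are the slide's own (1500-byte segments, 100 ms RTT, 10 Gbps).

```python
def window_in_segments(rate_bps, rtt_seconds, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_seconds / (8 * mss_bytes)


def required_loss_rate(rate_bps, rtt_seconds, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = 8 * mss_bytes
    return (1.22 * mss_bits / (rtt_seconds * rate_bps)) ** 2


def average_throughput(w, rtt_seconds):
    """Sawtooth average between W/2 and W, per RTT."""
    return 0.75 * w / rtt_seconds


w = window_in_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
```

The result is a window of about 83,333 segments and a tolerable loss rate of roughly 2 x 10^-10, i.e. about one lost segment in five billion.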

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1 as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with both axes running to R, one axis labelled "Connection 1 throughput", and the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP: do not want rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets R/2!
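The fairness argument for two competing sessions can be watched converging in a toy model. The assumptions here go beyond the slides: both flows see a loss in the same round whenever their combined rate exceeds the link capacity, both have the same RTT, and rates are in arbitrary units. The function name is illustrative.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Both flows detect loss: multiplicative decrease.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both increase additively by the same amount.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


start_gap = abs(80 - 10)
r1, r2 = two_flow_aimd(80, 10, capacity=100, rounds=200)
end_gap = abs(r1 - r2)
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in half, so the flows drift toward the equal bandwidth share.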

Chapter 3: Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet:

UDP

TCP

After that:

leaving the network "edge" (application, transport layers)

into the network "core"

Next: Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets; each socket is identified by its own 4-tuple

Web servers have different sockets for each connecting client; non-persistent HTTP will have a different socket for each request
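Demultiplexing on the 4-tuple amounts to a dictionary lookup. In this sketch the class name, the string "socket" labels and the letters A, B, C standing in for IP addresses are all illustrative; the port numbers are the ones used on the demux figures.

```python
class TcpDemux:
    """Maps (source IP, source port, dest IP, dest port) to a socket."""

    def __init__(self):
        self.sockets = {}

    def register(self, src_ip, src_port, dst_ip, dst_port, socket_label):
        self.sockets[(src_ip, src_port, dst_ip, dst_port)] = socket_label

    def deliver(self, src_ip, src_port, dst_ip, dst_port):
        """Return the socket for an arriving segment, or None if no match."""
        return self.sockets.get((src_ip, src_port, dst_ip, dst_port))


demux = TcpDemux()
# Three connections to the same server port 80 on host C.
demux.register("A", 9157, "C", 80, "socket-1")
demux.register("B", 9157, "C", 80, "socket-2")
demux.register("B", 5775, "C", 80, "socket-3")
```

Segments from A and B share the same source and destination ports, yet they reach different sockets because the source IP address is part of the key.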

Connection-oriented demux (cont.)

[Figure: client IP A, client IP B and server IP C, with processes P1 to P6. Segments shown, all with DP 80 and D-IP C: (S-IP A, SP 9157), (S-IP B, SP 9157) and (S-IP B, SP 5775).]

Connection-oriented demux: Threaded Web Server

[Figure: client IP A, client IP B and server IP C, with processes P1 to P4. Segments shown, all with DP 80 and D-IP C: (S-IP A, SP 9157), (S-IP B, SP 9157) and (S-IP B, SP 5775).]


Connectionless demux (cont.)

DatagramSocket serverSocket = new DatagramSocket(6428);

[Figure: client IP A, client IP B and server IP C, with processes P1 to P3. Segments to the server carry (SP 9157, DP 6428) and (SP 5775, DP 6428); segments from the server carry (SP 6428, DP 9157) and (SP 6428, DP 5775).]

SP provides "return address"

UDP: User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app output produces exactly one UDP segment

"best effort" service; UDP segments may be:

lost

delivered out of order to app

connectionless:

no handshaking between UDP sender, receiver

each UDP segment handled independently of others

Why is there a UDP?

no connection establishment (which can add delay)

simple: no connection state at sender, receiver

small segment header

no congestion control: UDP can blast away as fast as desired

UDP: more

often used for streaming multimedia apps:

loss tolerant

rate sensitive

other UDP uses: DNS, SNMP

reliable transfer over UDP: add reliability at application layer

application-specific error recovery!

UDP segment format (each row is 32 bits wide):

  source port # | dest port #
  length | checksum
  Application data (message)

length: length, in bytes, of UDP segment, including header and data. Minimum value is 8 bytes

UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:

treat segment contents as sequence of 16-bit integers

checksum: addition (1's complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver:

compute checksum of received segment

check if computed checksum equals checksum field value:

NO - error detected

YES - no error detected. But maybe errors nonetheless? More later...

Internet Checksum Example

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result

Example: add two 16-bit integers

                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

Sockets Programming over UDP: use socket slides now
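The wraparound addition in the example can be reproduced in a few lines. This sketch works on a list of 16-bit integers rather than on raw packet bytes, and the function names are illustrative; the two operands are the ones from the worked example.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wraparound of the carry
    return total


def internet_checksum(words):
    """Checksum field value: the bitwise complement of the folded sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


a = 0b1110011001100110
b = 0b1101010101010101
folded = ones_complement_sum16([a, b])
checksum = internet_checksum([a, b])

# Receiver check: summing the data and the checksum gives all ones.
receiver_sum = ones_complement_sum16([a, b, checksum])
```

The receiver repeats the same addition over the data plus the checksum field; any result other than sixteen 1 bits signals an error.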

Transmission Control Protocol

Principles of reliable communication

TCP: basic notations, 3-way handshake

TCP: flow control, congestion control

Principles of Reliable data transfer

important in app., transport, link layers

top-10 list of important networking topics!

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)


Reliable Data Transfer: stream jargon

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process, e.g. keyboard or socket

An output stream is attached to an output source, e.g. monitor or socket

Reliable Communication

Terminology of a State Machine

[Figure: state 1 and state 2 joined by a transition arrow. The label above the line is the event causing the state transition; the label below the line is the actions taken on the state transition (event / actions).]

Note: The state machine is for a certain host

Reliable Communication

First model: sender sends, receiver receives

Is this enough?

When will it work?

When will it not work?

Sender sends one packet, then waits for receiver response: stop and wait

Reliable Communication

Stop and Wait - Sender side

[Figure: sender state machine with State 1 (wait for data) and State 2 (wait for ack). Transition labels: Data available / Send Data; Received Ack.]

In State 1, sender can send data

In State 2, sender can receive acknowledge packets

Discussion

• What does the receiver do?

• What happens if data is available at state 2?

channel with bit errors and losses

underlying channel may flip bits in packet: checksum to detect bit errors

underlying channel can also lose packets

the question: how to recover from errors?

acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK

timeout: sender retransmits pkt if it doesn't receive ack within timeout

new mechanisms in:

error detection

receiver feedback: control msg (ACK) rcvr -> sender

sender control: timer to understand if to send again

Reliable Communication

Stop and Wait with errors/losses - sender

[Figure: sender state machine with State 1 (wait for data), state 2a (wait for ack packet or time out) and state 2b. Transition labels: Data available, Send Data, Ack received, Process ack, timeout, Send Data.]

In State 1, sender waits for data

Discussion

• What does the sender need to do for the retransmission?

This version has a fatal flaw!

What happens if ACK corrupted/lost?

sender doesn't know what happened at receiver!

can't just retransmit: possible duplicate

Handling duplicates:

sender retransmits current pkt if ACK garbled or didn't arrive

sender adds sequence number to each pkt

receiver discards (doesn't deliver up) duplicate pkt

receiver must specify seq # of pkt being ACKed

discussion

Sender:

seq # added to pkt

two seq. #'s (0,1) will suffice. Why?

must check if received ACK corrupted

twice as many states: state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:

must check if received packet is duplicate: state indicates whether 0 or 1 is expected pkt seq #

receiver sends ACK for last pkt received OK; receiver must explicitly include seq # of pkt being ACKed

note: receiver can not know if its last ACK received OK at sender

Stop & wait in action

[Figures: two slides of sender / receiver timelines; labels not recoverable]

Performance of stop & wait

Stop & wait works, but performance stinks

ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:

$d_{trans} = \dfrac{L}{R} = \dfrac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$

U sender: utilization, the fraction of time sender busy sending

$U_{sender} = \dfrac{L/R}{RTT + L/R} = \dfrac{0.008}{30.008} = 0.00027$ (times in ms)

1KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link

network protocol limits use of physical resources!

stop-and-wait operation

[Figure: sender / receiver timeline with RTT marked]

first packet bit transmitted, t = 0

last packet bit transmitted, t = L / R

first packet bit arrives

last packet bit arrives, send ACK

ACK arrives, send next packet, t = RTT + L / R

$U_{sender} = \dfrac{L/R}{RTT + L/R} = \dfrac{0.008}{30.008} = 0.00027$ (times in ms)
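The utilization figure can be recomputed from the link parameters in the example. The function name and the `window` parameter are illustrative; a window of 1 corresponds to stop-and-wait, and larger values model sending several packets before waiting for the first ACK.

```python
def sender_utilization(packet_bits, rate_bps, rtt_seconds, window=1):
    """Fraction of time the sender is busy transmitting."""
    d_trans = packet_bits / rate_bps            # L / R
    busy = window * d_trans
    return min(1.0, busy / (rtt_seconds + d_trans))


# 1 Gbps link, 15 ms propagation delay each way (RTT = 30 ms), 8000-bit packet.
u_stop_and_wait = sender_utilization(8000, 1e9, 0.030)
u_three_in_flight = sender_utilization(8000, 1e9, 0.030, window=3)
```

With one packet per round trip the sender is busy about 0.027% of the time; allowing three packets in flight triples that.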

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts

range of sequence numbers must be increased

buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization

[Figure: sender / receiver timeline with RTT marked]

first packet bit transmitted, t = 0

last bit transmitted, t = L / R

first packet bit arrives

last packet bit arrives, send ACK

last bit of 2nd packet arrives, send ACK

last bit of 3rd packet arrives, send ACK

ACK arrives, send next packet, t = RTT + L / R

$U_{sender} = \dfrac{3 \cdot L/R}{RTT + L/R} = \dfrac{0.024}{30.008} = 0.0008$ (times in ms)

Increase utilization by a factor of 3!

Pipelining Protocol

Go-back-N: big picture

Sender can have up to N unacked packets in pipeline

Rcvr only sends cumulative acks; doesn't ack packet if there's a gap

Sender has timer for oldest unacked packet; if timer expires, retransmit all unacked packets

Go-Back-N

Sender:

k-bit seq # in pkt header

"window" of up to N, consecutive unack'ed pkts allowed

ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n): retransmit pkt n and all higher seq # pkts in window

Go Back N

Receiver:

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #

may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt: discard (don't buffer) -> no receiver buffering!

Re-ACK pkt with highest in-order seq #

GBN in action
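The Go-Back-N behaviour described above can be sketched as a round-based simulation. Several simplifications are assumed and are not part of the slides: ACKs are never lost, each loop iteration stands for "fill the window, then wait", a timeout is modelled as a round in which the cumulative ACK did not advance, and each listed packet is lost only on its first transmission. The function name and parameters are illustrative.

```python
def go_back_n(num_pkts, window, lose_first_tx):
    """Return the seq #s in the order the sender transmits them."""
    base = 0            # oldest unacked packet
    next_seq = 0        # next packet to send
    expected = 0        # receiver's expectedseqnum
    already_lost = set()
    sent = []
    while base < num_pkts:
        # Sender fills the window.
        in_flight = []
        while next_seq < min(base + window, num_pkts):
            sent.append(next_seq)
            in_flight.append(next_seq)
            next_seq += 1
        # Receiver accepts only the in-order packet and discards the rest.
        for seq in in_flight:
            if seq in lose_first_tx and seq not in already_lost:
                already_lost.add(seq)       # dropped by the channel
                continue
            if seq == expected:
                expected += 1
        if expected > base:
            base = expected                 # cumulative ACK slides the window
        else:
            next_seq = base                 # timeout: go back and resend all
    return sent


order = go_back_n(num_pkts=6, window=4, lose_first_tx={2})
```

Packet 2 is lost once, so packets 3, 4 and 5 are discarded at the receiver even though they arrived intact, and the sender later resends everything from 2 onward.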

Transmission Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management

TCP congestion control

TCP: Overview [RFCs: 793, 1122, 1323, 2018, 2581]

point-to-point: one sender, one receiver

reliable, in-order byte stream: no "message boundaries"

pipelined: TCP congestion and flow control set window size

send & receive buffers

full duplex data: bi-directional data flow in same connection

MSS: maximum segment size

connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange

flow controlled: sender will not overwhelm receiver

[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]

20

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP‟s unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

21

TCP sender eventsdata rcvd from app

Create segment with seq

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer

Ack rcvd

If acknowledges previously unacked segments update what is known to

be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq ‟s and ACKsSeq ‟s

byte stream ldquonumberrdquo of first byte in segment‟s data

ACKs

seq of next byte expected from other side

cumulative ACK

Q how receiver handles out-of-order segments

A TCP spec doesn‟t say - up to implementor

Host A Host B

Usertypes

bdquoC‟

host receivesACK

host ACKsreceipt of

bdquoC‟

time

Transport Layer 42

22

TCP retransmission scenarios

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t

Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

timeS

eq=

92

tim

eou

t

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase= 120

Transport Layer 44

23

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 200ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Usertypes

bdquoC‟

host ACKsreceipt

of echoedbdquoC‟

host ACKsreceipt of

bdquoC‟

timesimple telnet scenario

Transport Layer 45

Host echoesback bdquoC‟

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data can‟t be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ackarrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Nagle‟s alg

24

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 200ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before

resending lost packet

Detect lost segments via duplicate ACKs Sender often sends

many segments back-to-back

If segment is lost there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend

segment before timer expires

Transport Layer 48

25

Host A

tim

eou

t

Host B

time

X

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT but RTT varies

too short premature timeout

unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo

average several recent measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

27

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT +

|SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

Discuss [Stevens

212 214 216

217]

28

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving app‟s drain rate

app process may be slow at reading from buffer

sender won‟t overflowreceiver‟s buffer by

transmitting too muchtoo fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKeddata to RcvWindow guarantees receive

buffer doesn‟t overflow

Discuss [Stevens

201 204 205

206]

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables

seq s

buffers flow control info (eg RcvWindow)

As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice

Possible solutions A transport address can be used only

once (impossible for client-server model)

Each connection is given an identifier by

the originator (requires indefinite history to

be saved by the transport layer)

TCP‟s Solution

Three way handshake

Step 1 client host sends TCP SYN segment to server

specifies initial seq

no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers

specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 58

30

TCP Connection Management (cont)

Connection Establishment

Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN

Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)

client server

Transport Layer 59

TCP Connection Management (cont)

Closing a connection

client closes socketclientSocketclose()

Step 1 client end system sends TCP FIN control

segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client server

close

close

closed

tim

ed w

ait

Transport Layer 60

31

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK.

Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with a small modification, can handle simultaneous FINs.

[Figure: client/server timeline. Labels: closing (client), closing (server), timed wait, closed (client), closed (server)]

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle]

[Tanenbaum 633]

Principles of Congestion Control

Congestion, informally: "too many sources sending too much data too fast for network to handle"

different from flow control!

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem!

TCP congestion control: additive increase, multiplicative decrease

[Figure: congestion window (axis ticks at 8, 16, 24 Kbytes) versus time]

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss
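
A minimal sketch of the AIMD rule, assuming the window is counted in MSS units and that the rounds in which a loss is detected are given in advance (the `loss_rounds` parameter is a device of this example, not something TCP knows).

```python
def aimd(rounds, loss_rounds, start=8, mss=1):
    """Return CongWin (in MSS) at the start of each RTT round."""
    cong_win = start
    history = []
    for rtt in range(rounds):
        history.append(cong_win)
        if rtt in loss_rounds:
            # multiplicative decrease: cut the window in half (never below 1 MSS)
            cong_win = max(mss, cong_win / 2)
        else:
            # additive increase: one more MSS per RTT
            cong_win += mss
    return history


print(aimd(rounds=8, loss_rounds={4}))
```

Plotting the returned list over time gives the sawtooth shape described on this slide.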

[Figure: congestion window size versus time. Sawtooth behavior: probing for bandwidth]

TCP Congestion Control details

sender limits transmission:

LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly:

rate = CongWin / RTT bytes/sec

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin by 1 MSS for every ACK received

Summary: initial rate is slow but ramps up exponentially fast
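
The doubling follows directly from the per-ACK increment, which the sketch below makes explicit. It assumes no loss and one ACK per segment (no delayed ACKs); the function names are hypothetical.

```python
def slow_start_windows(rounds, mss=1):
    """CongWin (in MSS) at the start of each RTT during loss-free slow start."""
    cong_win = mss
    history = []
    for _ in range(rounds):
        history.append(cong_win)
        acks_this_round = cong_win // mss   # one ACK per segment sent
        cong_win += acks_this_round * mss   # +1 MSS per ACK doubles the window
    return history


def initial_rate_bps(mss_bytes, rtt_ms):
    """Sending rate with CongWin = 1 MSS, in bits per second."""
    return mss_bytes * 8 * 1000 / rtt_ms


print(slow_start_windows(5))
print(initial_rate_bps(500, 200))
```

With MSS = 500 bytes and RTT = 200 msec the second function reproduces the 20 kbps initial rate quoted on the previous slide.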

[Figure: Host A / Host B timeline with one RTT marked]

Refinement: inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is slow start) to a threshold (ssthresh), then grows linearly

Philosophy:

3 dup ACKs indicates network capable of delivering some segments

timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout.

Implementation:

Variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control

State | Event | TCP Sender Action | Commentary

Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT

SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start

SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
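
The table translates almost line for line into a small state machine. The sketch below is a toy model under stated assumptions: the class name `TcpSender`, the initial Threshold of 8000 bytes, and the MSS of 1000 bytes are all hypothetical, and resetting the duplicate counter on a new ACK is a simplification.

```python
class TcpSender:
    """Toy congestion-control state machine; windows are in bytes."""

    def __init__(self, mss=1000, threshold=8000):
        self.mss = mss
        self.cong_win = mss          # start with 1 MSS
        self.threshold = threshold   # assumed initial value
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # roughly +1 MSS per RTT, spread over the ACKs of one window
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_dup_ack(self):
        """Duplicate ACK: only count it, unless it is the third one."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = max(self.cong_win / 2, self.mss)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


sender = TcpSender()
for _ in range(8):
    sender.on_new_ack()
print(sender.state, sender.cong_win)
```

Feeding the object a sequence of events and recording `cong_win` after each one is a quick way to check a hand-drawn window trace.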


TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT
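
The 0.75 factor is the mean of a window that ramps linearly from W/2 up to W. A quick numerical check, assuming an idealized sawtooth sampled at evenly spaced points (the function name is hypothetical):

```python
def average_throughput(w, rtt, steps=1000):
    """Mean of window/RTT while the window grows linearly from W/2 to W."""
    samples = [(w / 2 + (w / 2) * i / steps) / rtt for i in range(steps + 1)]
    return sum(samples) / len(samples)


w, rtt = 100, 0.1
print(average_throughput(w, rtt), 0.75 * w / rtt)
```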


TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

throughput = 1.22 * MSS / (RTT * sqrt(L))

-> L = 2 * 10^-10. Wow!

New versions of TCP for high-speed
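
Both numbers on this slide can be reproduced from the formula. The sketch assumes MSS is converted to bits and that the throughput formula is solved for L; the function names are hypothetical.

```python
def required_window(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


print(int(required_window(10e9, 0.1, 1500)))
print(required_loss_rate(10e9, 0.1, 1500))
```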

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally
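
A short simulation shows why these two rules pull competing sessions toward an equal share. It assumes, for simplicity, that both flows see every loss at the same time and have the same RTT; the function name and the starting rates are hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two flow rates under synchronized AIMD and return the final pair."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows halve, which also halves the gap between them
            x1, x2 = x1 / 2, x2 / 2
        else:
            # no loss: both flows add 1, which leaves the gap unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


print(aimd_two_flows(80, 10, capacity=100, rounds=1000))
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease halves it, so the difference shrinks toward zero.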

[Figure: Connection 1 throughput on one axis, both axes running from 0 to R, with the equal bandwidth share line. Labels along the trajectory: "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP: do not want rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets R/2!
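
The arithmetic behind the example, assuming the link is split evenly per connection (the function name is hypothetical):

```python
from fractions import Fraction


def app_share(existing_conns, new_conns):
    """Fraction of link rate R obtained by an app that opens new_conns connections."""
    return Fraction(new_conns, existing_conns + new_conns)


print(app_share(9, 1))    # one connection among ten
print(app_share(9, 11))   # eleven connections among twenty
```

With 11 of 20 connections the app gets 11/20 of R, which the slide rounds to R/2.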


Chapter 3 Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet:

UDP

TCP

After that:

leaving the network "edge" (application, transport layers)

into the network "core"

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets:

each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client

non-persistent HTTP will have a different socket for each request
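
Demultiplexing on the 4-tuple amounts to a dictionary lookup. The sketch below is a model, not an operating-system API: the host names A, B, C and the ports follow the figure on the next slide, while the socket labels and function names are hypothetical.

```python
# Maps (source IP, source port, dest IP, dest port) to a socket label.
sockets = {}


def open_socket(src_ip, src_port, dst_ip, dst_port, label):
    sockets[(src_ip, src_port, dst_ip, dst_port)] = label


def demux(segment):
    """Return the socket that should receive this segment, or None."""
    key = (segment["src_ip"], segment["src_port"],
           segment["dst_ip"], segment["dst_port"])
    return sockets.get(key)


open_socket("A", 9157, "C", 80, "socket-1")
open_socket("B", 9157, "C", 80, "socket-2")
open_socket("B", 5775, "C", 80, "socket-3")

print(demux({"src_ip": "B", "src_port": 9157, "dst_ip": "C", "dst_port": 80}))
```

Two segments with the same destination port 80 still reach different sockets as long as any of the other three values differ.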


Connection-oriented demux (cont)

[Figure: hosts client IP A, client IP B and server IP C, with processes P1 to P6. Three segments arrive at the server, all with D-IP C and DP 80: (S-IP A, SP 9157), (S-IP B, SP 9157), (S-IP B, SP 5775)]

Connection-oriented demux Threaded Web Server

[Figure: hosts client IP A, client IP B and server IP C, with processes P1 to P4. Three segments arrive at the server, all with D-IP C and DP 80: (S-IP A, SP 9157), (S-IP B, SP 9157), (S-IP B, SP 5775)]


UDP: more

often used for streaming multimedia apps

loss tolerant

rate sensitive

other UDP uses:

DNS

SNMP

reliable transfer over UDP: add reliability at application layer

application-specific error recovery!

UDP segment format (each row is 32 bits wide):

source port # | dest port #

length | checksum

Application data (message)

Length: in bytes of UDP segment, including header and data. Minimum value is 8 bytes.
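
The four 16-bit header fields can be packed with Python's standard `struct` module. This is a sketch of the layout only: the function names are hypothetical and the checksum is left at 0 rather than computed.

```python
import struct

UDP_HEADER = "!HHHH"  # network byte order: src port, dst port, length, checksum


def build_udp_segment(src_port, dst_port, payload, checksum=0):
    """Return header + payload; length counts the 8 header bytes plus the data."""
    length = 8 + len(payload)
    return struct.pack(UDP_HEADER, src_port, dst_port, length, checksum) + payload


def parse_udp_header(segment):
    """Return (src_port, dst_port, length, checksum) from the first 8 bytes."""
    return struct.unpack(UDP_HEADER, segment[:8])


segment = build_udp_segment(5775, 53, b"hi")
print(parse_udp_header(segment))
```

A segment with no data has length 8, the minimum value noted above.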


UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:

treat segment contents as sequence of 16-bit integers

checksum: addition (1's complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver:

compute checksum of received segment

check if computed checksum equals checksum field value:

NO - error detected

YES - no error detected. But maybe errors nonetheless? More later ...

Internet Checksum Example

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result

Example: add two 16-bit integers

              1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
              1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
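
The same computation in code, as a minimal sketch that takes the segment as a list of 16-bit integers (function names are hypothetical). The two words are the ones from the example above.

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wraparound
    return total


def internet_checksum(words):
    """Checksum is the bitwise complement of the 1's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


a = 0b1110011001100110
b = 0b1101010101010101
print(format(ones_complement_sum([a, b]), "016b"))
print(format(internet_checksum([a, b]), "016b"))
```

At the receiver, summing all words together with the checksum gives sixteen 1 bits when no error is detected.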

Sockets Programming over UDP - use socket slides now

Transmission Control Protocol

Principles of reliable communication

TCP: basic notations, 3-way handshake

TCP: flow control, congestion control

Principles of Reliable data transfer

important in app., transport, link layers

top-10 list of important networking topics!

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)


Reliable Data Transfer: stream jargon

A stream is a sequence of characters that flow into or out of a process.

An input stream is attached to some input source for the process, e.g., keyboard or socket.

An output stream is attached to an output source, e.g., monitor or socket.

Reliable Communication

Terminology of a State Machine

[Figure: state1 -> state2 transition arrow, labeled with the event causing the state transition over the actions taken on the state transition (event / actions)]

Note: The state machine is for a certain host

Reliable Communication

First model: sender sends, receiver receives

Is this enough?

When will it work?

When will it not work?

Stop and wait: sender sends one packet, then waits for receiver response

Reliable Communication

Stop and Wait - Sender side

[Figure: two-state sender FSM. State 1 (wait for data): on "data available", send data and move to State 2. State 2 (wait for ack): on "received ack", return to State 1]

In State 1: sender can send data

In State 2: sender can receive acknowledge packets

Discussion:

• What does the receiver do?

• What happens if data is available at state 2?

channel with bit errors and losses

underlying channel may flip bits in packet

checksum to detect bit errors

underlying channel can also lose packets

the question: how to recover from errors?

acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK

timeout: sender retransmits pkt if it doesn't receive an ACK within the timeout

new mechanisms:

error detection

receiver feedback: control msg (ACK) rcvr -> sender

sender control: timer to decide whether to send again

Reliable Communication

Stop and Wait with errors/losses - sender

[Figure: sender FSM. State 1 (wait for data): on "data available", send data and move to State 2a (wait for ack packet or timeout). On timeout: State 2b, send data again. On "ack received": process ack and return to State 1]

In State 1: sender waits for data

Discussion:

• What does the sender need to do for the retransmission?

This version has a fatal flaw!

What happens if ACK corrupted/lost?

sender doesn't know what happened at receiver!

can't just retransmit: possible duplicate

Handling duplicates:

sender retransmits current pkt if ACK garbled or didn't arrive

sender adds sequence number to each pkt

receiver discards (doesn't deliver up) duplicate pkt

receiver must specify seq # of pkt being ACKed

Discussion

Sender:

seq # added to pkt

two seq. #'s (0,1) will suffice. Why?

must check if received ACK corrupted

twice as many states: state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:

must check if received packet is duplicate

state indicates whether 0 or 1 is expected pkt seq #

receiver sends ACK for last pkt received OK

receiver must explicitly include seq # of pkt being ACKed

note: receiver can not know if its last ACK received OK at sender
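
The sender and receiver rules above can be exercised in a small deterministic simulation. It assumes losses only (no bit errors) and lets the caller choose which transmissions lose their data packet or their ACK; the function name and both `lost_*` parameters are devices of this example.

```python
def stop_and_wait(messages, lost_data=frozenset(), lost_acks=frozenset()):
    """Alternating-bit stop and wait. Returns (delivered messages, transmissions).

    lost_data / lost_acks hold 0-based indices, counted over all data
    transmissions, whose data packet or returning ACK is dropped.
    """
    delivered = []
    expected = 0   # receiver state: seq # expected next (0 or 1)
    seq = 0        # sender state: seq # of the current packet (0 or 1)
    tx = 0         # data transmissions so far
    for msg in messages:
        while True:
            this_tx, tx = tx, tx + 1
            if this_tx in lost_data:
                continue              # data lost: sender times out, resends
            if seq == expected:       # new packet: deliver, flip expectation
                delivered.append(msg)
                expected ^= 1
            # a duplicate is discarded, but the receiver still ACKs its seq #
            if this_tx in lost_acks:
                continue              # ACK lost: sender times out, resends
            seq ^= 1                  # ACK arrived: move on to the next packet
            break
    return delivered, tx


print(stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={2}))
```

In the run above the second message is sent three times, yet it is delivered only once because the retransmission after the lost ACK carries a sequence number the receiver no longer expects.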

Stop & wait in action

[Figures: two timeline slides; content not recovered]

Performance of stop & wait

Stop & wait works, but performance stinks

ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:

d_trans = L / R = 8000 bits / 10^9 bps = 8 microseconds

U_sender: utilization - fraction of time sender busy sending

U_sender = (L / R) / (RTT + L / R) = 0.008 / 30.008 = 0.00027

1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link

network protocol limits use of physical resources!
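
The utilization figure is easy to recompute. The sketch assumes RTT is twice the propagation delay and adds a `window` parameter (an assumption of this example) so the same formula also covers sending several packets per round trip.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy: window * (L/R) / (RTT + L/R)."""
    d_trans = packet_bits / rate_bps
    return min(1.0, window * d_trans / (rtt_s + d_trans))


rtt = 2 * 0.015  # 15 ms propagation delay each way
print(sender_utilization(8000, 1e9, rtt))            # stop and wait
print(sender_utilization(8000, 1e9, rtt, window=3))  # three packets in flight
```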

stop-and-wait operation

[Figure: sender/receiver timeline. First packet bit transmitted, t = 0; last packet bit transmitted, t = L / R; first packet bit arrives; last packet bit arrives, send ACK; ACK arrives, send next packet, t = RTT + L / R]

U_sender = (L / R) / (RTT + L / R) = 0.008 / 30.008 = 0.00027

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts

range of sequence numbers must be increased

buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization

[Figure: sender/receiver timeline with three packets in flight. First packet bit transmitted, t = 0; last bit transmitted, t = L / R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, send next packet, t = RTT + L / R]

U_sender = (3 * L / R) / (RTT + L / R) = 0.024 / 30.008 = 0.0008

Increase utilization by a factor of 3!

Pipelining Protocol


Go-back-N: big picture

Sender can have up to N unacked packets in pipeline

Rcvr only sends cumulative acks

Doesn't ack packet if there's a gap

Sender has timer for oldest unacked packet

If timer expires, retransmit all unacked packets

Go-Back-N: Sender

k-bit seq # in pkt header

"window" of up to N consecutive unack'ed pkts allowed

ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n): retransmit pkt n and all higher seq # pkts in window

Go-Back-N: Receiver

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #

may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt:

discard (don't buffer) -> no receiver buffering!

Re-ACK pkt with highest in-order seq #
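
The interplay of the GBN sender and receiver can be sketched with a simplified, round-based model. It assumes that the sender transmits a whole window, then processes the last cumulative ACK, and that a "timeout" simply means starting the next round from the oldest unacked packet; ACKs are never lost. The function name and the `lost` parameter are hypothetical.

```python
def go_back_n(num_pkts, window, lost=frozenset()):
    """Return sequence numbers in the order they were transmitted.

    lost holds 0-based indices, counted over all transmissions, that are dropped.
    """
    base = 0          # sender: oldest unacked seq #
    expected = 0      # receiver: expectedseqnum
    sent_log = []
    tx = 0
    while base < num_pkts:
        last_ack = base - 1
        for seq in range(base, min(base + window, num_pkts)):
            sent_log.append(seq)
            dropped = tx in lost
            tx += 1
            if dropped:
                continue
            if seq == expected:
                expected += 1            # in order: accept
            # out of order: discard; either way re-ACK highest in-order seq #
            last_ack = expected - 1
        base = last_ack + 1              # cumulative ACK slides the window
    return sent_log


print(go_back_n(num_pkts=5, window=3, lost={1}))
```

After packet 1 is lost, packet 2 is discarded at the receiver even though it arrived intact, so the sender goes back and resends both.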

GBN in action

[Figure: content not recovered]

Transmission Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management

TCP congestion control

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver

reliable, in-order byte stream: no "message boundaries"

pipelined: TCP congestion and flow control set window size

send & receive buffers

full duplex data: bi-directional data flow in same connection

MSS: maximum segment size

connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange

flow controlled: sender will not overwhelm receiver

[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]

TCP segment structure

(each row is 32 bits wide)

source port # | dest port #

sequence number

acknowledgement number

head len | not used | U A P R S F | Receive window

checksum | Urg data pointer

Options (variable length)

application data (variable length)

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now (generally not used)

RST, SYN, FIN: connection estab (setup, teardown commands)

Receive window: # bytes rcvr willing to accept

sequence and acknowledgement numbers: counting by bytes of data (not segments!)

checksum: Internet checksum (as in UDP)

TCP reliable data transfer

TCP creates reliable service on top of IP's unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the byte-stream number of the first byte in the segment

TCP sender events:

data rcvd from app:

Create segment with seq #

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval: TimeOutInterval

timeout:

retransmit segment that caused timeout

restart timer

Ack rcvd:

If acknowledges previously unacked segments

update what is known to be acked

start timer if there are outstanding segments

TCP seq. #'s and ACKs

Seq. #'s:

byte stream "number" of first byte in segment's data

ACKs:

seq # of next byte expected from other side

cumulative ACK

Q: how receiver handles out-of-order segments

A: TCP spec doesn't say - up to implementor

[Figure: Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host receives ACK]

TCP retransmission scenarios

[Figure: two Host A / Host B timelines. Lost ACK scenario: an ACK is lost (X), the Seq=92 timer times out and the segment is retransmitted; SendBase = 100. Premature timeout scenario: the Seq=92 timer times out too early; SendBase = 100, then SendBase = 120]

TCP retransmission scenarios (more)

[Figure: Cumulative ACK scenario. One ACK is lost (X) before the timeout; SendBase = 120]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), to a total of 160 bytes for sending and receiving 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: wait up to 200ms for next segment. If no next segment, send ACK

Should sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host echoes back 'C'; host ACKs receipt of echoed 'C']

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggyback, 2:120 -> only 1.6% of the bits sent are data

LANs usually not congested, so it might be okay

Small packets, termed tinygrams, over congested WAN - bad news

Nagle's alg:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 19.4]
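
The overhead numbers can be checked with a few lines. The sketch assumes 2 data bytes in total (the typed 'C' and its echo) and 40 header bytes per packet; the function name is hypothetical.

```python
def data_fraction(data_bytes, packets, header_bytes=40):
    """Share of all bytes sent that are application data."""
    overhead = packets * header_bytes
    return data_bytes / (data_bytes + overhead)


# Without piggybacking: data, ACK, echo, ACK = 4 packets, 160 header bytes.
print(round(100 * data_fraction(2, 4), 1), "% data")
# With the ACK piggybacked on the echo: 3 packets, 120 header bytes.
print(round(100 * data_fraction(2, 3), 1), "% data")
```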

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver | TCP Receiver action

Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 200ms for next segment. If no next segment, send ACK

Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments

Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected | Immediately send duplicate ACK, indicating seq. # of next expected byte

Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long:

long delay before resending lost packet

Detect lost segments via duplicate ACKs.

Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs.

If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost:

fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline; a segment is lost (X) and resent before its timeout]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
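
A runnable version of the same event handler, as a sketch: the class name is hypothetical, timers are omitted, and "resend" is modeled by recording the sequence number in a list.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; records fast retransmissions."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []   # seq #s resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # new data acknowledged: advance SendBase, reset the dup counter
            self.send_base = y
            self.dup_acks = 0
        else:
            # a duplicate ACK for an already ACKed segment
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit


sender = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100]:   # one new ACK followed by three duplicates
    sender.on_ack(ack)
print(sender.send_base, sender.retransmitted)
```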


TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT

but RTT varies

too short: premature timeout

unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary, want estimated RTT "smoother"

average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: α = 0.125

[Retransmission example in Stevens 21.1]

[Figure: Example RTT estimation. RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, axis 100 to 350) versus time (seconds, axis 1 to 106); series: SampleRTT and Estimated RTT]

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|

(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4 * DevRTT
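
The estimator fits in a few lines. This is a sketch under stated assumptions: the class name is hypothetical, the initial DevRTT of half the first sample is an assumption (the slides do not give one), and DevRTT is updated before EstimatedRTT so that it uses the previous estimate.

```python
class RttEstimator:
    """EWMA of SampleRTT plus a deviation term, as in the formulas above."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed starting value

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100)   # milliseconds
for sample in [200, 120, 110]:
    print(estimator.update(sample))
```

A single outlier sample raises the timeout mainly through DevRTT, which is the "larger safety margin" the slide describes.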

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]

TCP Flow Control

receive side of TCP connection has a receive buffer

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow

guarantees receive buffer doesn't overflow
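
Both sides of the rule in code, as a small sketch: the function names are hypothetical, and the byte counters use the variable names from the slides.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room in the receive buffer, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: keep unACKed data within the advertised RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(can_send(1000, last_byte_sent=5000, last_byte_acked=4000,
               advertised_window=window))
```

When the application stops reading, LastByteRead stops advancing, the advertised window shrinks toward 0, and the sender has to pause.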

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]


Page 7: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

7

Internet Checksum Example

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.
Example: add two 16-bit integers

                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
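The wraparound rule can be tried out with a short sketch. It works on a list of 16-bit words only and is not a full UDP/TCP checksum (no pseudo-header, no byte packing); the function names are illustrative.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wraparound
    return total

def internet_checksum16(words):
    """The checksum is the one's complement of the wrapped sum."""
    return ~ones_complement_sum16(words) & 0xFFFF

a = 0b1110011001100110
b = 0b1101010101010101
wrapped_sum = ones_complement_sum16([a, b])
checksum = internet_checksum16([a, b])
print(f"{wrapped_sum:016b}")   # 1011101110111100
print(f"{checksum:016b}")      # 0100010001000011
```

A receiver that adds the same words plus the checksum obtains all ones (0xFFFF) when no bit error is detected.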

Sockets Programming over UDP: use socket slides now

Transmission Control Protocol

Principles of reliable communication
TCP basic notations, 3-way handshake
TCP flow control, congestion control

Principles of Reliable data transfer

Important in app., transport, link layers
Top-10 list of important networking topics!
Characteristics of the unreliable channel will determine the complexity of the reliable data transfer protocol (rdt)


Reliable Data Transfer: Stream jargon

A stream is a sequence of characters that flow into or out of a process.
An input stream is attached to some input source for the process, e.g., keyboard or socket.
An output stream is attached to an output source, e.g., monitor or socket.

Reliable Communication: Terminology of a State Machine

[Figure: two states, state1 and state2, joined by a transition arrow labeled "event / actions": the event causing the state transition, and the actions taken on the state transition.]
Note: the state machine is for a certain host.

Reliable Communication

First model: sender sends, receiver receives.
  Is this enough?
  When will it work?
  When will it not work?
Sender sends one packet, then waits for the receiver's response: stop and wait.

Reliable Communication: Stop and Wait, sender side

[State diagram: State 1 (wait for data) moves to State 2 (wait for ack) on "data available / send data", and returns to State 1 on "received ack".]
In State 1: the sender can send data.
In State 2: the sender can receive acknowledgement packets.
Discussion:
  • What does the receiver do?
  • What happens if data is available at State 2?

Channel with bit errors and losses

Underlying channel may flip bits in a packet: use a checksum to detect bit errors.
Underlying channel can also lose packets.
The question: how to recover from errors?
  acknowledgements (ACKs): receiver explicitly tells sender that pkt was received OK
  timeout: sender retransmits pkt if it doesn't receive an ACK within the timeout
New mechanisms:
  error detection
  receiver feedback: control msgs (ACK) rcvr->sender
  sender-controlled timer, to decide whether to send again

Reliable Communication: Stop and Wait with errors/losses, sender

[State diagram: State 1 (wait for data) moves to State 2a (wait for ack packet or timeout) on "data available / send data". From State 2a, "timeout" leads to State 2b, where the data is sent again; "ack received / process ack" returns to State 1.]
In State 1 the sender waits for data.
Discussion:
  • What does the sender need to do for the retransmission?

This version has a fatal flaw!

What happens if the ACK is corrupted or lost?
  sender doesn't know what happened at receiver
  can't just retransmit: possible duplicate
Handling duplicates:
  sender retransmits current pkt if ACK is garbled or didn't arrive
  sender adds sequence number to each pkt
  receiver discards (doesn't deliver up) duplicate pkt
  receiver must specify seq # of pkt being ACKed

Discussion

Sender:
  seq # added to pkt
  two seq #'s (0, 1) will suffice. Why?
  must check if received ACK is corrupted
  twice as many states: state must "remember" whether "current" pkt has seq # 0 or 1
Receiver:
  must check if received packet is a duplicate: state indicates whether 0 or 1 is the expected pkt seq #
  receiver sends ACK for last pkt received OK; receiver must explicitly include seq # of pkt being ACKed
  note: receiver cannot know if its last ACK was received OK at sender
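The role of the 1-bit sequence number can be seen in a small deterministic simulation. This is a sketch under simplifying assumptions: losses are listed explicitly instead of being random, a lost packet or ACK always ends in a sender timeout, and corruption is treated like loss. The function name and its parameters are illustrative.

```python
def run_stop_and_wait(messages, lost_data=(), lost_acks=()):
    """Simulate stop-and-wait with sequence numbers 0 and 1.

    lost_data / lost_acks hold the 0-based indices of the data transmissions
    and ACK transmissions that the channel drops. Returns the messages
    delivered to the receiving application and the number of data
    transmissions the sender needed.
    """
    delivered = []
    expected_seq = 0      # receiver state: seq # it is waiting for
    data_sent = 0         # every data transmission, including retransmissions
    acks_sent = 0
    for i, msg in enumerate(messages):
        seq = i % 2       # sender state: seq # of the current packet
        while True:
            data_index, data_sent = data_sent, data_sent + 1
            if data_index in lost_data:
                continue              # timeout: retransmit the same packet
            # Receiver delivers the packet only if it has the expected seq #.
            if seq == expected_seq:
                delivered.append(msg)
                expected_seq ^= 1
            # Receiver ACKs the packet it just received, new or duplicate.
            ack_index, acks_sent = acks_sent, acks_sent + 1
            if ack_index in lost_acks:
                continue              # timeout: retransmit; receiver sees a duplicate
            break                     # ACK arrived: move on to the next message
    return delivered, data_sent

# The first ACK and the first retransmission are lost; nothing is duplicated.
print(run_stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={0}))
```

Without the sequence number check, the retransmission of "a" after the lost ACK would be delivered to the application a second time.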

Stop & wait in action

[Figures: stop & wait timing diagrams, two slides.]

Performance of stop & wait

Stop & wait works, but performance stinks.
Example: 1 Gbps link, 15 ms propagation delay, 8000-bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^9\ \text{bps}} = 8\ \text{microseconds}$
U_sender: utilization, the fraction of time the sender is busy sending:
  $U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$
1 KB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link
Network protocol limits use of physical resources!

Stop-and-wait operation

[Timing diagram, sender and receiver:]
  first packet bit transmitted, t = 0
  last packet bit transmitted, t = L/R
  first packet bit arrives
  last packet bit arrives, send ACK
  ACK arrives, send next packet, t = RTT + L/R
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$

Pipelined protocols

Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver
Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization

[Timing diagram, sender and receiver, three packets in flight:]
  first packet bit transmitted, t = 0
  last bit transmitted, t = L/R
  first packet bit arrives
  last packet bit arrives, send ACK
  last bit of 2nd packet arrives, send ACK
  last bit of 3rd packet arrives, send ACK
  ACK arrives, send next packet, t = RTT + L/R
$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$
Increase utilization by a factor of 3!
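The utilization figures for stop & wait and for three pipelined packets can be reproduced with a short sketch. It uses the example's numbers (8000-bit packets, 1 Gbps link, RTT of 30 ms); the function name and the `window` parameter are illustrative.

```python
def sender_utilization(packet_bits, link_bps, rtt_s, window=1):
    """Fraction of time the sender is busy when `window` packets may be
    in flight per round trip (window=1 is stop-and-wait)."""
    d_trans = packet_bits / link_bps              # L/R in seconds
    return min(1.0, window * d_trans / (rtt_s + d_trans))

L = 8000        # bits per packet
R = 1e9         # link rate in bits/sec
RTT = 0.030     # 2 x 15 ms propagation delay

print(round(sender_utilization(L, R, RTT), 5))             # 0.00027
print(round(sender_utilization(L, R, RTT, window=3), 5))   # 0.0008
```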

Pipelining Protocol

Go-back-N: big picture
  Sender can have up to N unacked packets in the pipeline.
  Rcvr only sends cumulative acks: doesn't ack a packet if there's a gap.
  Sender has a timer for the oldest unacked packet. If the timer expires, retransmit all unacked packets.

Go-Back-N

Sender:
  k-bit seq # in pkt header
  "window" of up to N consecutive unack'ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
    may receive duplicate ACKs (see receiver)
  timer for each in-flight pkt
  timeout(n): retransmit pkt n and all higher seq # pkts in window

Go Back N

Receiver:
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer) -> no receiver buffering!
    re-ACK pkt with highest in-order seq #
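The go-back-N receiver rules fit in a few lines. The sketch below assumes uncorrupted packets and ignores the wraparound of k-bit sequence numbers; `gbn_receiver` is an illustrative name, and only `expected_seqnum` corresponds to the slide's expectedseqnum.

```python
def gbn_receiver(arrivals):
    """Return the ACK sent for each arriving packet.

    `arrivals` lists sequence numbers in arrival order. None means no ACK
    can be sent yet because nothing has been received in order.
    """
    expected_seqnum = 0     # the only state the receiver keeps
    acks = []
    for seq in arrivals:
        if seq == expected_seqnum:
            expected_seqnum += 1          # in order: deliver and advance
        # Any other packet is discarded (no receiver buffering).
        last_in_order = expected_seqnum - 1
        acks.append(last_in_order if last_in_order >= 0 else None)
    return acks

# Packet 2 is delayed: packets 3 and 4 are discarded and re-ACKed as 1.
print(gbn_receiver([0, 1, 3, 4, 2, 3, 4]))   # [0, 1, 1, 1, 2, 3, 4]
```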

GBN in action

Transmission Control Protocol

Enhanced GBN protocol
Segment structure
Reliable data transfer and data transfer issues
Flow control
Connection management
TCP congestion control

TCP Overview (RFCs 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver
reliable, in-order byte stream: no "message boundaries"
pipelined: TCP congestion and flow control set window size
send & receive buffers
full duplex data: bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange
flow controlled: sender will not overwhelm receiver
[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door.]

TCP segment structure

Header layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | urg data pointer
  options (variable length)
  application data (variable length)
Notes:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection establishment (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  sequence and acknowledgement numbers: counting by bytes of data (not segments!)
  checksum: Internet checksum (as in UDP)
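The fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. This is a sketch only: it ignores options, and the example segment is built by hand rather than captured from a network. The function name and dictionary keys are illustrative.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte part of a TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    flags = offset_flags & 0x3F            # low 6 bits: URG ACK PSH RST SYN FIN
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,   # field counts 32-bit words
        "URG": bool(flags & 0x20),
        "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08),
        "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02),
        "FIN": bool(flags & 0x01),
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }

# Hypothetical SYN segment: port 9157 -> 80, header length 5 words, SYN set.
example = struct.pack("!HHIIHHHH", 9157, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(example)
print(header["src_port"], header["dst_port"], header["SYN"], header["header_len"])
```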

TCP reliable data transfer

TCP creates a reliable service on top of IP's unreliable service
Pipelined segments
Cumulative acks
TCP uses a single retransmission timer
The sequence number for a segment is the byte-stream number of the first byte in the segment

TCP sender events

Data rcvd from app:
  create segment with seq #
  start timer if not already running (think of the timer as for the oldest unacked segment)
  expiration interval: TimeOutInterval
Timeout:
  retransmit segment that caused timeout
  restart timer
Ack rcvd:
  if it acknowledges previously unacked segments:
    update what is known to be acked
    start timer if there are outstanding segments

TCP seq. #'s and ACKs

Seq. #'s:
  byte stream "number" of first byte in segment's data
ACKs:
  seq # of next byte expected from other side
  cumulative ACK
Q: how does the receiver handle out-of-order segments?
A: TCP spec doesn't say; up to the implementor
[Figure: timeline between Host A and Host B. User types 'C'; host ACKs receipt of 'C'; host receives ACK.]

TCP retransmission scenarios

[Figure: two Host A / Host B timelines. Lost ACK scenario: the ACK for the segment with Seq=92 is lost (X), the sender times out and retransmits. Premature timeout: the timeout for the segment with Seq=92 expires too early. SendBase values marked on the timelines: 100 and 120.]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario between Host A and Host B, with one loss (X) inside the timeout interval; SendBase = 120.]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), for a total of 160 bytes of headers for sending and receiving 'C'.
If the receiver waits a while, it can piggyback the ACK on a data packet.
Delayed ack: wait up to 200 ms for the next segment. If there is no next segment, send the ACK.
Should the sender use delayed acks too?
[Stevens figure 19.3]
[Figure: simple telnet scenario between Host A and Host B. User types 'C'; host ACKs receipt of 'C' and echoes back 'C'; host ACKs receipt of echoed 'C'.]

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggybacking, 2/120 -> only about 1.6% of the bits sent are data.
LANs are usually not congested, so it might be okay there.
Small packets, termed tinygrams, over a congested WAN: bad news.
Nagle's algorithm:
  New data can't be sent until outstanding data is acked.
  Small amounts of data are collected and sent in a single segment when the ack arrives.
  Self-clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent.
[Stevens 19.4]
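A toy model of Nagle's rule, to show how single keystrokes are collected while earlier data is unacknowledged. The class `NagleSender` is hypothetical, and the model assumes that every ACK covers all outstanding data and that full-size (MSS) segments may always be sent.

```python
class NagleSender:
    """Toy model of Nagle's rule; `sent` records each segment handed to the network."""

    def __init__(self, mss: int) -> None:
        self.mss = mss
        self.buffer = b""          # small writes waiting for an ACK
        self.unacked = False       # is any data outstanding?
        self.sent = []

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        # Assumption: each ACK covers all outstanding data.
        self.unacked = False
        self._try_send()

    def _try_send(self) -> None:
        # Send while a full segment is ready, or when nothing is outstanding.
        while self.buffer and (len(self.buffer) >= self.mss or not self.unacked):
            segment, self.buffer = self.buffer[:self.mss], self.buffer[self.mss:]
            self.sent.append(segment)
            self.unacked = True

sender = NagleSender(mss=1460)
for ch in b"hello":
    sender.write(bytes([ch]))      # five one-byte writes, e.g. keystrokes
print(sender.sent)                 # [b'h']
sender.on_ack()
print(sender.sent)                 # [b'h', b'ello']
```

Five one-byte writes result in two segments instead of five tinygrams.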

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed.
TCP receiver action: delayed ACK. Wait up to 200 ms for next segment. If no next segment, send ACK.

Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending.
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected.
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: arrival of segment that partially or completely fills gap.
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap.

Fast Retransmit

Time-out period is often relatively long: long delay before resending a lost packet.
Detect lost segments via duplicate ACKs:
  Sender often sends many segments back-to-back.
  If a segment is lost, there will likely be many duplicate ACKs.
If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost.
  Fast retransmit: resend the segment before the timer expires.

[Figure 3.37: Resending a segment after triple duplicate ACK (Host A / Host B timeline).]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        // a duplicate ACK for already ACKed segment
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            // fast retransmit
            resend segment with sequence number y
        }
    }
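A runnable sketch of the ACK handler above. Timer handling is left out, and retransmissions are only recorded in a list instead of being sent; the class name and its attributes are illustrative.

```python
from collections import Counter

class FastRetransmitSender:
    def __init__(self, send_base: int) -> None:
        self.send_base = send_base      # oldest unacknowledged byte
        self.dup_acks = Counter()       # duplicate ACK count per ACK value y
        self.retransmitted = []         # seq #s resent before any timeout

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged (timer restart omitted in this sketch).
            self.send_base = y
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3:
                self.retransmitted.append(y)   # fast retransmit

sender = FastRetransmitSender(send_base=92)
for ack in (100, 100, 100, 100):    # one new ACK, then three duplicates
    sender.on_ack(ack)
print(sender.send_base, sender.retransmitted)   # 100 [100]
```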

TCP Round Trip Time and Timeout

Q: how to set the TCP timeout value?
  longer than RTT, but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary; want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
Exponential weighted moving average
Influence of a past sample decreases exponentially fast
Typical value: $\alpha = 0.125$
[Retransmission example in Stevens 21.1]

[Figure: Example RTT estimation, gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and EstimatedRTT.]

TCP Round Trip Time and Timeout

Setting the timeout:
  EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  first, estimate how much SampleRTT deviates from EstimatedRTT:
    $\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
    (typically, $\beta = 0.25$)
  then set the timeout interval:
    $\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
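The two moving averages and the timeout formula can be combined into a small estimator. The slides do not say which average is updated first; this sketch assumes DevRTT is updated with the previous EstimatedRTT, the ordering used in RFC 6298. The class name and the starting values are illustrative.

```python
class RttEstimator:
    ALPHA = 0.125   # weight of the newest sample in EstimatedRTT
    BETA = 0.25     # weight of the newest deviation in DevRTT

    def __init__(self, estimated_rtt: float, dev_rtt: float) -> None:
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Assumed ordering: DevRTT uses the EstimatedRTT from before this sample.
        self.dev_rtt = ((1 - self.BETA) * self.dev_rtt
                        + self.BETA * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.ALPHA) * self.estimated_rtt
                              + self.ALPHA * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(estimated_rtt=100.0, dev_rtt=10.0)   # milliseconds
timeout = est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, timeout)   # 102.5 12.5 152.5
```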

TCP Flow Control

Receive side of a TCP connection has a receive buffer.
App process may be slow at reading from the buffer.
Flow control: sender won't overflow the receiver's buffer by transmitting too much, too fast.
Speed-matching service: matching the send rate to the receiving app's drain rate.

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments.)
Spare room in buffer:
  $\text{RcvWindow} = \text{RcvBuffer} - [\text{LastByteRcvd} - \text{LastByteRead}]$
Rcvr advertises spare room by including the value of RcvWindow in segments.
Sender limits unACKed data to RcvWindow: guarantees the receive buffer doesn't overflow.
Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
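The receiver's advertisement and the sender's check can be written directly from the formula. The buffer size and byte counters below are made-up example values, and the function names are illustrative.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    window: int, nbytes: int) -> bool:
    """True if sending `nbytes` more keeps the unACKed data within `window`."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window

window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                    # 2096
print(sender_may_send(3000, 2000, window, 1000)) # True: 2000 bytes unACKed
print(sender_may_send(3000, 2000, window, 1500)) # False: 2500 bytes unACKed
```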


Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

TCP Connection Management

Recall: TCP sender, receiver establish a "connection" before exchanging data segments
  initialize TCP variables:
    seq. #s
    buffers, flow control info (e.g. RcvWindow)
As seen in the previous slide, due to the delayed duplicates problem, a simple 2-way handshake does not suffice.
Possible solutions:
  A transport address can be used only once (impossible for the client-server model).
  Each connection is given an identifier by the originator (requires an indefinite history to be saved by the transport layer).
TCP's solution: three-way handshake
  Step 1: client host sends TCP SYN segment to server
    specifies initial seq #
    no data
  Step 2: server host receives SYN, replies with SYNACK segment
    server allocates buffers
    specifies server initial seq. #
  Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont.)

Connection establishment:
  Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number).
  Step 2: server receives SYN, replies with SYN+ACK (ACK number = client's ISN+1) and its own ISN.
  Step 3: client receives SYN+ACK, replies with ACK (ACK number = server's ISN+1).
[Figure: client / server timeline of the three segments.]
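The sequence and acknowledgement numbers of the three segments can be written out as a small sketch. Segments are plain dictionaries here, and the ISN values are arbitrary examples; nothing is sent over a network.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the SYN, SYN+ACK and ACK segments of a connection setup."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": server_isn, "ack": syn["seq"] + 1}
    # The SYN consumes one sequence number, so the client continues at ISN+1.
    ack = {"flags": {"ACK"}, "seq": syn_ack["ack"], "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]

for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```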

TCP Connection Management (cont.)

Closing a connection:
  client closes socket: clientSocket.close()
  Step 1: client end system sends TCP FIN control segment to server.
  Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client / server timeline: close (client), close (server), timed wait, closed.]

TCP Connection Management (cont.)

  Step 3: client receives FIN, replies with ACK.
    Enters "timed wait": will respond with ACK to received FINs.
  Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
[Figure: client / server timeline: closing (client), closing (server), timed wait, closed.]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams.]
[Tanenbaum 6.33]

Principles of Congestion Control

Congestion:
  informally: "too many sources sending too much data too fast for the network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase CongWin by 1 MSS every RTT until loss detected
  multiplicative decrease: cut CongWin in half after loss
[Figure: congestion window size over time, with 8, 16 and 24 Kbytes marked on the window axis. Saw tooth behavior: probing for bandwidth.]
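The saw tooth can be reproduced with a few lines. This sketch counts the window in units of MSS, advances one RTT per step, and takes the RTTs in which a loss is detected as an explicit input; the function name and parameters are illustrative.

```python
def aimd(rounds, loss_rounds, start=1, mss=1):
    """Return CongWin after each RTT under additive increase,
    multiplicative decrease. `loss_rounds` holds the RTT indices
    in which a loss is detected."""
    cong_win = start
    history = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            cong_win = max(mss, cong_win / 2)   # multiplicative decrease
        else:
            cong_win += mss                     # additive increase
        history.append(cong_win)
    return history

# Window starts at 8 MSS; a loss is detected in the fifth RTT.
print(aimd(8, {4}, start=8))   # [9, 10, 11, 12, 6.0, 7.0, 8.0, 9.0]
```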

TCP Congestion Control: details

Sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
Let's assume that RcvWindow is not a constraint (for simplicity).
Roughly:
  $\text{rate} = \frac{\text{CongWin}}{RTT}\ \text{bytes/sec}$
CongWin is dynamic, a function of perceived network congestion.
How does the sender perceive congestion?
  loss event = timeout or 3 duplicate acks
  TCP sender reduces rate (CongWin) after a loss event
Three mechanisms:
  AIMD
  slow start
  conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS
  Example: MSS = 500 bytes & RTT = 200 msec
  initial rate = 20 kbps
Available bandwidth may be >> MSS/RTT
  desirable to quickly ramp up to a respectable rate
When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:
  double CongWin every RTT
  done by incrementing CongWin for every ACK received
Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline, one RTT marked.]

Refinement: inferring loss

After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly
  (part of Fast Recovery)
But after a timeout event:
  CongWin instead set to 1 MSS
  window then grows exponentially (this is SS) to a threshold (ssthresh), then grows linearly
Philosophy:
  3 dup ACKs indicate the network is capable of delivering some segments
  a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
  Variable Threshold (ssthresh)
  At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS. If (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender action: Threshold = CongWin/2, CongWin = Threshold. Set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: timeout
TCP sender action: Threshold = CongWin/2, CongWin = 1 MSS. Set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
TCP sender action: increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
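The table translates almost line by line into a small state machine. This is a sketch of the table only, not of a real TCP stack: segments, timers and the receive window are not modelled, and the class and method names are illustrative.

```python
class CongestionControl:
    def __init__(self, mss: int, threshold: float) -> None:
        self.mss = mss
        self.cong_win = mss          # connection starts with CongWin = 1 MSS
        self.threshold = threshold
        self.state = "SS"            # "SS" (slow start) or "CA" (congestion avoidance)
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_duplicate_ack(self) -> None:
        """Count the duplicate; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)   # not below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0

cc = CongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()
print(cc.state, cc.cong_win)   # CA 5000
```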

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? (Ignore slow start.)
Let W be the window size when loss occurs.
When the window is W, throughput is W/RTT.
Just after loss, window drops to W/2, throughput to W/(2 RTT).
Average throughput: 0.75 W/RTT

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
  -> $L = 2 \cdot 10^{-10}$. Wow!
New versions of TCP for high-speed
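Both numbers of this example follow from the formulas with a few lines of arithmetic. The sketch below inverts the throughput formula for the loss rate L; the function names are illustrative.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain the throughput: W = throughput * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2

print(round(window_segments(10e9, 0.1, 1500)))         # 83333
print(f"{required_loss_rate(10e9, 0.1, 1500):.1e}")    # 2.1e-10
```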

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

Two competing sessions:
  Additive increase gives slope of 1, as throughput increases
  Multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (0 to R) plotted against the other connection's throughput (0 to R), with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]


Page 8: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

8

Transmission Control Protocol

Principles of reliable communication

TCP basic notations 3 way handshake

TCP flow control congestion control

Principles of Reliable data transfer

important in app transport link layers

top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

9

Principles of Reliable data transfer

important in app transport link layers

top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer

important in app transport link layers

top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

10

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Note The state machine is for a certain host

11

Reliable Communication

First Model sender sends receiver receives

Is this enough

When will it work

When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2Sender can receive acknowledge packetsDiscussion

bull What does the receiver dobull What happens if data is available at state 2

Wait

for dataWait

for ack

12

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesn‟t receive ack within timeout

new mechanisms in error detection

receiver feedback control msg (ACK) rcvr-gtsender

sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send Data

Data available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1Sender waits

for data

Waitfor data

Waitfor ack packet or time out

state2b

timeout

Send

Dat

a

Discussion

bull What does the sender need to do for the retransmission

Process ack

13

This version has a fatal flaw

What happens if ACK corruptedlost

sender doesn‟t know what happened at receiver

can‟t just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACK garbled or didn‟t arrive

sender adds sequence number to each pkt

receiver discards (doesn‟t deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender

seq added to pkt

two seq ‟s (01) will suffice Why

must check if received ACK corrupted

twice as many states state must ldquorememberrdquo

whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver

must check if received packet is duplicate state indicates whether 0 or

1 is expected pkt seq

receiver sends ACK for last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

14

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

15

Performance of stop amp wait

Stop amp wait works but performance stinks

ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

Transport Layer 30

16

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased

buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microsecon

ds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

17

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesn‟t ack packet if there‟s a gap

Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header

ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n) retransmit pkt n and all higher seq pkts in window

18

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt discard (don‟t buffer) -gt no receiver buffering

Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

19

Transport Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection

MSS maximum segment size

connection-oriented handshaking (exchange

of control msgs) init‟s sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundariesrdquo

pipelined TCP congestion and flow

control set window size

send amp receive buffers

socket

door

TCP

send buffer

TCP

receive buffer

socket

door

segment

application

writes dataapplication

reads data

Transport Layer 38

20

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP‟s unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

21

TCP sender eventsdata rcvd from app

Create segment with seq

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer

Ack rcvd

If acknowledges previously unacked segments update what is known to

be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq #'s and ACKs

Seq #'s:

byte stream "number" of first byte in segment's data

ACKs:

seq # of next byte expected from other side

cumulative ACK

Q: how does the receiver handle out-of-order segments?

A: TCP spec doesn't say; it is up to the implementor

[Figure: Host A / Host B timeline. The user types 'C'; the host ACKs receipt of 'C'; the host receives the ACK]

Transport Layer 42

22

TCP retransmission scenarios

[Figure: two Host A / Host B timelines, one for the lost ACK scenario and one for the premature timeout scenario. Recoverable labels: Seq=92, timeout, loss (X), SendBase = 100, SendBase = 120]

Transport Layer 43

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline for the cumulative ACK scenario. Recoverable labels: loss (X), timeout, SendBase = 120]

Transport Layer 44

23

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), for a total of 160 bytes of headers across the four packets that send and echo 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: wait up to 200 ms for next segment. If no next segment, send ACK

Should sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline. The user types 'C'; the host ACKs receipt of 'C'; the host echoes back 'C'; the host ACKs receipt of the echoed 'C']

Transport Layer 45

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes are sent per data byte? With piggyback, 2:120, so only about 1.6% of the bits sent are data

LANs are usually not congested, so it might be okay

Small packets, termed tinygrams, over a congested WAN: bad news

Nagle's alg:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when the ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 19.4]

Transport Layer 46
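The collect-and-send rule can be sketched as a small sender model. This is a simplified illustration, not the slides' own code: the class `NagleSender` is hypothetical, a single ACK is assumed to acknowledge everything in flight, and full-sized segments are allowed to go out immediately, as in RFC 896.

```python
class NagleSender:
    """Toy model of Nagle's algorithm (hypothetical helper for illustration)."""

    def __init__(self, mss):
        self.mss = mss
        self.buffer = b""        # bytes written by the app but not yet sent
        self.in_flight = False   # True while sent data is still unacknowledged
        self.sent = []           # segments handed to the network, in order

    def write(self, data):
        """The application writes data; it is sent only if the rule allows."""
        self.buffer += data
        self._try_send()

    def on_ack(self):
        """Assumption: one ACK covers all outstanding data."""
        self.in_flight = False
        self._try_send()

    def _try_send(self):
        # A full segment may always go; a small one only when nothing is outstanding.
        while self.buffer and (len(self.buffer) >= self.mss or not self.in_flight):
            segment, self.buffer = self.buffer[:self.mss], self.buffer[self.mss:]
            self.sent.append(segment)
            self.in_flight = True


sender = NagleSender(mss=4)
sender.write(b"a")   # nothing outstanding, so the tinygram goes out
sender.write(b"b")   # held back: "a" is still unacked
sender.write(b"c")   # held back as well
sender.on_ack()      # the collected bytes leave as one segment
```

Interactive applications that cannot tolerate the added delay typically disable this behaviour with the `TCP_NODELAY` socket option.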

24

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action

Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 200 ms for next segment; if no next segment, send ACK

Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments

Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte

Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long: long delay before resending lost packet

Detect lost segments via duplicate ACKs. Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost. Fast retransmit: resend segment before timer expires

Transport Layer 48

25

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline with a lost segment (X) and the timeout interval]

Transport Layer 49

Fast retransmit algorithm

event: ACK received, with ACK field value of y

    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
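The ACK-handling event above can be sketched as runnable code. This is an illustrative model only: the class name `FastRetransmitSender`, its attributes, and the segment sizes in the example are assumptions, and the timer is reduced to a boolean flag.

```python
class FastRetransmitSender:
    """Toy sender that tracks SendBase and counts duplicate ACKs (hypothetical)."""

    def __init__(self, send_base):
        self.send_base = send_base   # oldest unacknowledged byte
        self.unacked = {}            # seq # -> segment data still awaiting an ACK
        self.dup_acks = 0            # duplicate ACKs seen for the current SendBase
        self.timer_running = False
        self.retransmitted = []      # seq #'s resent by fast retransmit

    def send(self, seq, data):
        self.unacked[seq] = data
        self.timer_running = True    # start timer if not already running

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase and drop covered segments.
            self.send_base = y
            self.dup_acks = 0
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.retransmitted.append(y)   # fast retransmit


s = FastRetransmitSender(send_base=92)
for seq, size in [(92, 8), (100, 20), (120, 15), (135, 6), (141, 9)]:
    s.send(seq, b"x" * size)
s.on_ack(100)            # the segment starting at 92 arrived
for _ in range(3):       # the segment at 100 was lost; later ones trigger dup ACKs
    s.on_ack(100)
```

After the retransmitted segment arrives, a single cumulative ACK for byte 150 would clear every outstanding segment and stop the timer.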

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT, but RTT varies

too short: premature timeout, unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary, want estimated RTT "smoother"

average several recent measurements, not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

Transport Layer 52

[Retransmission example in Stevens 21.1]

27

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; two curves: SampleRTT and EstimatedRTT]

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$

Transport Layer 54

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
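The two moving averages and the timeout formula can be combined into a short estimator. This is a sketch under stated assumptions: the class name `RttEstimator` is hypothetical, DevRTT is assumed to start at zero, and DevRTT is updated before EstimatedRTT (the order used by RFC 6298; the slides do not specify one).

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout (illustrative sketch)."""

    def __init__(self, initial_rtt, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = initial_rtt
        self.dev_rtt = 0.0  # assumption: no deviation measured yet

    def update(self, sample_rtt):
        """Fold in one SampleRTT (not taken from a retransmitted segment)."""
        # Assumption: the deviation is measured against the previous EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(initial_rtt=100.0)   # milliseconds, hypothetical values
timeout = estimator.update(200.0)             # one unusually slow sample
```

A single slow sample moves EstimatedRTT only slightly, but it widens the safety margin considerably, which is the intended effect.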

28

TCP Flow Control

receive side of TCP connection has a receive buffer

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer:

$\text{RcvWindow} = \text{RcvBuffer} - [\text{LastByteRcvd} - \text{LastByteRead}]$

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow, which guarantees the receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
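The window bookkeeping above can be illustrated with a small model. The names `ReceiveBuffer` and `sendable_bytes` are hypothetical, and the byte counts in the example are made up for illustration.

```python
class ReceiveBuffer:
    """Receiver-side bookkeeping for the advertised window (illustrative)."""

    def __init__(self, rcv_buffer):
        self.rcv_buffer = rcv_buffer   # total buffer size in bytes
        self.last_byte_rcvd = 0
        self.last_byte_read = 0

    def rcv_window(self):
        # RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def receive(self, nbytes):
        if nbytes > self.rcv_window():
            raise OverflowError("sender exceeded the advertised window")
        self.last_byte_rcvd += nbytes

    def app_read(self, nbytes):
        """The application drains up to nbytes; returns how many were read."""
        n = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += n
        return n


def sendable_bytes(last_byte_sent, last_byte_acked, rcv_window):
    """How much more the sender may put in flight without overflowing the receiver."""
    return max(0, rcv_window - (last_byte_sent - last_byte_acked))


buf = ReceiveBuffer(rcv_buffer=4096)
buf.receive(3000)
window_before_read = buf.rcv_window()
buf.app_read(1000)
window_after_read = buf.rcv_window()
```

The window grows again only when the application reads, which is how a slow reader throttles a fast sender.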

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments

initialize TCP variables:

seq #s

buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, due to the delayed duplicates problem a simple 2-way handshake does not suffice

Possible solutions:

A transport address can be used only once (impossible for client-server model)

Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's solution: three-way handshake

Step 1: client host sends TCP SYN segment to server

specifies initial seq #

no data

Step 2: server host receives SYN, replies with SYNACK segment

server allocates buffers

specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

Transport Layer 58

30

TCP Connection Management (cont)

Connection Establishment

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client / server timeline of the three handshake segments]
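The numbering of the three segments can be made concrete with a short sketch. The `Segment` dataclass and the function `three_way_handshake` are hypothetical names introduced for illustration, and the ISN values in the example are arbitrary.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: int = 0


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYN+ACK, and ACK segments for the given ISNs."""
    syn = Segment(syn=True, seq=client_isn)
    syn_ack = Segment(syn=True, ack=True, seq=server_isn,
                      ack_num=(syn.seq + 1) % SEQ_SPACE)      # client's ISN+1
    final_ack = Segment(ack=True, seq=(client_isn + 1) % SEQ_SPACE,
                        ack_num=(syn_ack.seq + 1) % SEQ_SPACE)  # server's ISN+1
    return [syn, syn_ack, final_ack]


handshake = three_way_handshake(client_isn=1000, server_isn=5000)
```

Because each side must echo the other's ISN plus one, a delayed duplicate SYN from an old connection cannot complete the exchange on its own.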

Transport Layer 59

TCP Connection Management (cont)

Closing a connection

client closes socket: clientSocket.close()

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

[Figure: client / server timeline showing close on each side, the client's timed wait, and closed]

Transport Layer 60

31

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK

Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed

Note: with small modification, can handle simultaneous FINs

[Figure: client / server timeline showing closing on each side, the client's timed wait, and closed on each side]

Transport Layer 61

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle]

Transport Layer 62

[Tanenbaum 633]

32

Principles of Congestion Control

Congestion, informally: "too many sources sending too much data too fast for network to handle"

different from flow control

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Transport Layer 63

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size over time, with the y-axis marked at 8, 16, and 24 Kbytes. Saw tooth behavior: probing for bandwidth]

Transport Layer 64

33

TCP Congestion Control details

sender limits transmission:

$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly,

$\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}$ Bytes/sec

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events

Transport Layer 65

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event
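The initial rate quoted in the example follows directly from sending one MSS per RTT; as a worked check of the arithmetic:

```latex
\text{initial rate} = \frac{\text{MSS}}{\text{RTT}}
  = \frac{500\ \text{bytes} \times 8\ \text{bits/byte}}{0.2\ \text{s}}
  = 20{,}000\ \text{bps} = 20\ \text{kbps}
```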

Transport Layer 66

34

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline with one RTT marked]

Transport Layer 67

Refinement inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is SS) to a threshold (ssthresh), then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

Transport Layer 68

35

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout

Implementation: variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Transport Layer 69

Summary TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly

When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold

When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

Transport Layer 70

36

TCP sender congestion control

State | Event | TCP Sender Action | Commentary

Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT

SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start

SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
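The table translates almost line for line into a small state machine. The sketch below is illustrative only: the class name `CongestionControl` and the MSS and Threshold values in the example are assumptions, and window sizes are kept as plain numbers of bytes.

```python
class CongestionControl:
    """Sender-side CongWin / Threshold state machine from the table (sketch)."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss        # connection begins with CongWin = 1 MSS
        self.threshold = threshold
        self.state = "SS"          # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                  # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win  # +1 MSS per RTT

    def on_dup_ack(self):
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


cc = CongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()            # slow start: 2000, 3000, 4000, then 5000 -> CA
window_entering_ca = cc.cong_win
cc.on_new_ack()                # congestion avoidance: + MSS * MSS / CongWin
```

Feeding three duplicate ACKs at this point halves the window and keeps the sender in congestion avoidance, while a timeout sends it back to slow start with CongWin = 1 MSS.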

Transport Layer 71

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start

Let W be the window size when loss occurs

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT

Transport Layer 72

37

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate:

$\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

-> $L = 2 \cdot 10^{-10}$. Wow

New versions of TCP for high-speed
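Both numbers in this example can be checked with a few lines of arithmetic. The variable names below are chosen for illustration; the loss rate comes from solving the throughput formula for L.

```python
# Figures from the example: 1500-byte segments, 100 ms RTT, 10 Gbps target.
mss_bits = 1500 * 8
rtt_seconds = 0.100
target_bps = 10e9

# Window needed to keep the pipe full: throughput * RTT, measured in segments.
window_segments = target_bps * rtt_seconds / mss_bits

# Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L.
loss_rate = (1.22 * mss_bits / (rtt_seconds * target_bps)) ** 2

print(f"W = {window_segments:.0f} segments, L = {loss_rate:.2e}")
```

The required loss rate works out to roughly two losses per ten billion segments, which is why the slide calls for new TCP variants on such links.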

Transport Layer 73

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Transport Layer 74

38

Why is TCP fair

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with Connection 1 throughput on one axis, both axes running to R, and the equal bandwidth share line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
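The convergence argument can be checked numerically. The sketch below is a deliberately idealised model, not taken from the slides: both flows have the same RTT, see every loss at the same time, and rates are unitless numbers; the function name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Run synchronized AIMD for two flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows decrease their window by a factor of 2.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: both flows add the same increment.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from the fair share: one flow has almost all the bandwidth.
final_1, final_2 = aimd_two_flows(x1=1.0, x2=80.0, capacity=100.0, rounds=500)
```

Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so the rates drift toward the equal-share line.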

Transport Layer 75

Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP: do not want rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets R/2

Transport Layer 76

39

Chapter 3 Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet:

UDP

TCP

After that:

leaving the network "edge" (application, transport layers)

into the network "core"

Transport Layer 77

Next: Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets:

each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client

non-persistent HTTP will have different socket for each request
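Demultiplexing by 4-tuple amounts to a table lookup. The sketch below is an illustration, not an operating-system API: the helper names `open_socket` and `demux` and the socket labels are hypothetical, while the addresses A, B, C and the ports 9157, 5775, and 80 are the ones used in the slides' figures.

```python
# Connection table of a hypothetical server host with IP "C".
sockets = {}


def open_socket(src_ip, src_port, dst_ip, dst_port, owner):
    """Register a connected socket under its own 4-tuple."""
    sockets[(src_ip, src_port, dst_ip, dst_port)] = owner


def demux(src_ip, src_port, dst_ip, dst_port):
    """Direct an arriving segment to its socket, or None if no connection matches."""
    return sockets.get((src_ip, src_port, dst_ip, dst_port))


# Three clients' connections to port 80; the socket labels are made up.
open_socket("A", 9157, "C", 80, "socket-1")
open_socket("B", 9157, "C", 80, "socket-2")
open_socket("B", 5775, "C", 80, "socket-3")
```

Two segments with the same source port and the same destination still reach different sockets when their source IP addresses differ, which is the point of using all four values.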

Transport Layer 78

40

Connection-oriented demux (cont)

[Figure: client IP B, client IP A, and server IP C, with processes labeled P1 to P6. Recoverable segment labels, all with DP 80 and D-IP C: SP 9157 and SP 5775, from S-IP A and S-IP B]

Transport Layer 79

Connection-oriented demux Threaded Web Server

[Figure: client IP B, client IP A, and a threaded web server at IP C, with processes labeled P1 to P4. Recoverable segment labels, all with DP 80 and D-IP C: SP 9157 and SP 5775, from S-IP A and S-IP B]

Transport Layer 80


Principles of Reliable data transfer

important in application, transport, and link layers

top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17


10

Reliable Data Transfer: stream jargon

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process, e.g. keyboard or socket

An output stream is attached to an output source, e.g. monitor or socket

Application Layer 19

Reliable Communication

Terminology of a State Machine

[Figure: two states, state1 and state2, joined by an arrow labeled with the event causing the state transition over the actions taken on the state transition (event / actions)]

Note: the state machine is for a certain host

Transport Layer 20

11

Reliable Communication

First model: sender sends, receiver receives

Is this enough?

When will it work?

When will it not work?

Transport Layer 21

Stop and wait: sender sends one packet, then waits for receiver response

Reliable Communication

Stop and Wait: sender side

[Figure: two-state machine. State 1 (wait for data): when data is available, send data and move to state 2. State 2 (wait for ack): when the ack is received, return to state 1]

In State 1, sender can send data

In State 2, sender can receive acknowledge packets

Discussion:

• What does the receiver do?

• What happens if data is available at state 2?

Transport Layer 22

12

channel with bit errors and losses

underlying channel may flip bits in packet: checksum to detect bit errors

underlying channel can also lose packets

the question: how to recover from errors?

acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK

timeout: sender retransmits pkt if it doesn't receive ack within timeout

new mechanisms:

error detection

receiver feedback: control msg (ACK) rcvr->sender

sender control: timer to decide whether to send again

Transport Layer 23

Reliable Communication

Stop and Wait with errors/losses: sender

[Figure: state machine. State 1 (wait for data): when data is available, send data and move to state 2a. State 2a (wait for ack packet or time out): on ack received, process ack and return to state 1; on timeout, go through state 2b and send the data again]

In State 1, sender waits for data

Discussion:

• What does the sender need to do for the retransmission?

Transport Layer 24

13

This version has a fatal flaw

What happens if ACK corrupted/lost?

sender doesn't know what happened at receiver

can't just retransmit: possible duplicate

Handling duplicates:

sender retransmits current pkt if ACK garbled or didn't arrive

sender adds sequence number to each pkt

receiver discards (doesn't deliver up) duplicate pkt

receiver must specify seq # of pkt being ACKed

Transport Layer 25

Discussion

Sender:

seq # added to pkt

two seq #'s (0, 1) will suffice. Why?

must check if received ACK corrupted

twice as many states: state must "remember" whether "current" pkt has 0 or 1 seq #

Receiver:

must check if received packet is duplicate: state indicates whether 0 or 1 is expected pkt seq #

receiver sends ACK for last pkt received OK: receiver must explicitly include seq # of pkt being ACKed

note: receiver can not know if its last ACK received OK at sender
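The one-bit sequence number scheme can be exercised in a small deterministic simulation. This is a sketch with simplifying assumptions: losses are scripted by transmission attempt number rather than random, corruption is not modeled, and every lost packet or ACK is followed by a timeout and retransmission. The function name `stop_and_wait` is hypothetical.

```python
def stop_and_wait(messages, lost_data=frozenset(), lost_acks=frozenset()):
    """Simulate stop-and-wait with seq #'s 0 and 1.

    lost_data / lost_acks hold the attempt numbers (1-based) whose data
    packet or ACK is dropped by the channel.
    Returns (messages delivered to the receiving app, transmissions used).
    """
    delivered = []
    expected = 0   # receiver state: which seq # it expects next
    seq = 0        # sender state: seq # of the current packet
    attempt = 0
    for msg in messages:
        while True:
            attempt += 1
            if attempt in lost_data:
                continue              # pkt lost: sender times out and retransmits
            if seq == expected:
                delivered.append(msg)  # new packet: deliver up
                expected ^= 1
            # A duplicate is discarded; either way the ACK carries the
            # seq # of the last pkt received OK.
            ack = expected ^ 1
            if attempt in lost_acks:
                continue              # ACK lost: sender times out and retransmits
            if ack == seq:
                seq ^= 1              # move on to the next packet
                break
    return delivered, attempt


delivered, transmissions = stop_and_wait(
    ["a", "b", "c"], lost_data={2}, lost_acks={4})
```

In this run the retransmission caused by the lost ACK arrives as a duplicate, and the receiver recognizes it from the sequence bit instead of delivering "c" twice.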

Transport Layer 26

14

Stop & wait in action

Transport Layer 27

Stop & wait in action

Transport Layer 28

15

Performance of stop & wait

Stop & wait works, but performance stinks

ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:

$d_{trans} = \dfrac{L}{R} = \dfrac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$

U sender: utilization, the fraction of time the sender is busy sending

$U_{sender} = \dfrac{L/R}{RTT + L/R} = \dfrac{0.008}{30.008} = 0.00027$

1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link

network protocol limits use of physical resources
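The utilization figure is easy to reproduce. The helper below is an illustrative sketch: the name `sender_utilization` and its `window` parameter (the number of packets sent back-to-back per round trip, 1 for stop-and-wait) are assumptions introduced here.

```python
def sender_utilization(packet_bits, rate_bps, rtt_seconds, window=1):
    """Fraction of time the sender is busy sending.

    window is the number of packets sent back-to-back before waiting for an
    ACK; window=1 is stop-and-wait.
    """
    d_trans = packet_bits / rate_bps   # L / R
    return min(1.0, window * d_trans / (rtt_seconds + d_trans))


# 1 Gbps link, 15 ms propagation delay each way (RTT = 30 ms), 8000-bit packet.
u_stop_and_wait = sender_utilization(8000, 1e9, 0.030)
u_three_in_flight = sender_utilization(8000, 1e9, 0.030, window=3)
```

With three packets in flight the same formula gives three times the utilization, which is the motivation for pipelined protocols.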

Transport Layer 29

stop-and-wait operation

[Figure: sender / receiver timeline. First packet bit transmitted at t = 0; last packet bit transmitted at t = L/R; first packet bit arrives; last packet bit arrives, send ACK; ACK arrives, send next packet at t = RTT + L/R]

$U_{sender} = \dfrac{L/R}{RTT + L/R} = \dfrac{0.008}{30.008} = 0.00027$

Transport Layer 30

16

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts

range of sequence numbers must be increased

buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Transport Layer 31

Pipelining: increased utilization

[Figure: sender / receiver timeline with three packets in flight. First packet bit transmitted at t = 0; last bit transmitted at t = L/R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, send next packet at t = RTT + L/R]

$U_{sender} = \dfrac{3 \cdot L/R}{RTT + L/R} = \dfrac{0.024}{30.008} = 0.0008$

Increase utilization by a factor of 3

Transport Layer 32

17

Pipelining Protocol

Transport Layer 33

Go-back-N: big picture

Sender can have up to N unacked packets in pipeline

Rcvr only sends cumulative acks

Doesn't ack packet if there's a gap

Sender has timer for oldest unacked packet

If timer expires, retransmit all unacked packets

Transport Layer 3-34

Go-Back-N

Sender:

k-bit seq # in pkt header

"window" of up to N consecutive unack'ed pkts allowed

ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n): retransmit pkt n and all higher seq # pkts in window

18

Go Back N

Transport Layer 35

Receiver:

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #

may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt: discard (don't buffer) -> no receiver buffering

Re-ACK pkt with highest in-order seq #

Transport Layer 3-36

GBN in action
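A short simulation shows the sender and receiver rules working together. It is a simplified sketch, not the slides' own code: the classes `GbnSender` and `GbnReceiver` are hypothetical, sequence numbers are unbounded integers rather than k-bit values, and the timeout is triggered by hand.

```python
class GbnReceiver:
    def __init__(self):
        self.expected = 0      # expectedseqnum
        self.delivered = []

    def receive(self, seq, data):
        """Accept only the in-order packet; return the cumulative ACK number."""
        if seq == self.expected:
            self.delivered.append(data)
            self.expected += 1
        # Out-of-order packets are discarded; re-ACK the highest in-order seq #
        # (-1 means nothing has been received in order yet).
        return self.expected - 1


class GbnSender:
    def __init__(self, window, packets):
        self.window = window   # N
        self.packets = packets
        self.base = 0          # oldest unacked seq #
        self.next_seq = 0

    def sendable(self):
        """Packets that fit into the window right now, as (seq, data) pairs."""
        out = []
        while self.next_seq < min(self.base + self.window, len(self.packets)):
            out.append((self.next_seq, self.packets[self.next_seq]))
            self.next_seq += 1
        return out

    def on_ack(self, n):
        if n >= self.base:     # cumulative ACK: everything up to n is acked
            self.base = n + 1

    def on_timeout(self):
        """Retransmit every packet that is sent but not yet acked."""
        return [(s, self.packets[s]) for s in range(self.base, self.next_seq)]


sender = GbnSender(window=4, packets=list("abcdef"))
receiver = GbnReceiver()

first_burst = sender.sendable()                       # seq 0..3
acks = [receiver.receive(s, d) for s, d in first_burst if s != 1]  # pkt 1 is lost
for a in acks:
    sender.on_ack(a)
for s, d in sender.sendable():                        # window slid by one: seq 4
    sender.on_ack(receiver.receive(s, d))
for s, d in sender.on_timeout():                      # go back: resend 1..4
    sender.on_ack(receiver.receive(s, d))
for s, d in sender.sendable():                        # seq 5
    sender.on_ack(receiver.receive(s, d))
```

Packets 2, 3, and 4 are each transmitted twice in this run, which is the cost Go-Back-N pays for keeping no buffer at the receiver.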

19

Transmission Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management

TCP congestion control

Transport Layer 37

TCP Overview [RFCs 793, 1122, 1323, 2018, 2581]

full duplex data: bi-directional data flow in same connection

MSS: maximum segment size

connection-oriented: handshaking (exchange of control msgs) initializes sender and receiver state before data exchange

flow controlled: sender will not overwhelm receiver

point-to-point: one sender, one receiver

reliable, in-order byte stream: no "message boundaries"

pipelined: TCP congestion and flow control set window size

send & receive buffers

[Figure: the sending application writes data through its socket "door" into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the receiving application reads data through its own socket "door"]

Transport Layer 38

20

TCP segment structure

Header fields, laid out in rows 32 bits wide:

source port # | dest port #

sequence number

acknowledgement number

head len | not used | flags U A P R S F | Receive window

checksum | Urg data pointer

Options (variable length)

application data (variable length)

Notes on the fields:

sequence number, acknowledgement number: counting by bytes of data (not segments)

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now (generally not used)

RST, SYN, FIN: connection establishment (setup, teardown commands)

Receive window: # bytes rcvr willing to accept

checksum: Internet checksum (as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP's unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the byte-stream number of the first byte in the segment

Transport Layer 40

21

TCP sender events

data rcvd from app:

Create segment with seq #

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval: TimeOutInterval

timeout:

retransmit segment that caused timeout

restart timer

Ack rcvd:

If it acknowledges previously unacked segments, update what is known to be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq #'s and ACKs

Seq #'s:

byte stream "number" of first byte in segment's data

ACKs:

seq # of next byte expected from other side

cumulative ACK

Q: how does the receiver handle out-of-order segments?

A: TCP spec doesn't say; it is up to the implementor

[Figure: Host A / Host B timeline. The user types 'C'; the host ACKs receipt of 'C'; the host receives the ACK]

Transport Layer 42

22

TCP retransmission scenarios

[Figure: two Host A / Host B timelines, one for the lost ACK scenario and one for the premature timeout scenario. Recoverable labels: Seq=92, timeout, loss (X), SendBase = 100, SendBase = 120]

Transport Layer 43

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline for the cumulative ACK scenario. Recoverable labels: loss (X), timeout, SendBase = 120]

Transport Layer 44

23

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), for a total of 160 bytes of headers across the four packets that send and echo 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: wait up to 200 ms for next segment. If no next segment, send ACK

Should sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline. The user types 'C'; the host ACKs receipt of 'C'; the host echoes back 'C'; the host ACKs receipt of the echoed 'C']

Transport Layer 45

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes are sent per data byte? With piggyback, 2:120, so only about 1.6% of the bits sent are data

LANs are usually not congested, so it might be okay

Small packets, termed tinygrams, over a congested WAN: bad news

Nagle's alg:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when the ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 19.4]

Transport Layer 46

24

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action

Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 200 ms for next segment; if no next segment, send ACK

Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments

Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte

Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long: long delay before resending lost packet

Detect lost segments via duplicate ACKs. Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost. Fast retransmit: resend segment before timer expires

Transport Layer 48

25

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline with a lost segment (X) and the timeout interval]

Transport Layer 49

Fast retransmit algorithm

event: ACK received, with ACK field value of y

    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT, but RTT varies

too short: premature timeout, unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary, want estimated RTT "smoother"

average several recent measurements, not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

Transport Layer 52

[Retransmission example in Stevens 21.1]

27

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; two curves: SampleRTT and EstimatedRTT]

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$

Transport Layer 54

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]

28

TCP Flow Control

receive side of TCP connection has a receive buffer

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer:

$\text{RcvWindow} = \text{RcvBuffer} - [\text{LastByteRcvd} - \text{LastByteRead}]$

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow, which guarantees the receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments

initialize TCP variables:

seq #s

buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, due to the delayed duplicates problem a simple 2-way handshake does not suffice

Possible solutions:

A transport address can be used only once (impossible for client-server model)

Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's solution: three-way handshake

Step 1: client host sends TCP SYN segment to server

specifies initial seq #

no data

Step 2: server host receives SYN, replies with SYNACK segment

server allocates buffers

specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

Transport Layer 58

30

TCP Connection Management (cont)

Connection Establishment

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client / server timeline of the three handshake segments]

Transport Layer 59

TCP Connection Management (cont)

Closing a connection

client closes socket: clientSocket.close()

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

[Figure: client / server timeline showing close on each side, the client's timed wait, and closed]

Transport Layer 60

31

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK

Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed

Note: with small modification, can handle simultaneous FINs

[Figure: client / server timeline showing closing on each side, the client's timed wait, and closed on each side]

Transport Layer 61

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle]

Transport Layer 62

[Tanenbaum 633]

32

Principles of Congestion Control

Congestion, informally: "too many sources sending too much data too fast for network to handle"

different from flow control

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Transport Layer 63

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size over time, with the y-axis marked at 8, 16, and 24 Kbytes. Saw tooth behavior: probing for bandwidth]

Transport Layer 64

33

TCP Congestion Control details

sender limits transmission:

$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly,

$\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}$ Bytes/sec

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events

Transport Layer 65

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

Transport Layer 66

34

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline with one RTT marked]

Transport Layer 67

Refinement: inferring loss

After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly
  Part of Fast Recovery
But after timeout event:
  CongWin instead set to 1 MSS
  window then grows exponentially (this is SS) to a threshold (ssthresh), then grows linearly

Philosophy:
  3 dup ACKs indicates network capable of delivering some segments
  timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
  Variable Threshold (ssthresh)
  At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: timeout
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
TCP sender action: increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
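The state/event table above can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not a real TCP stack: the MSS value, the initial Threshold, and the class and method names are all hypothetical, and timers and actual retransmission are left out.

```python
MSS = 1460  # bytes; an assumed segment size for illustration


class CongestionState:
    """Sender state from the table: CongWin, Threshold, and the SS/CA state."""

    def __init__(self, threshold=64 * 1024):
        self.cong_win = MSS          # slow start begins at 1 MSS
        self.threshold = threshold   # assumed initial Threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                        # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * MSS / self.cong_win  # about 1 MSS per RTT

    def on_dup_ack(self):
        """Duplicate ACK: count it; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)    # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        """Timeout: halve Threshold and fall back to slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

Feeding the object a sequence of `on_new_ack()`, `on_dup_ack()`, and `on_timeout()` calls reproduces the sawtooth: exponential growth up to Threshold, linear growth after it, and a halving or a reset on loss.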

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?
  Ignore slow start
Let W be the window size when loss occurs.
When window is W, throughput is W/RTT
Just after loss, window drops to W/2, throughput to W/(2 RTT)
Average throughput: 0.75 W/RTT
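The 0.75 factor follows because, between losses, the window climbs linearly from W/2 to W, so its time average is the midpoint of the two. A small numeric check, assuming the window is counted in segments, grows by one segment per RTT, and W is even (the function name is hypothetical):

```python
def average_window(w_max):
    """Mean window over one sawtooth cycle that ramps from W/2 up to W."""
    samples = list(range(w_max // 2, w_max + 1))   # one sample per RTT
    return sum(samples) / len(samples)


avg = average_window(100)   # W = 100 segments
ratio = avg / 100           # fraction of W, expected to be 0.75
```

Dividing the average window by RTT gives the average throughput stated on the slide.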

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
  -> $L = 2 \cdot 10^{-10}$   Wow!
New versions of TCP needed for high speed
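The two numbers in this example can be reproduced directly from the formulas. The sketch below assumes W = throughput * RTT / MSS for the window and solves the loss-rate formula for L; the function names are illustrative only.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target rate."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


w = window_segments(10e9, 0.100, 1500)    # about 83,333 segments
loss = loss_rate_for(10e9, 0.100, 1500)   # about 2e-10
```

A tolerable loss rate this small, roughly one lost segment in five billion, is what motivates the slide's closing remark.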

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
  Additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: Connection 1 throughput (0 to R) plotted against Connection 2 throughput (0 to R), with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
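The convergence argument can be checked with a toy simulation. This is a sketch under simplifying assumptions that are not in the slides: both flows see every loss at the same moment, each step is one RTT, and rates are in arbitrary units. Additive increase leaves the gap between the flows unchanged; each multiplicative decrease halves it.

```python
def aimd_two_flows(x1, x2, capacity, steps):
    """Two flows share a link; both add 1 per step and halve on overload."""
    for _ in range(steps):
        x1 += 1.0
        x2 += 1.0
        if x1 + x2 > capacity:   # joint loss event
            x1 /= 2.0
            x2 /= 2.0
    return x1, x2


# Start far from the fair share: 10 versus 80 on a link of capacity 100.
x1, x2 = aimd_two_flows(10.0, 80.0, capacity=100.0, steps=300)
gap = abs(x1 - x2)
```

After a few hundred steps the gap is a small fraction of its starting value of 70, which is the movement toward the equal bandwidth share line shown in the figure.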

Fairness (more)

Fairness and UDP
  Multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  Instead use UDP:
    pump audio/video at constant rate, tolerate packet loss
  Research area: TCP friendly

Fairness and parallel TCP connections
  nothing prevents app from opening parallel connections between 2 hosts
  Web browsers do this
  Example: link of rate R supporting 9 connections
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2

Chapter 3: Summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation and implementation in the Internet:
  UDP
  TCP

After that:
  leaving the network "edge" (application, transport layers)
  into the network "core"

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:
  source IP address
  source port number
  dest IP address
  dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets:
  each socket identified by its own 4-tuple
Web servers have different sockets for each connecting client
  non-persistent HTTP will have different socket for each request
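The 4-tuple lookup described above can be sketched as a dictionary keyed by (source IP, source port, dest IP, dest port). The host letters and port numbers follow the demux figure; the socket labels and function names are hypothetical, and a real kernel table is considerably more involved.

```python
sockets = {}  # maps 4-tuple -> socket label (labels are made up)


def open_connection(src_ip, src_port, dst_ip, dst_port, label):
    """Register a connected socket under its 4-tuple."""
    sockets[(src_ip, src_port, dst_ip, dst_port)] = label


def demux(segment):
    """Use all four header values to find the socket for a segment."""
    key = (segment["src_ip"], segment["src_port"],
           segment["dst_ip"], segment["dst_port"])
    return sockets.get(key)   # None if no such connection exists


open_connection("A", 9157, "C", 80, "socket-1")
open_connection("B", 9157, "C", 80, "socket-2")
open_connection("B", 5775, "C", 80, "socket-3")
```

Two segments with the same source port 9157 and the same destination port 80 still reach different sockets, because their source IP addresses differ.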

Connection-oriented demux (cont)

[Figure: client hosts A and B and server C. Three segments arrive at server port 80: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80), and (S-IP B, SP 5775, D-IP C, DP 80). Each is directed to a different socket at the server.]

Connection-oriented demux: Threaded Web Server

[Figure: the same three segments, (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80), and (S-IP B, SP 5775, D-IP C, DP 80), arriving at a threaded Web server on host C]


Reliable Data Transfer

Stream jargon
A stream is a sequence of characters that flow into or out of a process.
An input stream is attached to some input source for the process, e.g., keyboard or socket.
An output stream is attached to an output source, e.g., monitor or socket.

Reliable Communication

Terminology of a State Machine

[Figure: two states, state1 and state2, joined by a transition arrow labeled "event causing state transition / actions taken on state transition" (event / actions)]

Note: the state machine is for a certain host.

Reliable Communication

First model: sender sends, receiver receives.
  Is this enough?
  When will it work?
  When will it not work?

Stop and wait: sender sends one packet, then waits for receiver response.

Reliable Communication

Stop and Wait: sender side

[Figure: two-state sender. State 1 (wait for data): on "data available", send data and move to state 2. State 2 (wait for ack): on "received ack", return to state 1.]

In state 1, the sender can send data.
In state 2, the sender can receive acknowledgement packets.

Discussion
• What does the receiver do?
• What happens if data is available at state 2?

Channel with bit errors and losses

underlying channel may flip bits in packet
  checksum to detect bit errors
underlying channel can also lose packets
the question: how to recover from errors?
  acknowledgements (ACKs): receiver explicitly tells sender that pkt was received OK
  timeout: sender retransmits pkt if it doesn't receive an ACK within the timeout
new mechanisms:
  error detection
  receiver feedback: control msg (ACK) rcvr -> sender
  sender: control timer to decide whether to send again

Reliable Communication

Stop and Wait with errors/losses: sender

[Figure: sender state machine. State 1 (wait for data): on "data available", send data and move to state 2a. State 2a (wait for ack packet or timeout): on "ack received", process the ack and return to state 1; on "timeout", move to state 2b and send the data again.]

In state 1, the sender waits for data.

Discussion
• What does the sender need to do for the retransmission?

This version has a fatal flaw!

What happens if ACK is corrupted or lost?
  sender doesn't know what happened at receiver!
  can't just retransmit: possible duplicate

Handling duplicates:
  sender retransmits current pkt if ACK garbled or didn't arrive
  sender adds sequence number to each pkt
  receiver discards (doesn't deliver up) duplicate pkt
  receiver must specify seq # of pkt being ACKed

Discussion

Sender:
  seq # added to pkt
  two seq. #'s (0, 1) will suffice. Why?
  must check if received ACK is corrupted
  twice as many states
    state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:
  must check if received packet is duplicate
    state indicates whether 0 or 1 is expected pkt seq #
  receiver sends ACK for last pkt received OK
    receiver must explicitly include seq # of pkt being ACKed
  note: receiver cannot know if its last ACK was received OK at sender
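The sender and receiver rules above (retransmit on timeout, 1-bit sequence numbers, discard duplicates but still ACK them) can be sketched as a deterministic simulation. Everything here is an assumption made for illustration: the channel is a scripted list of which transmissions get lost, a lost packet or lost ACK is modelled simply as a timeout, and since only one packet is ever outstanding the ACK's sequence number is left implicit.

```python
def stop_and_wait(messages, drop_data=(), drop_acks=()):
    """Deliver messages in order using 1-bit sequence numbers.

    drop_data / drop_acks hold the transmission indexes (0, 1, 2, ...) that
    the simulated channel loses.
    """
    delivered = []
    expected = 0            # receiver: next sequence number it will accept
    transmissions = 0       # counts every data packet put on the channel
    for i, msg in enumerate(messages):
        seq = i % 2
        while True:
            n = transmissions
            transmissions += 1
            if n in drop_data:
                continue                  # data lost: timeout, retransmit
            if seq == expected:           # new packet: deliver it
                delivered.append(msg)
                expected = 1 - expected
            # otherwise it is a duplicate: discard it, but still ACK it
            if n in drop_acks:
                continue                  # ACK lost: timeout, retransmit
            break                         # ACK arrived: move to next message
    return delivered, transmissions


result = stop_and_wait(["a", "b", "c"], drop_data={1}, drop_acks={3})
```

In this run the second message is lost once and the ACK for the third message is lost once. The receiver still delivers each message exactly once, because the retransmission after the lost ACK carries a sequence number the receiver no longer expects.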

Stop & wait in action

[Figures: stop-and-wait sender/receiver timelines, two slides]

Performance of stop & wait

Stop & wait works, but performance stinks
ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \dfrac{L}{R} = \dfrac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
U_sender: utilization, the fraction of time the sender is busy sending
  $U_{sender} = \dfrac{L/R}{RTT + L/R} = \dfrac{0.008}{30.008} = 0.00027$   (times in msec)
1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
network protocol limits use of physical resources!

Stop-and-wait operation

[Figure: sender/receiver timeline. First packet bit transmitted at t = 0; last packet bit transmitted at t = L/R; first packet bit arrives; last packet bit arrives, send ACK; ACK arrives, send next packet at t = RTT + L/R.]

$U_{sender} = \dfrac{L/R}{RTT + L/R} = \dfrac{0.008}{30.008} = 0.00027$   (times in msec)

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver
Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization

[Figure: sender/receiver timeline with three packets in flight. First packet bit transmitted at t = 0; last bit transmitted at t = L/R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, send next packet at t = RTT + L/R.]

$U_{sender} = \dfrac{3 \cdot L/R}{RTT + L/R} = \dfrac{0.024}{30.008} = 0.0008$   (times in msec)

Increase utilization by a factor of 3!
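Both utilization figures, for stop-and-wait and for three pipelined packets, come from the same formula. A minimal sketch, assuming RTT is twice the 15 ms propagation delay and that the window simply multiplies the numerator (the function and parameter names are illustrative):

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy: (window * L/R) / (RTT + L/R)."""
    d_trans = packet_bits / rate_bps
    return window * d_trans / (rtt_s + d_trans)


u_stop_and_wait = sender_utilization(8000, 1e9, 0.030)         # about 0.00027
u_pipelined = sender_utilization(8000, 1e9, 0.030, window=3)   # about 0.0008
```

The window only scales the numerator, so utilization grows linearly with the number of in-flight packets until the link is fully used.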

Pipelining Protocol

Go-back-N: big picture
  Sender can have up to N unacked packets in pipeline
  Rcvr only sends cumulative acks
    Doesn't ack packet if there's a gap
  Sender has timer for oldest unacked packet
    If timer expires, retransmit all unacked packets

Go-Back-N: Sender
  k-bit seq # in pkt header
  "window" of up to N consecutive unack'ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
    may receive duplicate ACKs (see receiver)
  timer for each in-flight pkt
  timeout(n): retransmit pkt n and all higher seq # pkts in window

Go-Back-N: Receiver

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer) -> no receiver buffering!
  Re-ACK pkt with highest in-order seq #

GBN in action

[Figure: Go-Back-N sender/receiver timeline]
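A coarse, round-based sketch of Go-Back-N may help tie the sender and receiver rules together. It is deliberately simplified and the simplifications are assumptions, not part of the slides: the sender transmits its whole window, then acts on the resulting cumulative ACK; ACKs are never lost; and a timeout is modelled as "resend everything from the oldest unacked packet". A real sender would also slide the window as individual ACKs arrive.

```python
def go_back_n(num_packets, window, lost_once=()):
    """Return the order in which packets are transmitted under Go-Back-N.

    lost_once: sequence numbers whose first transmission is lost.
    """
    lost = set(lost_once)
    sent_log = []
    base = 0                 # oldest unacked packet
    expected = 0             # receiver's expectedseqnum
    while base < num_packets:
        # Send (or resend) everything in the window starting at base.
        window_end = min(base + window, num_packets)
        for seq in range(base, window_end):
            sent_log.append(seq)
            if seq in lost:
                lost.discard(seq)     # lost this time only
            elif seq == expected:
                expected += 1         # in order: deliver and ACK
            # out-of-order packets are discarded by the receiver
        base = expected               # cumulative ACK slides the window
    return sent_log


log = go_back_n(6, window=4, lost_once={2})
```

With packet 2 lost, packet 3 arrives out of order and is discarded, so the sender goes back and resends from packet 2 onward.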

Transmission Control Protocol (TCP)

Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management
TCP congestion control

TCP: Overview   (RFCs: 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver
reliable, in-order byte stream: no "message boundaries"
pipelined: TCP congestion and flow control set window size
send & receive buffers
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange
flow controlled: sender will not overwhelm receiver

[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, where the application reads data through its socket door]

TCP segment structure

Header layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | flags U A P R S F | receive window
  checksum | urg data pointer
  options (variable length)
  application data (variable length)

Notes:
  sequence and acknowledgement numbers: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
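To make the layout concrete, the fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. This is a sketch for illustration: options are ignored, and the sample header is hand-built with made-up port, sequence, and window values rather than captured from a network.

```python
import struct


def parse_tcp_header(data):
    """Unpack the fixed 20-byte part of a TCP header (options are skipped)."""
    (src_port, dst_port, seq, ack,
     offset_reserved, flags, window, checksum, urg_ptr) = struct.unpack(
        "!HHIIBBHHH", data[:20])
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 upward
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_reserved >> 4) * 4,        # 32-bit words -> bytes
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urg_ptr,
    }


# A hand-built SYN+ACK header with made-up values, for illustration only.
sample = struct.pack("!HHIIBBHHH", 80, 9157, 1000, 2001, 5 << 4, 0x12, 65535, 0, 0)
header = parse_tcp_header(sample)
```

The `!` in the format string selects network byte order, which is how the fields are laid out on the wire.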

TCP reliable data transfer

TCP creates reliable service on top of IP's unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
The sequence number for a segment is the byte-stream number of the first byte in the segment

TCP sender events

data rcvd from app:
  Create segment with seq #
  start timer if not already running (think of timer as for oldest unacked segment)
  expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ACK rcvd:
  If it acknowledges previously unacked segments
    update what is known to be acked
    start timer if there are outstanding segments

TCP seq. #'s and ACKs

Seq. #'s:
  byte stream "number" of first byte in segment's data
ACKs:
  seq # of next byte expected from other side
  cumulative ACK
Q: how does the receiver handle out-of-order segments?
A: TCP spec doesn't say; up to implementor

[Figure: Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host receives ACK.]

TCP retransmission scenarios

[Figures: "lost ACK scenario" and "premature timeout", each a Host A / Host B timeline. Recoverable labels: Seq=92, timeout, loss (X), SendBase = 100, SendBase = 120.]

TCP retransmission scenarios (more)

[Figure: "Cumulative ACK scenario", Host A / Host B timeline. Recoverable labels: loss (X), timeout, SendBase = 120.]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), for a total of 160 bytes for sending and receiving 'C'
If the receiver waits a while, it can piggyback the ACK on the data packet
Delayed ACK: wait up to 200 ms for next segment. If no next segment, send ACK
Should the sender use delayed ACKs too?
[Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host echoes back 'C'; host ACKs receipt of echoed 'C'.]

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggyback, 2/120 -> only about 1.6% of the bits sent are data
LANs are usually not congested, so it might be okay
Small packets, termed tinygrams, over a congested WAN: bad news

Nagle's alg:
  New data can't be sent until outstanding data is acked
  Small amounts of data are collected and sent in a single segment when the ACK arrives
  Self clocking: the faster the ACK comes back, the faster data is sent. Slow links cause fewer segments to be sent
[Stevens 19.4]

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 200 ms for next segment. If no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment, higher-than-expected seq #. Gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

Event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long:
  long delay before resending lost packet
Detect lost segments via duplicate ACKs.
  Sender often sends many segments back-to-back
  If segment is lost, there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK (Host A / Host B timeline; the lost segment is resent before the timeout)]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
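The ACK-handling pseudocode above translates almost line for line into Python. The sketch below is a simplified illustration: the retransmission timer is omitted, "resending" is modelled by recording the sequence number in a list, and the class and attribute names are hypothetical.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs, as in the pseudocode."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers resent early

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y       # new data acknowledged
            self.dup_acks = 0
        else:
            self.dup_acks += 1       # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y


sender = FastRetransmitSender(send_base=100)
for ack in [120, 120, 120, 120]:
    sender.on_ack(ack)
```

The first ACK for 120 advances SendBase; the next three are duplicates, and the third duplicate triggers the early resend of the segment starting at byte 120.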

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout
    unnecessary retransmissions
  too long: slow reaction to segment loss

Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1 - \alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$

Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT plotted as RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106).]

TCP Round Trip Time and Timeout

Setting the timeout
EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
first, estimate how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1 - \beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$   (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
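The three formulas (EstimatedRTT, DevRTT, TimeoutInterval) fit in a few lines of Python. Two details below are assumptions not stated on the slides: the initial values (EstimatedRTT set to the first sample, DevRTT to half of it) and the update order (DevRTT is updated using the EstimatedRTT from before the new sample is folded in), both chosen to match common practice. The class name is illustrative.

```python
class RttEstimator:
    """Exponentially weighted RTT average, deviation, and timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumed initialisation
        self.dev_rtt = first_sample / 2     # assumed initialisation

    def update(self, sample_rtt):
        # DevRTT uses the EstimatedRTT from before this sample is folded in.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(200.0)                        # one much slower sample arrives
```

A single outlier moves EstimatedRTT only a little (100 to 112.5 ms) but widens the safety margin considerably, which is the behavior the slide describes.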

TCP Flow Control

receive side of TCP connection has a receive buffer
app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)
spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
  guarantees receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
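The receive-window arithmetic can be sketched with two small functions. The byte counts in the example are made up for illustration, and the function names are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """True if nbytes more keep unACKed data within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= advertised_window


# Receiver: 4096-byte buffer, 3000 bytes received so far, 1000 read by the app.
window = rcv_window(4096, 3000, 1000)
# Sender: 1000 bytes currently unACKed; may it send 1000 or 1200 more?
ok_small = sender_may_send(5000, 4000, window, 1000)
ok_large = sender_may_send(5000, 4000, window, 1200)
```

As the application reads more data, LastByteRead advances, the advertised window grows, and the sender is allowed to transmit again.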

Delayed Duplicates Problem

A user asks for a connection.
Due to congestion, the packet is caught in a traffic jam.
The user asks again for the connection.
Destination accepts the 2nd connection request.
User sends info to dest.
Info gets caught in a traffic jam.
User sends info again.
Dest receives the info.
Connection is closed by both parties.
The original connection request and user info find their way to the destination.

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
initialize TCP variables:
  seq. #s
  buffers, flow control info (e.g. RcvWindow)
As seen in the previous slide, due to the delayed duplicates problem, a simple 2-way handshake does not suffice.
Possible solutions:
  A transport address can be used only once (impossible for client-server model)
  Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's solution: three-way handshake
Step 1: client host sends TCP SYN segment to server
  specifies initial seq #
  no data
Step 2: server host receives SYN, replies with SYNACK segment
  server allocates buffers
  specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont)

Connection establishment
Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN
Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client/server timeline of the three segments]

TCP Connection Management (cont)

Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline showing close, timed wait, and closed]

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK.
  Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.


Page 11: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

11

Reliable Communication

First Model sender sends receiver receives

Is this enough

When will it work

When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2Sender can receive acknowledge packetsDiscussion

bull What does the receiver dobull What happens if data is available at state 2

Wait

for dataWait

for ack

12

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesn‟t receive ack within timeout

new mechanisms in error detection

receiver feedback control msg (ACK) rcvr-gtsender

sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send Data

Data available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1Sender waits

for data

Waitfor data

Waitfor ack packet or time out

state2b

timeout

Send

Dat

a

Discussion

bull What does the sender need to do for the retransmission

Process ack

13

This version has a fatal flaw

What happens if ACK corruptedlost

sender doesn‟t know what happened at receiver

can‟t just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACK garbled or didn‟t arrive

sender adds sequence number to each pkt

receiver discards (doesn‟t deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender

seq added to pkt

two seq ‟s (01) will suffice Why

must check if received ACK corrupted

twice as many states state must ldquorememberrdquo

whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver

must check if received packet is duplicate state indicates whether 0 or

1 is expected pkt seq

receiver sends ACK for last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

14

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

15

Performance of stop amp wait

Stop amp wait works but performance stinks

ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

Transport Layer 30

16

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased

buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microsecon

ds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

17

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesn‟t ack packet if there‟s a gap

Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-N: Sender

k-bit seq # in pkt header
"window" of up to N consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n): retransmit pkt n and all higher seq # pkts in window

Go-Back-N: Receiver

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer) -> no receiver buffering
  Re-ACK pkt with highest in-order seq #

[Figure: GBN in action]
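The Go-Back-N rules above can be sketched as two small Python classes. This is a minimal illustration rather than a full protocol: the class and method names are made up for the example, sequence numbers are unbounded integers (no k-bit wrap-around), and a single timeout covers the whole window.

```python
class GbnReceiver:
    """Receiver side: no buffering, only remembers the expected sequence number."""

    def __init__(self):
        self.expected_seq = 0
        self.delivered = []

    def on_packet(self, seq, data):
        if seq == self.expected_seq:
            self.delivered.append(data)   # in order: deliver up
            self.expected_seq += 1
        # Out-of-order packets are discarded. Either way, (re-)ACK the
        # highest in-order seq # received so far (-1 means "nothing yet").
        return self.expected_seq - 1


class GbnSender:
    """Sender side: at most `window` unacked packets, cumulative ACKs."""

    def __init__(self, window):
        self.window = window
        self.base = 0        # oldest unacked seq #
        self.next_seq = 0    # next seq # to use

    def can_send(self):
        return self.next_seq < self.base + self.window

    def send(self):
        if not self.can_send():
            raise RuntimeError("window is full")
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, n):
        if n >= self.base:   # cumulative: everything up to n is acked
            self.base = n + 1

    def on_timeout(self):
        # Go back N: retransmit every packet that is still unacked.
        return list(range(self.base, self.next_seq))
```

If packet 1 of packets 0..3 is lost, the receiver keeps re-ACKing 0 and the sender's timeout returns packets 1, 2 and 3 for retransmission.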

Transmission Control Protocol (TCP)

Enhanced GBN protocol
Segment structure
Reliable data transfer and data transfer issues
Flow control
Connection management
TCP congestion control

TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver
reliable, in-order byte stream: no "message boundaries"
pipelined: TCP congestion and flow control set window size
send & receive buffers
full duplex data: bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange
flow controlled: sender will not overwhelm receiver

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, where the application reads data through its socket door.]

TCP segment structure

[Figure: TCP segment layout, 32 bits wide]
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | Receive window
checksum | Urg data pnter
Options (variable length)
application data (variable length)

URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
Receive window: # bytes rcvr willing to accept
sequence and acknowledgement numbers: counting by bytes of data (not segments)
checksum: Internet checksum (as in UDP)
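To make the segment layout concrete, the fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. This is a sketch under stated assumptions: `parse_tcp_header` and `FLAG_BITS` are illustrative names, options are ignored, and the checksum is returned without being verified.

```python
import struct

# The six flag bits sit in the low bits of the 16-bit "head len / flags" word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(raw):
    """Decode the first 20 bytes of a TCP segment (network byte order)."""
    (src_port, dst_port, seq, ack,
     off_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", raw[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,                             # byte-stream number of first data byte
        "ack": ack,                             # next byte expected from the other side
        "header_len": (off_flags >> 12) * 4,    # head len is counted in 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "rcv_window": window,                   # bytes the receiver is willing to accept
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }
```

A header built with `struct.pack("!HHIIHHHH", ...)` using the same field order round-trips through this function.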

TCP reliable data transfer

TCP creates a reliable service on top of IP's unreliable service
Pipelined segments
Cumulative acks
TCP uses a single retransmission timer
The sequence number for a segment is the byte-stream number of the first byte in the segment

TCP sender events

data rcvd from app:
  Create segment with seq #
  start timer if not already running (think of timer as for oldest unacked segment)
  expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
Ack rcvd:
  If acknowledges previously unacked segments:
    update what is known to be acked
    start timer if there are outstanding segments

TCP seq. #'s and ACKs

Seq. #'s:
  byte stream "number" of first byte in segment's data
ACKs:
  seq # of next byte expected from other side
  cumulative ACK
Q: how does the receiver handle out-of-order segments?
A: TCP spec doesn't say; up to the implementor

[Figure: Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host receives ACK.]

TCP retransmission scenarios

[Figure: two Host A / Host B timelines, a lost ACK scenario (the ACK is lost, marked X) and a premature timeout scenario. Each shows a Seq=92 segment and a timeout interval; SendBase values of 100 and 120 are marked.]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario. Host A / Host B timeline with a loss (marked X), a timeout interval, and SendBase = 120.]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), for a total of 160 header bytes for sending and receiving 'C'
If the receiver waits a while, it can piggyback the ACK on the data packet
Delayed ack: wait up to 200 ms for next segment. If no next segment, send ACK
Should the sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host echoes back 'C'; host ACKs receipt of echoed 'C'.]

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data bytes? With piggyback: 2/120 -> only 1.6% of the bits sent are data
LANs: usually not congested, so it might be okay
Small packets, termed tinygrams, over a congested WAN: bad news

Nagle's alg:
  New data can't be sent until outstanding data is acked
  Small amounts of data are collected and sent in a single segment when the ack arrives
  Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent
[Stevens 19.4]
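The Nagle rule (hold small writes while earlier data is unacknowledged, then send them together) can be sketched in Python. This is a simplified model, not a real TCP stack: `NagleSender` is a hypothetical class, every ACK is assumed to cover all outstanding data, and full-MSS segments are always allowed out.

```python
class NagleSender:
    """Toy sender that coalesces small writes while data is outstanding."""

    def __init__(self, mss):
        self.mss = mss
        self.buffer = b""       # data written by the app, not yet sent
        self.unacked = False    # is any sent data still unacknowledged?
        self.sent = []          # segments handed to the network, in order

    def write(self, data):
        self.buffer += data
        self._try_send()

    def on_ack(self):
        # Simplification: one ACK acknowledges everything outstanding.
        self.unacked = False
        self._try_send()

    def _try_send(self):
        # Full-sized segments may always be sent.
        while len(self.buffer) >= self.mss:
            self._emit(self.mss)
        # A small segment goes out only if nothing is outstanding.
        if self.buffer and not self.unacked:
            self._emit(len(self.buffer))

    def _emit(self, n):
        self.sent.append(self.buffer[:n])
        self.buffer = self.buffer[n:]
        self.unacked = True
```

Typing `a`, `b`, `c` quickly produces two segments instead of three: `a` goes out at once, while `b` and `c` wait for the ACK and travel together.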

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver | TCP Receiver action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 200 ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long:
  long delay before resending lost packet
Detect lost segments via duplicate ACKs
  Sender often sends many segments back-to-back
  If segment is lost, there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline with a lost segment (marked X) and the timeout interval.]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
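The ACK-handling rule for fast retransmit can be written as a small runnable Python sketch. The class `FastRetransmitSender` and its attributes are illustrative assumptions; the timer is reduced to a boolean flag and retransmissions are only recorded, not actually sent.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs for one direction of a connection."""

    def __init__(self, send_base, next_seq):
        self.send_base = send_base      # oldest unacknowledged byte
        self.next_seq = next_seq        # first byte not yet sent
        self.dup_acks = 0
        self.timer_running = next_seq > send_base
        self.retransmitted = []         # seq #s resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: slide SendBase forward.
            self.send_base = y
            self.dup_acks = 0
            # Keep the timer only if something is still outstanding.
            self.timer_running = self.send_base < self.next_seq
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend segment starting at byte y
```

The retransmission fires exactly once, on the third duplicate; a fourth or fifth duplicate for the same `y` does not trigger another resend in this sketch.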

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
  longer than RTT, but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss

Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$

Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and Estimated RTT in milliseconds (y-axis, 100 to 350) plotted against time in seconds (x-axis, 1 to 106).]

TCP Round Trip Time and Timeout

Setting the timeout
  EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  first, estimate how much SampleRTT deviates from EstimatedRTT:

$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
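The two moving averages and the timeout formula can be combined into a small Python estimator. This is a sketch with assumptions stated up front: the class name `RttEstimator` is made up, DevRTT starts at 0 for simplicity, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate.

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0   # assumption: no deviation known after one sample

    def update(self, sample_rtt):
        # Deviation first, measured against the estimate before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms estimate, a single 200 ms sample moves EstimatedRTT only to 112.5 ms, but the safety margin grows by 100 ms, so the timeout reacts to variance faster than the mean does.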

TCP Flow Control

receive side of TCP connection has a receive buffer
  app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)
spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
  guarantees receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
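The receive-window bookkeeping is simple enough to show directly in Python. The function names `rcv_window` and `sender_allowance` are illustrative, and all byte counters are assumed to be plain integers that do not wrap.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: buffer size minus unread bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_allowance(last_byte_sent, last_byte_acked, advertised_window):
    """How many more bytes the sender may put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - in_flight)


if __name__ == "__main__":
    window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print("advertised RcvWindow:", window)
    print("sender may still send:", sender_allowance(5000, 4000, window))
```

When the application stops reading, `last_byte_read` stalls, the advertised window shrinks toward zero, and the sender's allowance follows it.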

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination


TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
initialize TCP variables:
  seq. #s
  buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, due to the delayed duplicates problem, a simple 2-way handshake does not suffice

Possible solutions:
  A transport address can be used only once (impossible for client-server model)
  Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's solution: three-way handshake
Step 1: client host sends TCP SYN segment to server
  specifies initial seq #
  no data
Step 2: server host receives SYN, replies with SYNACK segment
  server allocates buffers
  specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont.)

Connection Establishment
Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)
Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN
Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client/server timeline of the three segments.]
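The sequence and acknowledgement numbers exchanged in the three steps can be traced with a short Python sketch. The function `three_way_handshake` and the dictionary layout are illustrative only; real ISNs are chosen by the TCP implementation, and the arithmetic is taken modulo 2^32 because the fields are 32 bits wide.

```python
SEQ_SPACE = 2 ** 32   # sequence numbers are 32-bit values


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYN+ACK and ACK segments as simple dictionaries."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,       # client's ISN + 1
    }
    ack = {
        "flags": {"ACK"},
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,   # server's ISN + 1
    }
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```

Because each side acknowledges the other's freshly chosen ISN, a delayed duplicate SYN from an old connection is not acknowledged with a matching number and can be rejected.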

TCP Connection Management (cont.)

Closing a connection:
client closes socket: clientSocket.close()
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline for closing a connection: close (client), close (server), timed wait, closed.]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs

[Figure: client/server timeline: closing (client), closing (server), timed wait, closed (client), closed (server).]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle; TCP server lifecycle]

[Tanenbaum 6.33]

Principles of Congestion Control

Congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase CongWin by 1 MSS every RTT until loss detected
  multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size versus time, with the y-axis marked at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth.]

TCP Congestion Control: details

sender limits transmission:
  $LastByteSent - LastByteAcked \le \min(CongWin, RcvWindow)$
Let's assume that RcvWindow is not a constraint (for simplicity)
Roughly,
  $rate = \frac{CongWin}{RTT}$ Bytes/sec
CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
  loss event = timeout or 3 duplicate acks
  TCP sender reduces rate (CongWin) after loss event
three mechanisms:
  AIMD
  slow start
  conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS
  Example: MSS = 500 bytes & RTT = 200 msec
  initial rate = 20 kbps
available bandwidth may be >> MSS/RTT
  desirable to quickly ramp up to respectable rate
When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:
  double CongWin every RTT
  done by incrementing CongWin for every ACK received
Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline with one RTT marked.]

Refinement: inferring loss

After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly
  Part of Fast Recovery
But after timeout event:
  CongWin instead set to 1 MSS
  window then grows exponentially (this is SS) to a threshold (ssthresh), then grows linearly

Philosophy:
  3 dup ACKs indicates network capable of delivering some segments
  timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
  Variable Threshold (ssthresh)
  At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control

State | Event | TCP Sender Action | Commentary
Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
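The rows of the sender table map almost one-to-one onto a small state machine. The Python below is a hedged sketch of that table only: the class `TcpCongestionControl` and its method names are invented for illustration, window sizes are in bytes, and details of real TCP stacks (window inflation during fast recovery, minimum thresholds) are left out.

```python
class TcpCongestionControl:
    """Slow start / congestion avoidance as described in the table."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = float(mss)        # connection begins with 1 MSS
        self.threshold = float(threshold)
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                  # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)  # ~1 MSS per RTT

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                         # loss detected by triple dup ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, float(self.mss))  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = float(self.mss)
        self.state = "SS"
        self.dup_acks = 0
```

Feeding it a run of new ACKs followed by three duplicates and then a timeout reproduces the saw tooth: exponential growth, a halving, and finally a restart from 1 MSS.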

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?
  Ignore slow start
Let W be the window size when loss occurs
When window is W, throughput is W/RTT
Just after loss, window drops to W/2, throughput to W/(2 RTT)
Average throughput: 0.75 W/RTT

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate L:

$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

-> $L = 2 \cdot 10^{-10}$  Wow
New versions of TCP for high-speed
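The numbers on the throughput slides can be reproduced with a few lines of Python. This is a sketch: the function names are illustrative, the window is counted in segments, and the loss-rate function simply solves throughput = 1.22 MSS / (RTT sqrt(L)) for L.

```python
def average_throughput(window, rtt_s):
    """Average of W/RTT and W/(2 RTT): 0.75 * W / RTT (same units as window per second)."""
    return 0.75 * window / rtt_s


def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L that permits the given throughput under the 1.22 formula."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    print("W =", int(window_segments(10e9, 0.100, 1500)), "segments")
    print("L =", loss_rate_for(10e9, 0.100, 1500))
```

The required loss rate comes out on the order of 2 in 10^10 segments, which is the figure quoted for the 10 Gbps example.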

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

Two competing sessions:
  Additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with both axes running to R, Connection 1 throughput on one axis, and the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

Fairness (more)

Fairness and UDP
  Multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  Instead use UDP:
    pump audio/video at constant rate, tolerate packet loss
  Research area: TCP friendly

Fairness and parallel TCP connections
  nothing prevents app from opening parallel connections between 2 hosts
  Web browsers do this
  Example: link of rate R supporting 9 connections
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
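The parallel-connection example is plain arithmetic, which the following Python sketch spells out. It assumes that every TCP connection on the bottleneck gets an equal share of R; the helper name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening `new_connections`."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print("1 connection:   R *", app_share(9, 1))
    print("11 connections: R *", app_share(9, 11))
```

With 11 of 20 connections the app holds slightly more than half of the link, which the slide rounds to R/2.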

Chapter 3: Summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation and implementation in the Internet
  UDP
  TCP

After that:
  leaving the network "edge" (application, transport layers)
  into the network "core"

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:
  source IP address
  source port number
  dest IP address
  dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets:
  each socket identified by its own 4-tuple
Web servers have different sockets for each connecting client
  non-persistent HTTP will have different socket for each request
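Demultiplexing by 4-tuple amounts to a table lookup, as the Python sketch below illustrates. `TcpDemux` is a hypothetical class, sockets are represented by plain names, and the addresses A, B, C and ports 9157, 5775, 80 are used only as sample values.

```python
class TcpDemux:
    """Maps (source IP, source port, dest IP, dest port) to a socket."""

    def __init__(self):
        self._sockets = {}

    def add_connection(self, src_ip, src_port, dst_ip, dst_port, socket_name):
        self._sockets[(src_ip, src_port, dst_ip, dst_port)] = socket_name

    def deliver(self, src_ip, src_port, dst_ip, dst_port):
        # All four values are needed: two clients may use the same source
        # port, and one client may open several connections to port 80.
        return self._sockets.get((src_ip, src_port, dst_ip, dst_port))


if __name__ == "__main__":
    demux = TcpDemux()
    demux.add_connection("A", 9157, "C", 80, "socket-1")
    demux.add_connection("B", 9157, "C", 80, "socket-2")
    demux.add_connection("B", 5775, "C", 80, "socket-3")
    print(demux.deliver("B", 9157, "C", 80))
```

Segments from hosts A and B both carry source port 9157 and destination port 80, yet they reach different sockets because the source IP differs.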

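Connection-oriented demux (cont.)

[Figure: clients with IP A and IP B connect to a server with IP C, which runs several processes (P1 to P6 appear across the hosts). Segments shown: (S-IP A, D-IP C, SP 9157, DP 80), (S-IP B, D-IP C, SP 9157, DP 80) and (S-IP B, D-IP C, SP 5775, DP 80).]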

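Connection-oriented demux: Threaded Web Server

[Figure: clients with IP A and IP B connect to a threaded web server with IP C (processes P1 to P4 appear across the hosts). Segments shown: (S-IP A, D-IP C, SP 9157, DP 80), (S-IP B, D-IP C, SP 9157, DP 80) and (S-IP B, D-IP C, SP 5775, DP 80).]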


channel with bit errors and losses

underlying channel may flip bits in packet
  checksum to detect bit errors
underlying channel can also lose packets
the question: how to recover from errors?
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  timeout: sender retransmits pkt if it doesn't receive an ack within the timeout
new mechanisms in error detection:
  receiver feedback: control msg (ACK) rcvr -> sender
  sender control: timer to decide whether to send again

Reliable Communication

Stop and Wait with errors/losses: sender

[Figure: sender state machine. State 1: wait for data (in state 1 the sender waits for data). When data is available, send data and move to state 2a: wait for ack packet or time out. On timeout (state 2b), send data again. When the ack is received, process the ack and return to state 1.]

Discussion
  What does the sender need to do for the retransmission?

This version has a fatal flaw

What happens if ACK corrupted/lost?
  sender doesn't know what happened at receiver
  can't just retransmit: possible duplicate

Handling duplicates:
  sender retransmits current pkt if ACK garbled or didn't arrive
  sender adds sequence number to each pkt
  receiver discards (doesn't deliver up) duplicate pkt
  receiver must specify seq # of pkt being ACKed

discussion

Sender:
  seq # added to pkt
  two seq. #'s (0,1) will suffice. Why?
  must check if received ACK corrupted
  twice as many states
    state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:
  must check if received packet is duplicate
    state indicates whether 0 or 1 is expected pkt seq #
  receiver sends ACK for last pkt received OK
    receiver must explicitly include seq # of pkt being ACKed
  note: receiver can not know if its last ACK received OK at sender
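The sender and receiver rules in this discussion describe an alternating-bit stop-and-wait protocol, which the Python sketch below models. The class names are illustrative, checksums are replaced by a `corrupted` flag, and timers are left to the caller: a `False` return from `on_ack` means "keep waiting, retransmit on timeout".

```python
class AltBitReceiver:
    """Receiver state: which of the two sequence numbers is expected next."""

    def __init__(self):
        self.expected = 0
        self.delivered = []

    def on_packet(self, seq, data):
        if seq == self.expected:
            self.delivered.append(data)   # new packet: deliver up
            self.expected ^= 1            # flip between 0 and 1
        # A duplicate is discarded but still ACKed, because the receiver
        # cannot know whether its last ACK arrived OK at the sender.
        return seq                        # ACK carries seq # of pkt being ACKed


class AltBitSender:
    """Sender state: seq # of the current (unacknowledged) packet."""

    def __init__(self):
        self.seq = 0

    def on_ack(self, ack_seq, corrupted=False):
        if corrupted or ack_seq != self.seq:
            return False                  # not a valid ACK for the current pkt
        self.seq ^= 1                     # move on to the next packet
        return True
```

Two sequence numbers suffice because the sender never has more than one packet outstanding, so the receiver only has to distinguish "new" from "retransmission of the previous one".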

Stop & wait in action

[Figures: stop & wait in action (two slides)]

15

Performance of stop amp wait

Stop amp wait works but performance stinks

ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

Transport Layer 30

16

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased

buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microsecon

ds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

17

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesn‟t ack packet if there‟s a gap

Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header

ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n) retransmit pkt n and all higher seq pkts in window

18

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt discard (don‟t buffer) -gt no receiver buffering

Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

19

Transport Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection

MSS maximum segment size

connection-oriented handshaking (exchange

of control msgs) init‟s sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundariesrdquo

pipelined TCP congestion and flow

control set window size

send amp receive buffers

socket

door

TCP

send buffer

TCP

receive buffer

socket

door

segment

application

writes dataapplication

reads data

Transport Layer 38

20

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP‟s unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

21

TCP sender eventsdata rcvd from app

Create segment with seq

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer

Ack rcvd

If acknowledges previously unacked segments update what is known to

be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq ‟s and ACKsSeq ‟s

byte stream ldquonumberrdquo of first byte in segment‟s data

ACKs

seq of next byte expected from other side

cumulative ACK

Q how receiver handles out-of-order segments

A TCP spec doesn‟t say - up to implementor

Host A Host B

Usertypes

bdquoC‟

host receivesACK

host ACKsreceipt of

bdquoC‟

time

Transport Layer 42

22

TCP retransmission scenarios

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t

Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

timeS

eq=

92

tim

eou

t

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase= 120

Transport Layer 44

23

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 200ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Usertypes

bdquoC‟

host ACKsreceipt

of echoedbdquoC‟

host ACKsreceipt of

bdquoC‟

timesimple telnet scenario

Transport Layer 45

Host echoesback bdquoC‟

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data can‟t be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ackarrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Nagle‟s alg

24

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 200ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before

resending lost packet

Detect lost segments via duplicate ACKs Sender often sends

many segments back-to-back

If segment is lost there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend

segment before timer expires

Transport Layer 48

25

Host A

tim

eou

t

Host B

time

X

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT but RTT varies

too short premature timeout

unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo

average several recent measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

27

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT +

|SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

Discuss [Stevens

212 214 216

217]

28

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving app‟s drain rate

app process may be slow at reading from buffer

sender won‟t overflowreceiver‟s buffer by

transmitting too muchtoo fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKeddata to RcvWindow guarantees receive

buffer doesn‟t overflow

Discuss [Stevens

201 204 205

206]

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables

seq s

buffers flow control info (eg RcvWindow)

As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice

Possible solutions A transport address can be used only

once (impossible for client-server model)

Each connection is given an identifier by

the originator (requires indefinite history to

be saved by the transport layer)



This version has a fatal flaw!

What happens if an ACK is corrupted or lost?

sender doesn't know what happened at the receiver

can't just retransmit: possible duplicate

Handling duplicates:

sender retransmits the current pkt if the ACK is garbled or didn't arrive

sender adds a sequence number to each pkt

receiver discards (doesn't deliver up) a duplicate pkt

receiver must specify the seq # of the pkt being ACKed

Discussion

Sender:

seq # added to pkt

two seq #'s (0, 1) will suffice. Why?

must check whether the received ACK is corrupted

twice as many states: state must "remember" whether the "current" pkt has seq # 0 or 1

Receiver:

must check whether the received packet is a duplicate: state indicates whether 0 or 1 is the expected pkt seq #

receiver sends an ACK for the last pkt received OK; receiver must explicitly include the seq # of the pkt being ACKed

note: the receiver cannot know whether its last ACK was received OK at the sender

Stop & wait in action

[Two figure slides]

Performance of stop & wait

Stop & wait works, but performance stinks

ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet

$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$

U_sender: utilization - fraction of time sender is busy sending

$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$ (times in msec)

1 KB pkt every 30 msec -> 33 kB/sec thruput over a 1 Gbps link

network protocol limits use of physical resources!

stop-and-wait operation

[Figure: sender/receiver timeline. First packet bit transmitted, t = 0; last packet bit transmitted, t = L/R; first packet bit arrives; last packet bit arrives, send ACK; ACK arrives, send next packet, t = RTT + L/R]

$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$

Pipelined protocols

Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged pkts

range of sequence numbers must be increased

buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization

[Figure: sender/receiver timeline with three packets in flight. First packet bit transmitted, t = 0; last bit transmitted, t = L/R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, send next packet, t = RTT + L/R]

$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$

Increases utilization by a factor of 3!
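The two utilization figures can be checked numerically. The following is a minimal sketch using the slide's values (1 Gbps link, 30 msec RTT, 8000-bit packets); the function name `sender_utilization` and its `window` parameter are illustrative and do not come from the slides.

```python
def sender_utilization(packet_bits, link_bps, rtt_s, window=1):
    """Fraction of time the sender is busy: (window * L/R) / (RTT + L/R)."""
    d_trans = packet_bits / link_bps          # transmission delay L/R in seconds
    # Utilization cannot exceed 1 even if the window covers the whole RTT.
    return min(1.0, window * d_trans / (rtt_s + d_trans))


# Slide example: 8000-bit packet, 1 Gbps link, RTT = 30 msec.
stop_and_wait = sender_utilization(8000, 1e9, 0.030)
pipelined_3 = sender_utilization(8000, 1e9, 0.030, window=3)

print(f"stop & wait: {stop_and_wait:.5f}")
print(f"3 in flight: {pipelined_3:.5f}")
```

With these inputs, three packets in flight give three times the stop-and-wait utilization, which is the factor quoted on the slide.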

Pipelining Protocol

Go-back-N: big picture

Sender can have up to N unacked packets in the pipeline

Rcvr only sends cumulative acks

Doesn't ack a packet if there's a gap

Sender has a timer for the oldest unacked packet

If the timer expires, retransmit all unacked packets

Go-Back-N

Sender:

k-bit seq # in pkt header

"window" of up to N consecutive unack'ed pkts allowed

ACK(n): ACKs all pkts up to and including seq # n - "cumulative ACK"

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver:

ACK-only: always send an ACK for the correctly-received pkt with the highest in-order seq #

may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt: discard (don't buffer) -> no receiver buffering!

re-ACK the pkt with the highest in-order seq #

GBN in action
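The sender and receiver rules can be traced with a small simulation. This is a coarse, round-based sketch and not the slides' state machine: it assumes ACKs are never lost, that only the first transmission of a packet can be lost, and that a timeout fires whenever a full window produced no new cumulative ACK. The function name `go_back_n` and the `lost_first_tx` parameter are made up for the illustration.

```python
def go_back_n(num_pkts, window, lost_first_tx):
    """Return the sequence of packet numbers the sender puts on the wire.

    lost_first_tx: packet numbers whose first transmission is lost
    (assumption: retransmissions always arrive, ACKs are never lost).
    """
    base = 0          # oldest unacked packet
    next_seq = 0      # next packet the sender will transmit
    expected = 0      # receiver side: expectedseqnum
    seen = set()      # packets transmitted at least once
    sent_log = []
    while base < num_pkts:
        # Sender: fill the window with up to N unacked packets.
        while next_seq < min(base + window, num_pkts):
            pkt = next_seq
            sent_log.append(pkt)
            lost = pkt in lost_first_tx and pkt not in seen
            seen.add(pkt)
            if not lost and pkt == expected:
                expected += 1      # in-order packet is delivered
            # Anything else is out of order and discarded by the receiver.
            next_seq += 1
        if expected > base:
            base = expected        # cumulative ACK slides the window
        else:
            next_seq = base        # timeout: go back and resend from base
    return sent_log


print(go_back_n(num_pkts=6, window=4, lost_first_tx={2}))
```

Losing packet 2 forces packets 3, 4 and 5 to be sent twice, because the receiver discards everything that arrives after the gap.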

Transmission Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management

TCP congestion control

TCP Overview (RFCs: 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver

reliable, in-order byte stream: no "message boundaries"

pipelined: TCP congestion and flow control set window size

send & receive buffers

full duplex data: bi-directional data flow in same connection

MSS: maximum segment size

connection-oriented: handshaking (exchange of control msgs) init's sender and receiver state before data exchange

flow controlled: sender will not overwhelm receiver

[Figure: application writes data through the socket door into the TCP send buffer; a segment carries it to the TCP receive buffer; application reads data through the socket door]

TCP segment structure

[Figure: header fields in rows of 32 bits]

source port #, dest port #

sequence number

acknowledgement number

head len, not used, flags U A P R S F, Receive window

checksum, Urg data pnter

Options (variable length)

application data (variable length)

Field notes:

sequence and acknowledgement numbers: counting by bytes of data (not segments!)

Receive window: # bytes rcvr willing to accept

checksum: Internet checksum (as in UDP)

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now (generally not used)

RST, SYN, FIN: connection estab (setup, teardown commands)
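The field layout can be made concrete by decoding the fixed 20-byte part of the header. This is a minimal sketch using Python's standard `struct` module; the function name `parse_tcp_header`, the dictionary keys and the sample SYN segment are illustrative assumptions, and options are skipped rather than parsed.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header of a raw segment."""
    (src_port, dst_port, seq, ack, offset_flags,
     rcv_window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len is counted in 32-bit words
    flag_bits = offset_flags & 0x3F         # low six bits: U A P R S F
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": [name for i, name in enumerate(FLAG_NAMES) if flag_bits & (1 << i)],
        "rcv_window": rcv_window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
        "payload": segment[header_len:],    # application data after header and options
    }


# Hypothetical SYN segment: port 9157 -> port 80, initial seq # 1000, no options, no data.
syn = struct.pack("!HHIIHHHH", 9157, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(syn)
print(header["flags"], header["header_len"])
```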

TCP reliable data transfer

TCP creates reliable service on top of IP's unreliable service

Pipelined segments

Cumulative acks

TCP uses a single retransmission timer

The sequence number for a segment is the byte-stream number of the first byte in the segment

TCP sender events

data rcvd from app:

Create segment with seq #

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval: TimeOutInterval

timeout:

retransmit segment that caused timeout

restart timer

Ack rcvd:

If it acknowledges previously unacked segments, update what is known to be acked

start timer if there are outstanding segments

TCP seq. #'s and ACKs

Seq. #'s:

byte stream "number" of first byte in segment's data

ACKs:

seq # of next byte expected from other side

cumulative ACK

Q: how does the receiver handle out-of-order segments?

A: TCP spec doesn't say - up to implementor

[Figure: Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host receives ACK]

TCP retransmission scenarios

[Figure: two Host A / Host B timelines, "lost ACK scenario" (loss marked X) and "premature timeout", each showing the Seq=92 timeout; SendBase labels: 100, 120, 120, 100]

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline, "Cumulative ACK scenario", with a loss (X) inside the timeout interval; SendBase = 120]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), to a total of 160 bytes for sending and receiving 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: wait up to 200 ms for next segment. If no next segment, send ACK

Should sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline. User types 'C'; host ACKs receipt of 'C'; host echoes back 'C'; host ACKs receipt of echoed 'C']

Nagle's Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggyback: 2/120 -> only 1.6% of the bits sent are data

LANs are usually not congested, so it might be okay

Small packets, termed tinygrams, over a congested WAN - bad news

Nagle's alg:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when the ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 19.4]

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action

Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 200 ms for next segment; if no next segment, send ACK

Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments

Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte

Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long:

long delay before resending lost packet

Detect lost segments via duplicate ACKs

Sender often sends many segments back-to-back

If a segment is lost, there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost:

fast retransmit: resend segment before timer expires

Figure 3.37: Resending a segment after triple duplicate ACK

[Figure: Host A / Host B timeline with a lost segment (X) and the timeout interval]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    /* fast retransmit */
    }
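The ACK-handling event can be turned into a small runnable model. This sketch is an illustration only: the class name `FastRetransmitSender` and the `retransmitted` log are hypothetical, timer handling is left out, and resetting the duplicate counter when SendBase advances is an assumption about how "count of dup ACKs received for y" is tracked.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; timers are omitted for brevity."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0            # duplicate ACKs seen for the current SendBase
        self.retransmitted = []      # sequence numbers resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: slide SendBase and forget old duplicates.
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y


sender = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100]:
    sender.on_ack(ack)
print(sender.send_base, sender.retransmitted)
```

The first ACK of 100 advances SendBase; the next three are duplicates, so the segment starting at byte 100 is resent without waiting for the timer.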

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT, but RTT varies

too short: premature timeout, unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary, want estimated RTT "smoother"

average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; curves: SampleRTT, Estimated RTT]

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate how much SampleRTT deviates from EstimatedRTT:

$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
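The three formulas combine into a small estimator. This is a sketch under stated assumptions: the class name `RttEstimator` is made up, DevRTT starts at zero, and the deviation is measured against the estimate held before the new sample is folded in (the slides do not fix the update order).

```python
class RttEstimator:
    """EWMA of SampleRTT plus a deviation-based safety margin."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0   # assumption: no measured deviation yet

    def update(self, sample_rtt):
        # Assumption: compare the sample with the previous EstimatedRTT.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)   # milliseconds
timeout = estimator.update(120.0)
print(estimator.estimated_rtt, estimator.dev_rtt, timeout)
```

One sample of 120 msec after an initial 100 msec moves EstimatedRTT only to 102.5 msec, while the safety margin of 4 * DevRTT lifts the timeout to 122.5 msec.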

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

receive side of TCP connection has a receive buffer

app process may be slow at reading from buffer

speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow

guarantees receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
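The window arithmetic is easy to check with concrete numbers. In this minimal sketch the function names `rcv_window` and `sender_may_send` and the byte counts are invented for the example; only the two formulas come from the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """True if sending nbytes more keeps unACKed data within RcvWindow."""
    return (last_byte_sent + nbytes) - last_byte_acked <= advertised_window


# Hypothetical receiver: 4096-byte buffer, 3000 bytes received, 1000 read by the app.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(sender_may_send(5000, 4000, window, 1000))   # 2000 unACKed bytes fit
print(sender_may_send(5000, 4000, window, 1200))   # 2200 unACKed bytes do not
```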

Delayed Duplicates Problem

A user asks for a connection

Due to congestion, the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments

initialize TCP variables:

seq. #s

buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, due to the delayed duplicates problem a simple 2-way handshake does not suffice

Possible solutions:

A transport address can be used only once (impossible for the client-server model)

Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's Solution:

Three way handshake:

Step 1: client host sends TCP SYN segment to server

specifies initial seq #

no data

Step 2: server host receives SYN, replies with SYNACK segment

server allocates buffers

specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont)

Connection Establishment

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client/server timeline]
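The sequence and acknowledgement numbers exchanged in the three steps can be written out explicitly. This is a sketch only: segments are modelled as plain dictionaries, the function name `three_way_handshake` is hypothetical, and the wrap-around modulo 2^32 reflects the 32-bit width of the sequence number field.

```python
SEQ_SPACE = 2 ** 32   # sequence numbers are 32-bit values and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three segments of connection establishment as dictionaries."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_SPACE,    # client's ISN + 1
    }
    ack = {
        "flags": {"ACK"},
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (server_isn + 1) % SEQ_SPACE,    # server's ISN + 1
    }
    return [syn, syn_ack, ack]


handshake = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in handshake:
    print(sorted(segment["flags"]), segment["seq"], segment.get("ack"))
```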

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close()

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

Step 3: client receives FIN, replies with ACK

Enters "timed wait" - will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed

Note: with small modification, can handle simultaneous FINs

[Figure: client/server timeline with states close, closing, timed wait, closed]

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle]

[Tanenbaum 633]

Principles of Congestion Control

Congestion:

informally: "too many sources sending too much data too fast for network to handle"

different from flow control!

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem!

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size vs. time, with ticks at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]

TCP Congestion Control: details

sender limits transmission:

$LastByteSent - LastByteAcked \le \min(CongWin, RcvWindow)$

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly,

$rate = \frac{CongWin}{RTT}$ Bytes/sec

CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline over successive RTTs]

Refinement: inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is SS) to a threshold (ssthresh), then grows linearly

Philosophy:

3 dup ACKs indicates network capable of delivering some segments

timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout

Implementation:

Variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly

When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold

When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

TCP sender congestion control

State | Event | TCP Sender Action | Commentary

Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT

SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start

SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
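The state table maps directly onto a small class. This is a simplified sketch of the table, not of any real TCP stack: windows are measured in units of MSS, the class and method names are hypothetical, the initial Threshold of 8 MSS is arbitrary, and duplicate-ACK counting is left to the caller.

```python
class CongestionControl:
    """Slow start / congestion avoidance with windows measured in MSS."""

    def __init__(self, threshold=8.0):
        self.cong_win = 1.0          # CongWin = 1 MSS when the connection begins
        self.threshold = threshold   # arbitrary initial value for the sketch
        self.state = "SS"

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        if self.state == "SS":
            self.cong_win += 1.0                    # CongWin = CongWin + MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += 1.0 / self.cong_win    # + MSS * (MSS / CongWin)

    def on_triple_duplicate_ack(self):
        self.threshold = self.cong_win / 2
        self.cong_win = max(self.threshold, 1.0)    # never below 1 MSS
        self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0
        self.state = "SS"


cc = CongestionControl(threshold=4.0)
for _ in range(4):
    cc.on_new_ack()
print(cc.state, cc.cong_win)          # window passed the threshold
cc.on_triple_duplicate_ack()
print(cc.state, cc.cong_win)          # multiplicative decrease
cc.on_timeout()
print(cc.state, cc.cong_win)          # back to slow start
```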

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start

Let W be the window size when loss occurs

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$\frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

-> $L = 2 \cdot 10^{-10}$ Wow!

New versions of TCP for high-speed
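Both numbers in the example follow from the slide's inputs. The sketch below recomputes them; the function names are illustrative, and the loss rate is obtained by solving the throughput formula 1.22 * MSS / (RTT * sqrt(L)) for L.

```python
def window_in_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


# Slide example: 1500-byte segments, 100 ms RTT, 10 Gbps target.
w = window_in_segments(10e9, 0.100, 1500)
loss = loss_rate_for(10e9, 0.100, 1500)
print(int(w))          # in-flight segments
print(f"{loss:.1e}")   # tolerable loss rate
```

At this loss rate only about one segment in five billion may be lost, which is why the slide calls for new TCP versions on high-speed paths.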

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: Connection 1 throughput on one axis, both axes running up to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP

do not want rate throttled by congestion control

Instead use UDP:

pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections;

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets R/2!
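The parallel-connection example is simple arithmetic once every connection is assumed to get an equal share of the link. A minimal sketch, with the hypothetical helper `app_share` returning the app's fraction of R:

```python
from fractions import Fraction


def app_share(existing_connections, opened_by_app):
    """Fraction of link rate R the new app gets if all connections share equally."""
    total = existing_connections + opened_by_app
    return Fraction(opened_by_app, total)


print(app_share(9, 1))     # one connection among ten
print(app_share(9, 11))    # eleven connections among twenty
```

Eleven connections out of twenty give 11/20 of R, which the slide rounds to R/2.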

Chapter 3: Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet:

UDP

TCP

After that:

leaving the network "edge" (application, transport layers)

into the network "core"

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets:

each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client

non-persistent HTTP will have different socket for each request
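Demultiplexing by 4-tuple amounts to a table lookup keyed on all four values. The sketch below models only established connections; the socket labels are placeholders, the addresses A, B and C with ports 9157, 5775 and 80 follow the example used in the demux figures, and listening sockets (which match on destination only) are left out.

```python
from typing import Dict, Optional, Tuple

# (source IP, source port, dest IP, dest port)
FourTuple = Tuple[str, int, str, int]


def demux(sockets: Dict[FourTuple, str], segment: FourTuple) -> Optional[str]:
    """Return the socket whose 4-tuple matches the segment, or None."""
    return sockets.get(segment)


# Three connections to the web server at IP C, port 80.
sockets = {
    ("A", 9157, "C", 80): "socket-1",
    ("B", 9157, "C", 80): "socket-2",
    ("B", 5775, "C", 80): "socket-3",
}

print(demux(sockets, ("B", 9157, "C", 80)))
print(demux(sockets, ("A", 5775, "C", 80)))
```

Two segments with the same source port 9157 still reach different sockets because their source IP addresses differ.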

Connection-oriented demux (cont)

[Figure: clients at IP A and IP B, server at IP C, processes P1 to P6. Segments shown: (S-IP A, SP 9157, D-IP C, DP 80); (S-IP B, SP 9157, D-IP C, DP 80); (S-IP B, SP 5775, D-IP C, DP 80)]

Connection-oriented demux: Threaded Web Server

[Figure: clients at IP A and IP B, server at IP C, processes P1 to P4. Segments shown: (S-IP A, SP 9157, D-IP C, DP 80); (S-IP B, SP 9157, D-IP C, DP 80); (S-IP B, SP 5775, D-IP C, DP 80)]

Page 14: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

14

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

15

Performance of stop amp wait

Stop amp wait works but performance stinks

ex 1 Gbps link 15 ms prop delay 8000 bit packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

U sender

= 008

30008 = 000027

microsec

onds

L R

RTT + L R =

Transport Layer 30

16

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased

buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arrives

last packet bit arrives send ACK

ACK arrives send next

packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microsecon

ds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

17

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesn‟t ack packet if there‟s a gap

Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header

ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n) retransmit pkt n and all higher seq pkts in window

18

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt discard (don‟t buffer) -gt no receiver buffering

Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

19

Transport Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection

MSS maximum segment size

connection-oriented handshaking (exchange

of control msgs) init‟s sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundariesrdquo

pipelined TCP congestion and flow

control set window size

send amp receive buffers

socket

door

TCP

send buffer

TCP

receive buffer

socket

door

segment

application

writes dataapplication

reads data

Transport Layer 38

20

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP‟s unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

21

TCP sender eventsdata rcvd from app

Create segment with seq

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer

Ack rcvd

If acknowledges previously unacked segments update what is known to

be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq ‟s and ACKsSeq ‟s

byte stream ldquonumberrdquo of first byte in segment‟s data

ACKs

seq of next byte expected from other side

cumulative ACK

Q how receiver handles out-of-order segments

A TCP spec doesn‟t say - up to implementor

Host A Host B

Usertypes

bdquoC‟

host receivesACK

host ACKsreceipt of

bdquoC‟

time

Transport Layer 42

22

TCP retransmission scenarios

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t

Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

timeS

eq=

92

tim

eou

t

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

[Figure, cumulative ACK scenario: one ACK from Host B is lost (X), but a later cumulative ACK reaches Host A before the timeout; SendBase = 120]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), a total of 160 bytes of headers across the four packets that send, echo and acknowledge 'C'
If the receiver waits a while, it can piggyback the ACK on the data packet
Delayed ACK: wait up to 200 ms for next segment; if no next segment, send ACK
Should the sender use delayed ACKs too?
[Stevens figure 19.3]

[Figure, simple telnet scenario: user types 'C' at Host A; Host B ACKs receipt of 'C' and echoes back 'C'; Host A ACKs receipt of the echoed 'C']

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggybacking the ratio is 2/120 -> only 1.6% of the bits sent are data
LANs are usually not congested, so it might be okay
Small packets, termed tinygrams, over a congested WAN: bad news
Nagle's algorithm:
  New data can't be sent until outstanding data is ACKed
  Small amounts of data are collected and sent in a single segment when the ACK arrives
  Self-clocking: the faster the ACK comes back, the faster data is sent. Slow links cause fewer segments to be sent
[Stevens 19.4]
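The collecting behaviour of Nagle's algorithm can be sketched in a few lines of Python. The class below is a hypothetical toy, not an excerpt from any TCP stack: it tracks a single "something is outstanding" flag, and it also sends at once when a full MSS is available, which is an assumption taken from common formulations rather than from the slide.

```python
class NagleSender:
    """Toy sender applying Nagle's rule to small application writes."""

    def __init__(self, mss=536):
        self.mss = mss
        self.pending = b""        # small writes collected while an ACK is awaited
        self.unacked = False      # True while sent data is still outstanding
        self.sent_segments = []   # what actually went out, in order

    def _send(self, data):
        self.sent_segments.append(data)
        self.unacked = True

    def write(self, data):
        """Application write: small data goes out only if nothing is outstanding."""
        self.pending += data
        # Assumption: full-sized segments are always allowed to leave.
        while len(self.pending) >= self.mss:
            self._send(self.pending[:self.mss])
            self.pending = self.pending[self.mss:]
        if self.pending and not self.unacked:
            self._send(self.pending)
            self.pending = b""

    def ack_arrived(self):
        """The outstanding data was ACKed: flush whatever was collected."""
        self.unacked = False
        if self.pending:
            self._send(self.pending)
            self.pending = b""
```

Typing "hello" one character at a time sends "h" immediately; "ello" is collected and leaves as one segment when the ACK for "h" arrives, so two segments are sent instead of five.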

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action

1. Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
   -> Delayed ACK. Wait up to 200 ms for next segment. If no next segment, send ACK

2. Arrival of in-order segment with expected seq #. One other segment has ACK pending
   -> Immediately send single cumulative ACK, ACKing both in-order segments

3. Arrival of out-of-order segment with higher-than-expected seq #. Gap detected
   -> Immediately send duplicate ACK, indicating seq # of next expected byte

4. Arrival of segment that partially or completely fills gap
   -> Immediately send ACK, provided that segment starts at lower end of gap
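The four rows of the table can be turned into a small decision routine. The sketch below is an illustration only: the class name and the action strings are invented, out-of-order segments are assumed to be buffered, and the 200 ms delayed-ACK timer itself is not modelled.

```python
class AckGenerator:
    """Toy receiver that picks one of the table's actions per arriving segment."""

    def __init__(self, expected_seq):
        self.expected = expected_seq   # next in-order byte awaited
        self.buffered = {}             # out-of-order segments: seq -> length
        self.ack_pending = False       # a delayed ACK is waiting on its timer

    def on_segment(self, seq, length):
        """Return (action, ack_number) for a segment of `length` bytes at `seq`."""
        if seq > self.expected:
            # Row 3: out-of-order arrival, gap detected.
            self.buffered[seq] = length
            return ("immediate duplicate ACK", self.expected)
        if seq < self.expected:
            # Old duplicate (not in the table): repeat the current ACK.
            return ("immediate duplicate ACK", self.expected)

        filling_gap = bool(self.buffered)
        self.expected += length
        # Absorb buffered segments that are now contiguous.
        while self.expected in self.buffered:
            self.expected += self.buffered.pop(self.expected)

        if filling_gap:
            # Row 4: segment starts at the lower end of the gap.
            self.ack_pending = False
            return ("immediate ACK", self.expected)
        if self.ack_pending:
            # Row 2: one other in-order segment has its ACK pending.
            self.ack_pending = False
            return ("immediate cumulative ACK", self.expected)
        # Row 1: everything before this segment was already ACKed.
        self.ack_pending = True
        return ("delayed ACK", self.expected)
```

With 20-byte segments starting at 100, the arrivals 100, 120, 160, 140 produce a delayed ACK, a cumulative ACK of 140, a duplicate ACK of 140 and finally an immediate ACK of 180.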

Fast Retransmit

Time-out period is often relatively long:
  long delay before resending lost packet
Detect lost segments via duplicate ACKs:
  Sender often sends many segments back-to-back
  If a segment is lost, there will likely be many duplicate ACKs
If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK (Host A / Host B timeline; the lost segment, marked X, is resent before its timeout)]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
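A runnable Python rendering of the ACK-handling logic may help when tracing examples by hand. The class and its attributes are invented for this sketch, ACK values are assumed to fall on segment boundaries, and "resending" just records the sequence number in a list.

```python
class FastRetransmitSender:
    """Toy sender that handles ACKs the way the pseudocode above describes."""

    def __init__(self, send_base, segments):
        self.send_base = send_base
        self.segments = dict(segments)   # seq -> payload, sent but not yet ACKed
        self.dup_acks = 0                # duplicate ACKs seen for send_base
        self.timer_running = bool(self.segments)
        self.retransmitted = []          # sequence numbers resent early

    def on_ack(self, y):
        if y > self.send_base:
            # New data is acknowledged: slide SendBase forward.
            self.send_base = y
            self.dup_acks = 0
            self.segments = {s: p for s, p in self.segments.items() if s >= y}
            # (Re)start the timer only if something is still outstanding.
            self.timer_running = bool(self.segments)
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.segments:
                # Fast retransmit: resend before the timer expires.
                self.retransmitted.append(y)
```

If five 20-byte segments starting at 100 are outstanding and the first one is lost, the three duplicate ACKs carrying 100 trigger a resend of segment 100; a later ACK of 200 clears everything.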

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
  longer than RTT, but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$$\text{EstimatedRTT} = (1 - \alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$

Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), y-axis: RTT (milliseconds), 100 to 350; two curves: SampleRTT and EstimatedRTT]

TCP Round Trip Time and Timeout

Setting the timeout:
  EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
First estimate how much SampleRTT deviates from EstimatedRTT:

$$\text{DevRTT} = (1 - \beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$$

(typically, $\beta = 0.25$)

Then set timeout interval:

$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
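The two moving averages and the timeout formula fit in a short Python class. Two details are assumptions of this sketch rather than statements from the slides: DevRTT starts at half of the first sample, and DevRTT is updated with the previous EstimatedRTT before EstimatedRTT itself is updated.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, as in the formulas above."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        # Assumption: initial deviation is half of the first sample.
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        """Fold in one SampleRTT (not taken from a retransmitted segment)."""
        # Assumption: the deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms sample and then observing 120 ms, the estimate moves to 102.5 ms, the deviation to 42.5 ms, and the timeout to 272.5 ms.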

TCP Flow Control

receive side of TCP connection has a receive buffer
app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)
spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
  guarantees receive buffer doesn't overflow
Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
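The window arithmetic is small enough to check with two Python functions. The function names and the byte counts in the usage note are made up for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: buffer size minus unread bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(n_bytes, last_byte_sent, last_byte_acked, advertised_window):
    """True if sending n_bytes keeps unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + n_bytes <= advertised_window
```

With a 4096-byte buffer, 3000 bytes received and 1000 read, the receiver advertises 2096 bytes. A sender with 1000 bytes already in flight may send another 1000 bytes but not another 1500.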

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination


TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
  initialize TCP variables:
    seq. #s
    buffers, flow control info (e.g. RcvWindow)
As seen in the previous slide, because of the delayed duplicates problem a simple 2-way handshake does not suffice
Possible solutions:
  A transport address can be used only once (impossible for client-server model)
  Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)
TCP's solution: three-way handshake
  Step 1: client host sends TCP SYN segment to server
    specifies initial seq #
    no data
  Step 2: server host receives SYN, replies with SYNACK segment
    server allocates buffers
    specifies server initial seq. #
  Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont.)

Connection establishment:
  Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)
  Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN
  Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client / server timeline of the three segments]
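The sequence and acknowledgement numbers exchanged in the three steps can be written out in Python. The function and the dictionary layout are invented for this sketch; the only extra detail assumed is that sequence numbers wrap around at 2^32, since the header field is 32 bits wide.

```python
SEQ_SPACE = 2 ** 32   # sequence numbers are 32-bit values


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYN+ACK and ACK segments in the order they are sent."""
    syn = {"from": "client", "flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "from": "server",
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,     # client's ISN + 1
    }
    ack = {
        "from": "client",
        "flags": {"ACK"},
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,  # server's ISN + 1
    }
    return [syn, syn_ack, ack]
```

For a client ISN of 1000 and a server ISN of 5000, the server acknowledges 1001 and the client acknowledges 5001.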

TCP Connection Management (cont.)

Closing a connection:
  client closes socket: clientSocket.close()
  Step 1: client end system sends TCP FIN control segment to server
  Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client / server timeline, labelled close, close, timed wait, closed]

TCP Connection Management (cont.)

  Step 3: client receives FIN, replies with ACK
    Enters "timed wait": will respond with ACK to received FINs
  Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs

[Figure: client / server timeline, labelled closing, closing, timed wait, closed, closed]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle; TCP server lifecycle]
[Tanenbaum 6.33]

Principles of Congestion Control

Congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase CongWin by 1 MSS every RTT until loss detected
  multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size vs. time, y-axis marked at 8, 16 and 24 Kbytes; sawtooth behavior: probing for bandwidth]
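The sawtooth can be reproduced with a few lines of Python. The sketch works in units of one MSS and one RTT per step; the function name, the starting window and the rounds in which loss occurs are arbitrary choices for illustration.

```python
def aimd_trace(rounds, loss_rounds, start=8, mss=1):
    """Congestion window (in MSS units) at the start of each RTT round."""
    cong_win = start
    trace = []
    for rtt in range(rounds):
        trace.append(cong_win)
        if rtt in loss_rounds:
            # Multiplicative decrease: cut the window in half after loss.
            cong_win = max(mss, cong_win / 2)
        else:
            # Additive increase: one MSS per RTT.
            cong_win += mss
    return trace
```

With a loss in round 2, aimd_trace(6, {2}) climbs 8, 9, 10, drops to 5 and then climbs again through 6 and 7.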

TCP Congestion Control: details

sender limits transmission:

$$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$

Let's assume that RcvWindow is not a constraint (for simplicity)
Roughly,

$$\text{rate} = \frac{\text{CongWin}}{\text{RTT}}\ \text{bytes/sec}$$

CongWin is dynamic, a function of perceived network congestion
How does sender perceive congestion?
  loss event = timeout or 3 duplicate ACKs
  TCP sender reduces rate (CongWin) after loss event
three mechanisms:
  AIMD
  slow start
  conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS
  Example: MSS = 500 bytes & RTT = 200 msec
  initial rate = 20 kbps
available bandwidth may be >> MSS/RTT
  desirable to quickly ramp up to respectable rate
When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:
  double CongWin every RTT
  done by incrementing CongWin for every ACK received
Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline, one RTT per round]

Refinement: inferring loss

After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly
  Part of Fast Recovery
But after timeout event:
  CongWin instead set to 1 MSS
  window then grows exponentially (this is slow start, SS) to a threshold (ssthresh), then grows linearly
Philosophy:
  3 dup ACKs indicate the network is capable of delivering some segments
  timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout
Implementation:
  Variable Threshold (ssthresh)
  At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially
When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly
When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold
When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

State: SS or CA
Event: Timeout
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
TCP sender action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
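The table translates almost line by line into a small state machine. The Python sketch below measures the window in units of one MSS; the class name, the method names and the initial Threshold of 64 MSS are assumptions made for the example, since the slides do not give a starting value.

```python
MSS = 1.0   # the window is measured in units of one MSS


class CongestionControl:
    """Toy state machine following the table of sender actions above."""

    def __init__(self, threshold=64.0):
        self.state = "SS"             # "SS" = slow start, "CA" = congestion avoidance
        self.cong_win = MSS
        self.threshold = threshold    # assumed initial value, not from the slides
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                      # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)   # about 1 MSS per RTT

    def on_duplicate_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        """Timeout: fall back to slow start from 1 MSS."""
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

With a Threshold of 4 MSS, four new ACKs take the window from 1 to 5 MSS and switch the sender to congestion avoidance; a triple duplicate ACK then halves the window to 2.5 MSS, and a timeout resets it to 1 MSS in slow start.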

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?
  Ignore slow start
Let W be the window size when loss occurs
When window is W, throughput is W/RTT
Just after loss, window drops to W/2, throughput to W/(2 RTT)
Average throughput: 0.75 W/RTT

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate:

$$\text{throughput} = \frac{1.22\cdot\text{MSS}}{\text{RTT}\sqrt{L}}$$

-> $L = 2\cdot 10^{-10}$  Wow
New versions of TCP needed for high speed
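The numbers in this example can be re-derived with a short Python calculation. The function names are invented for the sketch; the formulas are the window relation W = rate * RTT / MSS and the loss-rate throughput formula above solved for L.

```python
def average_throughput(window_bytes, rtt_s):
    """Average of the sawtooth between W/2 and W: 0.75 * W / RTT (bytes/sec)."""
    return 0.75 * window_bytes / rtt_s


def window_for_rate(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def loss_rate_for_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


window = window_for_rate(10e9, 0.100, 1500)     # about 83,333 segments
loss = loss_rate_for_rate(10e9, 0.100, 1500)    # about 2e-10
```

The loss rate comes out near 2.1e-10, which the slide rounds to 2 * 10^-10: on average only about one segment in five billion may be lost.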

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
  Additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: throughput plane with Connection 1 throughput on the x-axis, both axes running to R, and the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
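The convergence argument can be checked numerically. The Python sketch below makes the same simplifying assumptions as the two-session picture: both flows have the same RTT, both add one unit per round, and both see a loss in the same round whenever their combined rate exceeds the capacity. The function name and the starting rates are arbitrary.

```python
def simulate_fairness(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one bottleneck; returns their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows decrease their window by a factor of 2.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: additive increase for both flows.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


final_1, final_2 = simulate_fairness(10.0, 80.0, capacity=100.0, rounds=1000)
```

Additive increase leaves the gap between the two rates unchanged and every loss halves it, so the initial gap of 70 units shrinks to practically zero and both flows end up with equal shares.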

Fairness (more)

Fairness and UDP
  Multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  Instead use UDP:
    pump audio/video at constant rate, tolerate packet loss
  Research area: TCP friendly
Fairness and parallel TCP connections
  nothing prevents app from opening parallel connections between 2 hosts
  Web browsers do this
  Example: link of rate R supporting 9 connections
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2 (11 of the 20 connections)

Chapter 3: Summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation and implementation in the Internet:
  UDP
  TCP
After that:
  leaving the network "edge" (application, transport layers)
  into the network "core"

Next: Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:
  source IP address
  source port number
  dest IP address
  dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets:
  each socket identified by its own 4-tuple
Web servers have different sockets for each connecting client
  non-persistent HTTP will have a different socket for each request

Connection-oriented demux (cont.)

[Figure: clients at IP A and IP B send segments to the server at IP C, port 80, where each connection has its own socket. Segments shown: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80), (S-IP B, SP 5775, D-IP C, DP 80)]

Connection-oriented demux: Threaded Web Server

[Figure: the same three segments arrive at the threaded web server at IP C, port 80: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80), (S-IP B, SP 5775, D-IP C, DP 80)]
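Demultiplexing on the 4-tuple amounts to a table lookup. The Python sketch below uses a dictionary keyed by (source IP, source port, dest IP, dest port); the class, its method names and the socket labels are invented for illustration, while the addresses and ports are the ones from the figures.

```python
class TcpDemux:
    """Toy demultiplexer: one socket per (src IP, src port, dst IP, dst port)."""

    def __init__(self):
        self.sockets = {}

    def register(self, src_ip, src_port, dst_ip, dst_port, socket_name):
        """Record the socket that belongs to an established connection."""
        self.sockets[(src_ip, src_port, dst_ip, dst_port)] = socket_name

    def deliver(self, src_ip, src_port, dst_ip, dst_port):
        """Return the socket for an arriving segment, or None if none matches."""
        return self.sockets.get((src_ip, src_port, dst_ip, dst_port))


demux = TcpDemux()
demux.register("A", 9157, "C", 80, "socket-1")
demux.register("B", 9157, "C", 80, "socket-2")
demux.register("B", 5775, "C", 80, "socket-3")
```

The segments from A and B share the source port 9157 and the destination port 80, yet they reach different sockets because the source IP addresses differ.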


Performance of stop & wait

Stop & wait works, but performance stinks
ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:

$$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$$

U_sender: utilization, the fraction of time the sender is busy sending

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

(both times in the fraction are in milliseconds)
1KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
network protocol limits use of physical resources

stop-and-wait operation

[Figure: sender / receiver timeline: first packet bit transmitted, t = 0; last packet bit transmitted, t = L/R; first packet bit arrives; last packet bit arrives, send ACK; ACK arrives, send next packet, t = RTT + L/R]

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver
Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization

[Figure: sender / receiver timeline: first packet bit transmitted, t = 0; last bit transmitted, t = L/R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, send next packet, t = RTT + L/R]

$$U_{sender} = \frac{3\cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$

Increase utilization by a factor of 3
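The utilization figures for stop-and-wait and for three pipelined packets can be recomputed in Python. The function names and the packets_in_flight parameter are choices made for this sketch; the link rate, RTT and packet size are the ones from the example (1 Gbps, 30 ms RTT, 8000 bits).

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, packets_in_flight=1):
    """Fraction of time the sender is busy: n * (L/R) / (RTT + L/R), capped at 1."""
    d_trans = packet_bits / rate_bps
    return min(1.0, packets_in_flight * d_trans / (rtt_s + d_trans))


def throughput_bytes_per_s(packet_bits, rate_bps, rtt_s, packets_in_flight=1):
    """Bytes delivered per second when n packets are sent per RTT + L/R."""
    d_trans = packet_bits / rate_bps
    per_cycle = packets_in_flight * packet_bits / 8
    return min(rate_bps / 8, per_cycle / (rtt_s + d_trans))


stop_and_wait = sender_utilization(8000, 1e9, 0.030)       # about 0.00027
pipelined_3 = sender_utilization(8000, 1e9, 0.030, 3)      # about 0.0008
```

The stop-and-wait throughput works out to roughly 33 kB/sec, and allowing three packets in flight triples both the utilization and the throughput.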

Pipelining Protocol


Go-back-N: big picture

Sender can have up to N unacked packets in pipeline
Rcvr only sends cumulative ACKs
  Doesn't ACK packet if there's a gap
Sender has timer for oldest unacked packet
  If timer expires, retransmit all unacked packets

Go-Back-N

Sender:
  k-bit seq # in pkt header
  "window" of up to N consecutive unack'ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
    may receive duplicate ACKs (see receiver)
  timer for each in-flight pkt
  timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver (Go-Back-N):
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer) -> no receiver buffering
    Re-ACK pkt with highest in-order seq #

GBN in action
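A compact Python model of the Go-Back-N sender and receiver makes the "resend everything in the window" behaviour easy to trace. It is a sketch under simplifying assumptions: sequence numbers are unbounded integers rather than k-bit values, a single timeout covers the whole window, and the class and method names are invented.

```python
class GbnSender:
    """Toy Go-Back-N sender with a window of N unacked packets."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.base = 0            # oldest unacked sequence number
        self.next_seq = 0        # next sequence number to use
        self.buffer = {}         # seq -> data, kept until ACKed
        self.sent_log = []       # sequence numbers put on the wire, in order

    def send(self, data):
        """Send if the window has room; return False when it is full."""
        if self.next_seq >= self.base + self.window_size:
            return False
        self.buffer[self.next_seq] = data
        self.sent_log.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n):
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if n >= self.base:
            for seq in range(self.base, n + 1):
                self.buffer.pop(seq, None)
            self.base = n + 1

    def on_timeout(self):
        """Retransmit every unacked packet in the window."""
        for seq in range(self.base, self.next_seq):
            self.sent_log.append(seq)


class GbnReceiver:
    """Toy receiver: delivers in order, discards everything else."""

    def __init__(self):
        self.expected = 0        # expectedseqnum
        self.delivered = []

    def on_packet(self, seq, data):
        """Return the ACK number: highest in-order seq received (-1 if none)."""
        if seq == self.expected:
            self.delivered.append(data)
            self.expected += 1
        return self.expected - 1
```

If packet 1 of a four-packet window is lost, the receiver keeps answering with ACK 0 for packets 2 and 3, and the sender's timeout resends packets 1, 2 and 3.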

Transmission Control Protocol

Enhanced GBN protocol
Segment structure
reliable data transfer and data transfer issues
flow control
connection management
TCP congestion control

TCP: Overview    RFCs: 793, 1122, 1323, 2018, 2581

point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) init's sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver

send amp receive buffers

socket

door

TCP

send buffer

TCP

receive buffer

socket

door

segment

application

writes dataapplication

reads data

Transport Layer 38

20

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP‟s unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

21

TCP sender eventsdata rcvd from app

Create segment with seq

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer

Ack rcvd

If acknowledges previously unacked segments update what is known to

be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq ‟s and ACKsSeq ‟s

byte stream ldquonumberrdquo of first byte in segment‟s data

ACKs

seq of next byte expected from other side

cumulative ACK

Q how receiver handles out-of-order segments

A TCP spec doesn‟t say - up to implementor

Host A Host B

Usertypes

bdquoC‟

host receivesACK

host ACKsreceipt of

bdquoC‟

time

Transport Layer 42

22

TCP retransmission scenarios

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t

Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

timeS

eq=

92

tim

eou

t

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase= 120

Transport Layer 44

23

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 200ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Usertypes

bdquoC‟

host ACKsreceipt

of echoedbdquoC‟

host ACKsreceipt of

bdquoC‟

timesimple telnet scenario

Transport Layer 45

Host echoesback bdquoC‟

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data can‟t be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ackarrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Nagle‟s alg

24

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 200ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before

resending lost packet

Detect lost segments via duplicate ACKs Sender often sends

many segments back-to-back

If segment is lost there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend

segment before timer expires

Transport Layer 48

25

Host A

tim

eou

t

Host B

time

X

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT but RTT varies

too short premature timeout

unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo

average several recent measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

27

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT +

|SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

Discuss [Stevens

212 214 216

217]

28

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving app‟s drain rate

app process may be slow at reading from buffer

sender won‟t overflowreceiver‟s buffer by

transmitting too muchtoo fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKeddata to RcvWindow guarantees receive

buffer doesn‟t overflow

Discuss [Stevens

201 204 205

206]

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables

seq s

buffers flow control info (eg RcvWindow)

As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice

Possible solutions A transport address can be used only

once (impossible for client-server model)

Each connection is given an identifier by

the originator (requires indefinite history to

be saved by the transport layer)

TCP‟s Solution

Three way handshake

Step 1 client host sends TCP SYN segment to server

specifies initial seq

no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers

specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 58

30

TCP Connection Management (cont)

Connection Establishment

Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN

Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)

client server

Transport Layer 59

TCP Connection Management (cont)

Closing a connection

client closes socketclientSocketclose()

Step 1 client end system sends TCP FIN control

segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client server

close

close

closed

tim

ed w

ait

Transport Layer 60

31

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo -will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client server

closing

closing

closedti

med w

ait

closed

Transport Layer 61

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 62

[Tanenbaum 633]

32

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

different from flow control

manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Transport Layer 63

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease cut CongWin in half after loss

timecongestio

n w

indow

siz

e

Saw toothbehavior probing

for bandwidth

Transport Layer 64

33

TCP Congestion Control details

sender limits transmissionLastByteSent-LastByteAcked

min(CongWinRcvWindow)

Let‟s assume that RcvWindowis not a constraint (for simplicity)

Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD

slow start

conservative after timeout events

rate =CongWin

RTTBytessec

Transport Layer 65

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be gtgt MSSRTT desirable to quickly ramp

up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 66

34

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every

RTT

done by incrementing CongWin for every ACK received

Summary initial rate is slow but ramps up exponentially fast

Host A

RT

T

Host B

time

Transport Layer 67

Refinement inferring loss

After 3 dup ACKs

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially (This is SS)

to a threshold (ssthresh) then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 68

35

Refinement

Q When should the exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

Implementation Variable Threshold (ssthresh)

At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 69

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 70

36

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start

(SS)

ACK receipt

for previously

unacked

data

CongWin = CongWin + MSS

If (CongWin gt Threshold)

set state to ldquoCongestion

Avoidancerdquo

Resulting in a doubling of

CongWin every RTT

Congestion

Avoidance

(CA)

ACK receipt

for previously

unacked

data

CongWin = CongWin+MSS

(MSSCongWin)

Additive increase resulting

in increase of CongWin by

1 MSS every RTT

SS or CA Loss event

detected by

triple

duplicate

ACK

Threshold = CongWin2

CongWin = Threshold

Set state to ldquoCongestion

Avoidancerdquo

Fast recovery

implementing multiplicative

decrease CongWin will not

drop below 1 MSS

SS or CA Timeout Threshold = CongWin2

CongWin = 1 MSS

Set state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate

ACK

Increment duplicate ACK count

for segment being acked

CongWin and Threshold not

changed

Transport Layer 71

TCP throughput

What‟s the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT

Just after loss window drops to W2 throughput to W2RTT

Average throughout 75 WRTT

Transport Layer 72

37

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 73

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 74

38

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases

multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 75

Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP: do not want rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

Nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets R/2
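The parallel-connection example is a simple ratio. The sketch below assumes every TCP connection on the link ends up with an equal per-connection share; `app_share` is a hypothetical helper.

```python
from fractions import Fraction

def app_share(existing: int, new_conns: int) -> Fraction:
    """Fraction of link rate R won by an app opening `new_conns` connections,
    assuming every TCP connection gets an equal per-connection share."""
    return Fraction(new_conns, existing + new_conns)

one = app_share(9, 1)        # 1 of 10 connections
eleven = app_share(9, 11)    # 11 of 20 connections
print(one, eleven)
```

With 11 connections the exact share is 11/20 of R, which the slide rounds to R/2.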

Transport Layer 76

Chapter 3 Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet:

UDP

TCP

After that:

leaving the network "edge" (application, transport layers)

into the network "core"

Transport Layer 77

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

Receiving host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets: each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client: non-persistent HTTP will have different socket for each request
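One way to picture 4-tuple demultiplexing is a lookup table keyed by all four values. This is only a sketch: the `FourTuple` type, the `sockets` table and the string "sockets" are hypothetical stand-ins, and the addresses A, B, C and ports mirror the labels used in the demux figures.

```python
from typing import NamedTuple

class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

# Hypothetical socket table: one entry per established connection.
sockets = {
    FourTuple("A", 9157, "C", 80): "socket for client A",
    FourTuple("B", 9157, "C", 80): "socket for client B, first connection",
    FourTuple("B", 5775, "C", 80): "socket for client B, second connection",
}

def demux(src_ip, src_port, dst_ip, dst_port):
    """Return the socket that should receive a segment, or None if no match."""
    return sockets.get(FourTuple(src_ip, src_port, dst_ip, dst_port))
```

Two segments with the same source port and the same destination still land on different sockets when their source IP addresses differ.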

Transport Layer 78

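Connection-oriented demux (cont)

[Figure: clients with IP A and IP B send segments to the server with IP C, all to destination port (DP) 80. Segment headers shown: S-IP A, D-IP C, SP 9157, DP 80; S-IP B, D-IP C, SP 9157, DP 80; S-IP B, D-IP C, SP 5775, DP 80. Processes are labeled P1 to P6.]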

Transport Layer 79

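Connection-oriented demux: Threaded Web Server

[Figure: clients with IP A and IP B send segments to the server with IP C, all to destination port (DP) 80. Segment headers shown: S-IP A, D-IP C, SP 9157, DP 80; S-IP B, D-IP C, SP 9157, DP 80; S-IP B, D-IP C, SP 5775, DP 80. Processes are labeled P1 to P4.]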

Transport Layer 80


Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts

range of sequence numbers must be increased

buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Transport Layer 31

Pipelining: increased utilization

[Figure: sender / receiver timing diagram. First packet bit transmitted, t = 0; last bit transmitted, t = L/R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, send next packet, t = RTT + L/R]

$U_{sender} = \dfrac{3 \cdot L/R}{RTT + L/R} = \dfrac{.024}{30.008} = 0.0008$

Increase utilization by a factor of 3
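The utilization formula is easy to evaluate directly. The inputs below are an assumption chosen to match the slide's .024 / 30.008, that is L/R = 0.008 ms and RTT = 30 ms; the function name is illustrative.

```python
def sender_utilization(packets_in_flight: int, l_over_r_ms: float, rtt_ms: float) -> float:
    """Fraction of time the sender is busy when it may pipeline N packets."""
    return packets_in_flight * l_over_r_ms / (rtt_ms + l_over_r_ms)

# Assumed inputs consistent with the slide's .024 / 30.008.
u1 = sender_utilization(1, 0.008, 30.0)   # stop-and-wait
u3 = sender_utilization(3, 0.008, 30.0)   # three packets pipelined
print(f"{u1:.5f} {u3:.5f}")
```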

Transport Layer 32

17

Pipelining Protocol

Transport Layer 33

Go-back-N: big picture

Sender can have up to N unacked packets in pipeline

Rcvr only sends cumulative acks

Doesn't ack packet if there's a gap

Sender has timer for oldest unacked packet

If timer expires, retransmit all unacked packets

Transport Layer 3-34

Go-Back-N

Sender:

k-bit seq # in pkt header

"window" of up to N consecutive unack'ed pkts allowed

ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n): retransmit pkt n and all higher seq # pkts in window
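The sender rules can be sketched as a small bookkeeping class. This is a simplified model under stated assumptions: there is no real network or timer, sequence numbers never wrap (the k-bit limit is ignored), and a timeout resends everything from the oldest unacked packet. `GbnSender` and its fields are hypothetical names.

```python
class GbnSender:
    """Minimal Go-Back-N sender bookkeeping (no real network or timer)."""

    def __init__(self, window_size: int):
        self.n = window_size
        self.base = 0          # oldest unacked sequence number
        self.next_seq = 0      # next sequence number to use
        self.sent_log = []     # every transmission, in order

    def send(self) -> bool:
        """Send one new packet if the window has room."""
        if self.next_seq >= self.base + self.n:
            return False       # window full: refuse data
        self.sent_log.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n: int) -> None:
        """Cumulative ACK(n): everything up to and including n is acked."""
        if n >= self.base:
            self.base = n + 1

    def on_timeout(self) -> None:
        """Retransmit every packet still unacked in the window."""
        self.sent_log.extend(range(self.base, self.next_seq))

s = GbnSender(window_size=4)
sent = [s.send() for _ in range(5)]   # fifth send is refused
s.on_ack(1)                            # packets 0 and 1 acked
s.on_timeout()                         # resend 2 and 3
```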

Transport Layer 35

Receiver:

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #

may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt:

discard (don't buffer) -> no receiver buffering

Re-ACK pkt with highest in-order seq #
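The receiver side needs only one counter. The sketch below assumes packets arrive uncorrupted and uses -1 as a placeholder ACK value when nothing has arrived in order yet; `gbn_receive` is a hypothetical name.

```python
def gbn_receive(arrivals: list[int]) -> tuple[list[int], list[int]]:
    """Process packet sequence numbers in arrival order.

    Returns (delivered, acks). An ACK of -1 means nothing in order yet.
    """
    expected = 0
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)   # in order: deliver to upper layer
            expected += 1
        # out of order: discard, no buffering
        acks.append(expected - 1)   # always ACK highest in-order seq
    return delivered, acks

delivered, acks = gbn_receive([0, 1, 3, 4, 2])
```

Packets 3 and 4 arrive ahead of 2, so they are dropped and each triggers a duplicate ACK for packet 1.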

Transport Layer 3-36

GBN in action

Transmission Control Protocol (TCP)

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management

TCP congestion control

Transport Layer 37

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver

reliable, in-order byte stream: no "message boundaries"

pipelined: TCP congestion and flow control set window size

send & receive buffers

full duplex data: bi-directional data flow in same connection; MSS: maximum segment size

connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange

flow controlled: sender will not overwhelm receiver

[Figure: application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer; application reads data through the socket door]

Transport Layer 38

TCP segment structure

Header fields (the diagram is 32 bits wide):

source port #, dest port #

sequence number, acknowledgement number: counting by bytes of data (not segments)

head len, not used, flag bits U A P R S F

Receive window: # bytes rcvr willing to accept

checksum: Internet checksum (as in UDP)

Urg data pnter

Options (variable length)

application data (variable length)

Flag bits:

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now (generally not used)

RST, SYN, FIN: connection estab (setup, teardown commands)
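The fixed part of the header can be decoded with the standard `struct` module. This is a sketch assuming the standard 20-byte layout with no options; the dictionary keys and the hand-built sample segment are illustrative.

```python
import struct

def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src_port, dst_port, seq, ack,
     offset_reserved, flags, window, checksum, urg_ptr) = struct.unpack("!HHIIBBHHH", raw[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len_bytes": (offset_reserved >> 4) * 4,  # head len is in 32-bit words
        "URG": bool(flags & 0x20),
        "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08),
        "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02),
        "FIN": bool(flags & 0x01),
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }

# Hand-built SYN segment: ports 9157 -> 80, seq 42, header length 5 words.
sample = struct.pack("!HHIIBBHHH", 9157, 80, 42, 0, 5 << 4, 0x02, 65535, 0, 0)
hdr = parse_tcp_header(sample)
```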

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP's unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the byte-stream number of the first byte in the segment

Transport Layer 40

TCP sender events:

data rcvd from app:

Create segment with seq #

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval: TimeOutInterval

timeout:

retransmit segment that caused timeout

restart timer

Ack rcvd:

If acknowledges previously unacked segments: update what is known to be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq. #'s and ACKs

Seq. #'s:

byte stream "number" of first byte in segment's data

ACKs:

seq # of next byte expected from other side

cumulative ACK

Q: how receiver handles out-of-order segments

A: TCP spec doesn't say - up to implementor

[Figure: Host A / Host B timeline: user types 'C'; host ACKs receipt of 'C'; host receives ACK]

Transport Layer 42

TCP retransmission scenarios

[Figure: two Host A / Host B timelines, "lost ACK scenario" and "premature timeout". Labels shown: Seq=92 timeout, loss (X), SendBase = 100, SendBase = 120]

Transport Layer 43

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline, "Cumulative ACK scenario". Labels shown: loss (X), timeout, SendBase = 120]

Transport Layer 44

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), for a total of 160 bytes for sending and receiving 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: wait up to 200 ms for next segment. If no next segment, send ACK

Should sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline: user types 'C'; host ACKs receipt of 'C', echoes back 'C'; host ACKs receipt of echoed 'C']

Transport Layer 45

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggyback, 2:120 -> only 1.6% of the bits sent are data

LANs usually not congested, so it might be okay

Small packets, termed tinygrams, over congested WAN - bad news

Nagle's alg:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 19.4]

Transport Layer 46
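The collect-until-ACK behavior can be sketched with a toy sender. It assumes each ACK covers all outstanding data and that full-sized segments are always allowed out; `NagleSender`, its fields and the MSS value are hypothetical.

```python
class NagleSender:
    """Toy model of Nagle's rule: at most one small segment unacked at a time."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""        # bytes waiting to be sent
        self.unacked = False     # is any data outstanding?
        self.segments = []       # segments handed to the network

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        self.unacked = False
        self._try_send()

    def _try_send(self) -> None:
        # Full-sized segments may always go; small ones wait for the ACK.
        while self.buffer and (len(self.buffer) >= self.mss or not self.unacked):
            segment, self.buffer = self.buffer[:self.mss], self.buffer[self.mss:]
            self.segments.append(segment)
            self.unacked = True

tx = NagleSender(mss=536)
for ch in b"hello":
    tx.write(bytes([ch]))   # user types one character at a time
tx.on_ack()                 # ACK for the first character arrives
```

The first keystroke goes out alone; the remaining four are held back and leave together in one segment once the ACK arrives.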

24

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action

Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed -> Delayed ACK. Wait up to 200 ms for next segment. If no next segment, send ACK

Arrival of in-order segment with expected seq #. One other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments

Arrival of out-of-order segment, higher-than-expected seq #. Gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte

Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap
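The four event/action pairs can be written as a small decision function. This is a sketch: the argument names and return strings are illustrative, and the final branch (a segment below the expected sequence number) is an assumption that the event list does not cover.

```python
def receiver_action(seg_seq: int, expected: int, ack_pending: bool, gap_exists: bool) -> str:
    """Map one arriving segment to an ACK policy.

    expected is the next in-order byte; gap_exists is True when
    out-of-order data is already held beyond expected.
    """
    if seg_seq > expected:
        return "duplicate ACK"      # gap detected (or still open)
    if seg_seq == expected and gap_exists:
        return "immediate ACK"      # segment starts at lower end of gap
    if seg_seq == expected and ack_pending:
        return "cumulative ACK"     # ACK both in-order segments at once
    if seg_seq == expected:
        return "delayed ACK"        # wait up to 200 ms for next segment
    return "duplicate ACK"          # assumption: old data is simply re-ACKed
```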

Transport Layer 47

Fast Retransmit

Time-out period often relatively long: long delay before resending lost packet

Detect lost segments via duplicate ACKs

Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data, it supposes that segment after ACKed data was lost

fast retransmit: resend segment before timer expires

Transport Layer 48

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline with a lost segment (X) and the timeout interval]

Transport Layer 49

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
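The ACK-handling pseudocode translates almost line for line into runnable form. In this sketch the timer is reduced to a boolean and retransmissions are only recorded; `FastRetransmitSender` and the sample sequence numbers are hypothetical.

```python
from collections import defaultdict

class FastRetransmitSender:
    """Tracks SendBase and duplicate-ACK counts; timers are only recorded."""

    def __init__(self, send_base: int, next_seq: int):
        self.send_base = send_base
        self.next_seq = next_seq            # first byte not yet sent
        self.dup_acks = defaultdict(int)
        self.retransmitted = []             # sequence numbers resent early
        self.timer_running = True

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # restart the timer only if something is still outstanding
            self.timer_running = self.send_base < self.next_seq
        else:
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3:
                self.retransmitted.append(y)   # fast retransmit

sender = FastRetransmitSender(send_base=92, next_seq=160)
for ack in (100, 100, 100, 100):   # one new ACK, then three duplicates
    sender.on_ack(ack)
```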

Transport Layer 50

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT, but RTT varies

too short: premature timeout, unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary, want estimated RTT "smoother"

average several recent measurements, not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

Transport Layer 52

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT:

$\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$

Transport Layer 54

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
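The two moving averages and the timeout rule fit in a short class. This is a sketch with labeled assumptions: the estimator is seeded from the first sample with a deviation of half that sample, and DevRTT is updated before EstimatedRTT; the slides fix neither choice. `RttEstimator` is a hypothetical name.

```python
class RttEstimator:
    """EWMA estimator for EstimatedRTT, DevRTT and TimeoutInterval."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha, self.beta = alpha, beta
        # Assumption: seed from the first sample, with deviation = half of it.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(first_sample=100.0)   # milliseconds, hypothetical
timeout = est.update(120.0)
```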

TCP Flow Control

receive side of TCP connection has a receive buffer

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Transport Layer 55

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow: guarantees receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
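Both sides of the flow-control rule are one-line computations. The sketch below uses the slide's variable names in snake case; the function names and the byte counts in the example are hypothetical.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent: int, last_byte_acked: int, window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window

win = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
```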

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments

initialize TCP variables:

seq. #s

buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, due to the delayed duplicates problem, a simple 2-way handshake does not suffice

Possible solutions:

A transport address can be used only once (impossible for client-server model)

Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's Solution: Three way handshake

Step 1: client host sends TCP SYN segment to server

specifies initial seq #

no data

Step 2: server host receives SYN, replies with SYNACK segment

server allocates buffers

specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

Transport Layer 58

TCP Connection Management (cont)

Connection Establishment

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client / server timeline]
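The sequence and acknowledgement numbers in the three steps can be traced with a tiny model. This is a sketch: the `Segment` type and the ISN values are hypothetical, and 32-bit wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: Optional[int] = None

def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Return the three segments exchanged, in order."""
    syn = Segment(syn=True, seq=client_isn)
    syn_ack = Segment(syn=True, ack=True, seq=server_isn, ack_num=syn.seq + 1)
    final_ack = Segment(ack=True, seq=client_isn + 1, ack_num=syn_ack.seq + 1)
    return [syn, syn_ack, final_ack]

segments = three_way_handshake(client_isn=1000, server_isn=5000)
```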

Transport Layer 59

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close()

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

[Figure: client / server timeline labeled close, close, timed wait, closed]

Transport Layer 60

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK

Enters "timed wait" - will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed

Note: with small modification, can handle simultaneous FINs

[Figure: client / server timeline labeled closing, closing, timed wait, closed, closed]

Transport Layer 61

TCP Connection Management (cont)

[Figure: TCP client lifecycle]

[Figure: TCP server lifecycle]

Transport Layer 62

[Tanenbaum 6.33]

Principles of Congestion Control

Congestion:

informally: "too many sources sending too much data too fast for network to handle"

different from flow control

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Transport Layer 63

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size over time, y-axis marked 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]

Transport Layer 64

TCP Congestion Control: details

sender limits transmission:

$\mathrm{LastByteSent} - \mathrm{LastByteAcked} \le \min(\mathrm{CongWin}, \mathrm{RcvWindow})$

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly,

$\mathrm{rate} = \dfrac{\mathrm{CongWin}}{RTT}$ bytes/sec

CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events

Transport Layer 65

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

Transport Layer 66

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline over successive RTTs]
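The doubling follows directly from adding one MSS per ACK. The sketch below assumes no loss and that every segment sent in an RTT is ACKed within that RTT; it also recomputes the initial rate from the earlier example (MSS = 500 bytes, RTT = 200 msec). The function name is illustrative.

```python
def slow_start_windows(mss: int, rtts: int) -> list[int]:
    """CongWin (bytes) at the start of each RTT, assuming no loss and
    that every segment sent in an RTT is ACKed within that RTT."""
    cong_win = mss
    history = []
    for _ in range(rtts):
        history.append(cong_win)
        acks = cong_win // mss        # one ACK per segment in flight
        cong_win += acks * mss        # +1 MSS per ACK => doubling per RTT
    return history

windows = slow_start_windows(mss=500, rtts=5)
initial_rate_bps = windows[0] * 8 / 0.200   # MSS = 500 bytes, RTT = 200 msec
```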

Transport Layer 67

Refinement: inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is SS)

to a threshold (ssthresh), then grows linearly

Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario

Transport Layer 68

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout

Implementation:

Variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Transport Layer 69

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

Transport Layer 70

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

State: SS or CA
Event: duplicate ACK
TCP sender action: increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
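The state table maps naturally onto a small class with one method per event. This is a sketch under stated assumptions: windows are tracked in bytes as floats, the duplicate-ACK counter is reset on every new ACK and on timeout, and the 1 MSS floor is applied after a triple duplicate ACK. `CongestionControl` and the sample MSS and Threshold values are hypothetical.

```python
class CongestionControl:
    """Sketch of the sender state table; units are bytes."""

    def __init__(self, mss: int, threshold: int):
        self.mss = mss
        self.cong_win = float(mss)
        self.threshold = float(threshold)
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_dup_ack(self) -> None:
        self.dup_acks += 1
        if self.dup_acks == 3:                     # triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = float(self.mss)
        self.state = "SS"
        self.dup_acks = 0

cc = CongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()          # slow start: 1000 -> 5000, crosses Threshold
state_after_ss = cc.state
for _ in range(3):
    cc.on_dup_ack()          # triple duplicate ACK
win_after_dup = cc.cong_win
cc.on_timeout()
```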

Transport Layer 71


Page 17: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

17

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesn‟t ack packet if there‟s a gap

Sender has timer for oldest unacked packet If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header

ldquowindowrdquo of up to N consecutive unack‟ed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo

may receive duplicate ACKs (see receiver)

timer for each in-flight pkt

timeout(n) retransmit pkt n and all higher seq pkts in window

18

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt discard (don‟t buffer) -gt no receiver buffering

Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

19

Transport Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection

MSS maximum segment size

connection-oriented handshaking (exchange

of control msgs) init‟s sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundariesrdquo

pipelined TCP congestion and flow

control set window size

send amp receive buffers

socket

door

TCP

send buffer

TCP

receive buffer

socket

door

segment

application

writes dataapplication

reads data

Transport Layer 38

20

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP‟s unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

21

TCP sender eventsdata rcvd from app

Create segment with seq

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer

Ack rcvd

If acknowledges previously unacked segments update what is known to

be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq ‟s and ACKsSeq ‟s

byte stream ldquonumberrdquo of first byte in segment‟s data

ACKs

seq of next byte expected from other side

cumulative ACK

Q how receiver handles out-of-order segments

A TCP spec doesn‟t say - up to implementor

Host A Host B

Usertypes

bdquoC‟

host receivesACK

host ACKsreceipt of

bdquoC‟

time

Transport Layer 42

22

TCP retransmission scenarios

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t

Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

timeS

eq=

92

tim

eou

t

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase= 120

Transport Layer 44

23

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 200ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Usertypes

bdquoC‟

host ACKsreceipt

of echoedbdquoC‟

host ACKsreceipt of

bdquoC‟

timesimple telnet scenario

Transport Layer 45

Host echoesback bdquoC‟

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data can‟t be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ackarrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Nagle‟s alg

24

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 200ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before

resending lost packet

Detect lost segments via duplicate ACKs Sender often sends

many segments back-to-back

If segment is lost there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend

segment before timer expires

Transport Layer 48

25

Host A

tim

eou

t

Host B

time

X

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT but RTT varies

too short premature timeout

unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo

average several recent measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

27

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT +

|SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

Discuss [Stevens

212 214 216

217]

28

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving app‟s drain rate

app process may be slow at reading from buffer

sender won‟t overflowreceiver‟s buffer by

transmitting too muchtoo fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKeddata to RcvWindow guarantees receive

buffer doesn‟t overflow

Discuss [Stevens

201 204 205

206]

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables

seq s

buffers flow control info (eg RcvWindow)

As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice

Possible solutions A transport address can be used only

once (impossible for client-server model)

Each connection is given an identifier by

the originator (requires indefinite history to

be saved by the transport layer)

TCP‟s Solution

Three way handshake

Step 1 client host sends TCP SYN segment to server

specifies initial seq

no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers

specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 58

30

TCP Connection Management (cont)

Connection Establishment

Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN

Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)

client server

Transport Layer 59

TCP Connection Management (cont)

Closing a connection

client closes socketclientSocketclose()

Step 1 client end system sends TCP FIN control

segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client server

close

close

closed

tim

ed w

ait

Transport Layer 60

31

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo -will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client server

closing

closing

closedti

med w

ait

closed

Transport Layer 61

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 62

[Tanenbaum 633]

32

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

different from flow control

manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Transport Layer 63

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease cut CongWin in half after loss

timecongestio

n w

indow

siz

e

Saw toothbehavior probing

for bandwidth

Transport Layer 64

33

TCP Congestion Control details

sender limits transmission:

$$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly,

$$\text{rate} = \frac{\text{CongWin}}{\text{RTT}} \ \text{bytes/sec}$$

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate ACKs

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline with one RTT marked]
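As a rough check of the numbers on the slow-start slides, the sketch below computes the initial rate from the example (MSS = 500 bytes, RTT = 200 msec) and shows how adding one MSS per ACK doubles CongWin every RTT. It assumes no loss and that every segment in a window is ACKed within one RTT; the function names are hypothetical.

```python
def rate_bps(cong_win_bytes: int, rtt_seconds: float) -> float:
    """rate = CongWin / RTT, converted from bytes/sec to bits/sec."""
    return cong_win_bytes * 8 / rtt_seconds


def slow_start_windows(mss_bytes: int, rtts: int) -> list:
    """CongWin (bytes) at the start of each RTT during loss-free slow start."""
    cong_win = mss_bytes              # connection begins with CongWin = 1 MSS
    windows = []
    for _ in range(rtts):
        windows.append(cong_win)
        acks_this_rtt = cong_win // mss_bytes   # one ACK per segment sent
        cong_win += acks_this_rtt * mss_bytes   # +1 MSS per ACK => doubling
    return windows


if __name__ == "__main__":
    print(rate_bps(500, 0.2))           # about 20 kbps, as on the slide
    print(slow_start_windows(500, 4))   # window doubles each RTT
```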

Refinement inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is slow start, SS)

to a threshold (ssthresh), then grows linearly

Philosophy:

3 dup ACKs indicates network capable of delivering some segments

timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout.

Implementation:

Variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control

State | Event | TCP Sender Action | Commentary

Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT

SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start

SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
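The table above maps fairly directly onto a small state machine. The sketch below is an illustration of the table only, not a full TCP implementation: the class name `CongestionControlSender` and its method names are hypothetical, and the window is kept in units of one MSS for readability.

```python
MSS = 1.0  # assumption: measure CongWin and Threshold in units of one MSS


class CongestionControlSender:
    """Hypothetical sender following the state/event/action table."""

    def __init__(self, threshold: float):
        self.cong_win = MSS
        self.threshold = threshold
        self.state = "SS"          # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                      # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)  # about +1 MSS per RTT

    def on_dup_ack(self) -> None:
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        """Timeout: fall back to slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

For example, starting with `threshold=4`, four new ACKs take CongWin from 1 to 5 MSS and move the sender into congestion avoidance; three duplicate ACKs then halve the window to 2.5 MSS.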

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?

Ignore slow start

Let W be the window size when loss occurs.

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2·RTT).

Average throughput: 0.75 W/RTT
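One way to see where the 0.75 factor comes from, assuming (as a simplification) that between two losses the window ramps linearly from W/2 up to W, so that the average is the midpoint of the two extremes:

```latex
\text{average throughput}
  \approx \frac{1}{2}\left(\frac{W}{2\,\text{RTT}} + \frac{W}{\text{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\text{RTT}}
  = 0.75\,\frac{W}{\text{RTT}}
```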

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$$\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$

-> L = 2·10^-10. Wow!

New versions of TCP for high-speed
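The two numbers in this example can be recomputed from the formulas on the slide. The sketch below assumes MSS is expressed in bytes and throughput in bits per second; the function names are hypothetical.

```python
def required_window_segments(throughput_bps: float, rtt_s: float,
                             mss_bytes: int) -> float:
    """W = throughput * RTT / MSS, with MSS converted to bits."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float,
                       mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    print(required_window_segments(10e9, 0.1, 1500))  # roughly 83,333 segments
    print(required_loss_rate(10e9, 0.1, 1500))        # roughly 2e-10
```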

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: Connection 1 throughput axis up to R, with the "equal bandwidth share" line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
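The convergence argument can be checked numerically. The sketch below is an idealized model, not a measurement: both sessions see every loss at the same time, a loss occurs whenever their combined throughput exceeds the link capacity, and both have the same RTT. The function name and the starting values are hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int):
    """Evolve the throughputs of two synchronized AIMD flows."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # loss: both flows halve, so the gap between them halves too
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: both grow by the same amount (slope 1)
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    # Very unequal start: the flows end up with nearly equal throughput.
    print(aimd_two_flows(1.0, 41.0, capacity=50.0, steps=200))
```

Additive increase leaves the difference between the two flows unchanged, while each multiplicative decrease halves it, which is why the pair drifts toward the equal bandwidth share line.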

Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP: do not want rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections:

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets R/2!
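The arithmetic behind the example, assuming every TCP connection on the link gets an equal share of R (the fairness goal stated earlier). The function name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections: int, new_app_connections: int) -> Fraction:
    """Fraction of the link rate R obtained by the new application."""
    total = existing_connections + new_app_connections
    return Fraction(new_app_connections, total)


if __name__ == "__main__":
    print(app_share(9, 1))    # 1/10 of R
    print(app_share(9, 11))   # 11/20 of R, i.e. roughly R/2
```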

Chapter 3 Summary

principles behind transport layer services

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet

UDP

TCP

After that

leaving the network "edge" (application, transport layers)

into the network "core"

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets:

each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client

non-persistent HTTP will have different socket for each request
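A minimal sketch of 4-tuple demultiplexing, using a dictionary keyed by (source IP, source port, dest IP, dest port). The addresses A, B, C and the port numbers follow the figure on the next slide; the class name and the socket labels are hypothetical.

```python
class TcpDemultiplexer:
    """Hypothetical table mapping a connection's 4-tuple to a socket label."""

    def __init__(self):
        self.sockets = {}

    def register(self, src_ip, src_port, dst_ip, dst_port, socket_label):
        """Record the socket that serves one established connection."""
        self.sockets[(src_ip, src_port, dst_ip, dst_port)] = socket_label

    def demux(self, src_ip, src_port, dst_ip, dst_port):
        """Return the socket for an arriving segment, or None if unknown."""
        return self.sockets.get((src_ip, src_port, dst_ip, dst_port))


if __name__ == "__main__":
    server = TcpDemultiplexer()
    server.register("A", 9157, "C", 80, "socket-1")
    server.register("B", 9157, "C", 80, "socket-2")
    server.register("B", 5775, "C", 80, "socket-3")
    # Same source port, different source IP: different sockets.
    print(server.demux("A", 9157, "C", 80))
    print(server.demux("B", 9157, "C", 80))
```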

Connection-oriented demux (cont)

[Figure: client IP A, client IP B and server IP C, with processes P1 to P6. Three segments arrive at server C, all with DP 80 and D-IP C: (S-IP A, SP 9157), (S-IP B, SP 9157) and (S-IP B, SP 5775)]

Connection-oriented demux: Threaded Web Server

[Figure: client IP A, client IP B and server IP C, with processes P1 to P4. The same three segments arrive at server C, all with DP 80 and D-IP C: (S-IP A, SP 9157), (S-IP B, SP 9157) and (S-IP B, SP 5775)]


Go Back N

Receiver

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #

may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt:

discard (don't buffer) -> no receiver buffering

Re-ACK pkt with highest in-order seq #

GBN in action

[Figure: GBN in action]
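The receiver rules above fit in a few lines. The sketch below is an illustration under stated assumptions: sequence numbers start at 1 and are not wrapped, corruption checks are omitted, and the class and method names are hypothetical.

```python
class GbnReceiver:
    """Hypothetical Go-Back-N receiver: the only state is expectedseqnum."""

    def __init__(self):
        self.expected_seqnum = 1      # assumption: numbering starts at 1
        self.delivered = []           # data handed up to the application

    def receive(self, seqnum: int, data) -> int:
        """Process one packet and return the ACK number to send back."""
        if seqnum == self.expected_seqnum:
            self.delivered.append(data)       # in order: deliver it
            self.expected_seqnum += 1
        # Out-of-order packets are discarded (no receiver buffering).
        # Either way, ACK the highest in-order sequence number seen so far.
        return self.expected_seqnum - 1


if __name__ == "__main__":
    receiver = GbnReceiver()
    print(receiver.receive(1, "a"))   # in order
    print(receiver.receive(3, "c"))   # out of order: discarded, duplicate ACK
    print(receiver.receive(2, "b"))   # in order again
```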

Transmission Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management

TCP congestion control

TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver

reliable, in-order byte stream: no "message boundaries"

pipelined: TCP congestion and flow control set window size

send & receive buffers

full duplex data: bi-directional data flow in same connection

MSS: maximum segment size

connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange

flow controlled: sender will not overwhelm receiver

[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]

TCP segment structure

[Figure: TCP segment layout, 32 bits wide. Fields in order:]

source port #, dest port #

sequence number

acknowledgement number

head len, not used, flag bits U A P R S F, Receive window

checksum, Urg data pnter

Options (variable length)

application data (variable length)

Notes on the fields:

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now (generally not used)

RST, SYN, FIN: connection estab (setup, teardown commands)

Receive window: # bytes rcvr willing to accept

sequence and acknowledgement numbers: counting by bytes of data (not segments)

checksum: Internet checksum (as in UDP)
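To make the layout concrete, the sketch below unpacks the fixed 20-byte part of a TCP header with Python's standard `struct` module. It is an illustration only: options and payload are ignored, and the function name and dictionary keys are hypothetical.

```python
import struct

# Flag bits in the order they sit in the header byte (low bit first).
FLAG_BITS = [("FIN", 0x01), ("SYN", 0x02), ("RST", 0x04),
             ("PSH", 0x08), ("ACK", 0x10), ("URG", 0x20)]


def parse_tcp_header(header: bytes) -> dict:
    """Decode the first 20 bytes of a TCP segment (network byte order)."""
    (src_port, dst_port, seq, ack_num, offset_byte, flag_byte,
     rcv_window, checksum, urg_ptr) = struct.unpack("!HHIIBBHHH", header[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack_num": ack_num,
        # head len is stored in 32-bit words in the upper 4 bits
        "header_len_bytes": (offset_byte >> 4) * 4,
        "flags": {name: bool(flag_byte & bit) for name, bit in FLAG_BITS},
        "rcv_window": rcv_window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


if __name__ == "__main__":
    # A made-up SYN+ACK header: ports 9157 -> 80, seq 42, ack 79.
    raw = struct.pack("!HHIIBBHHH", 9157, 80, 42, 79, 5 << 4, 0x12, 4096, 0, 0)
    print(parse_tcp_header(raw))
```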

TCP reliable data transfer

TCP creates reliable service on top of IP's unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the byte-stream number of the first byte in the segment

TCP sender events:

data rcvd from app:

Create segment with seq #

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval: TimeOutInterval

timeout:

retransmit segment that caused timeout

restart timer

Ack rcvd:

If acknowledges previously unacked segments

update what is known to be acked

start timer if there are outstanding segments
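The three sender events can be sketched as three handlers. This is a simplified model under stated assumptions: the timer is a boolean rather than a real clock, ACK values fall on segment boundaries, and sequence-number wraparound, flow control and congestion control are left out. The class name and attributes are hypothetical.

```python
class SimplifiedTcpSender:
    """Hypothetical sender with a single retransmission timer."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq    # oldest unacknowledged byte
        self.next_seq = initial_seq     # seq # for the next new segment
        self.unacked = {}               # seq # -> payload still in flight
        self.timer_running = False
        self.wire = []                  # log of (seq #, payload) transmissions

    def on_app_data(self, payload: bytes) -> None:
        """Data received from application: create and send a segment."""
        seq = self.next_seq
        self.unacked[seq] = payload
        self.wire.append((seq, payload))
        self.next_seq += len(payload)
        if not self.timer_running:      # timer covers the oldest unacked segment
            self.timer_running = True

    def on_timeout(self) -> None:
        """Timer expired: retransmit the oldest unacked segment, restart timer."""
        if self.unacked:
            seq = min(self.unacked)
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y: int) -> None:
        """Cumulative ACK with value y received."""
        if y > self.send_base:
            self.send_base = y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            # keep the timer only while segments are still outstanding
            self.timer_running = bool(self.unacked)
```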

TCP seq. #'s and ACKs

Seq. #'s:

byte stream "number" of first byte in segment's data

ACKs:

seq # of next byte expected from other side

cumulative ACK

Q: how receiver handles out-of-order segments

A: TCP spec doesn't say, up to implementor

[Figure: Host A / Host B timeline: user types 'C'; host ACKs receipt of 'C'; host receives ACK]

TCP retransmission scenarios

[Figure: two Host A / Host B timelines, "lost ACK scenario" (loss marked X) and "premature timeout", each with a Seq=92 timeout; the SendBase values shown are 100 and 120]

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline, "Cumulative ACK scenario" (loss marked X, timeout shown); SendBase = 120]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), to a total of 160 bytes for sending and receiving 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: Wait up to 200ms for next segment. If no next segment, send ACK.

Should sender use delayed acks too?

[Stevens figure 19.3]

[Figure: simple telnet scenario, Host A / Host B timeline: user types 'C'; host ACKs receipt of 'C'; host echoes back 'C'; host ACKs receipt of echoed 'C']

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggybacking, 2/120 -> only about 1.6% of the bits sent are data

LANs usually not congested, so it might be okay

Small packets, termed tinygrams, over congested WAN: bad news

Nagle's alg.:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent.

[Stevens 19.4]
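The buffering rule can be sketched as follows. This is a simplified model, not the full RFC 896 behavior: it assumes each ACK covers all outstanding data, and it sends immediately whenever a full MSS is available. The class name and attributes are hypothetical.

```python
class NagleSender:
    """Hypothetical sender that coalesces small writes while data is unacked."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""          # data written but not yet sent
        self.outstanding = False   # True while sent data awaits its ACK
        self.segments = []         # segments put on the wire, in order

    def write(self, data: bytes) -> None:
        """Application writes data; send it now only if the rule allows."""
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        """ACK arrives (assumed to cover everything outstanding)."""
        self.outstanding = False
        self._try_send()

    def _try_send(self) -> None:
        # Send when a full segment is ready, or when nothing is outstanding.
        while self.buffer and (len(self.buffer) >= self.mss
                               or not self.outstanding):
            segment = self.buffer[:self.mss]
            self.buffer = self.buffer[self.mss:]
            self.segments.append(segment)
            self.outstanding = True


if __name__ == "__main__":
    sender = NagleSender(mss=10)
    for key in (b"a", b"b", b"c"):    # three keystrokes typed quickly
        sender.write(key)
    sender.on_ack()                   # ACK for the first keystroke arrives
    print(sender.segments)            # prints [b'a', b'bc']
```

The faster the ACK returns, the sooner the buffered bytes go out, which is the self-clocking behavior described above.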

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver | TCP Receiver action

Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 200ms for next segment. If no next segment, send ACK

Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments

Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte

Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long:

long delay before resending lost packet

Detect lost segments via duplicate ACKs.

Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs.

If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost:

fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK (Host A / Host B timeline; lost segment marked X; retransmission occurs before the timeout)]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
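A runnable version of the pseudocode above, reduced to the duplicate-ACK counting. It is a sketch under assumptions: the timer is omitted, retransmissions are only recorded in a list, and the class and attribute names are hypothetical.

```python
class FastRetransmitSender:
    """Hypothetical sender tracking SendBase and duplicate ACKs."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_ack_count = 0
        self.retransmitted = []    # seq #s resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged: advance SendBase, reset the counter.
            self.send_base = y
            self.dup_ack_count = 0
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_ack_count += 1
            if self.dup_ack_count == 3:
                # Fast retransmit: resend the segment starting at y.
                self.retransmitted.append(y)


if __name__ == "__main__":
    sender = FastRetransmitSender(send_base=92)
    for ack in (100, 100, 100, 100):   # one new ACK, then three duplicates
        sender.on_ack(ack)
    print(sender.retransmitted)        # prints [100]
```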

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT

but RTT varies

too short: premature timeout

unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary, want estimated RTT "smoother"

average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$$\text{EstimatedRTT} = (1 - \alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

Example RTT estimation

[Figure: RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106) for gaia.cs.umass.edu to fantasia.eurecom.fr, showing SampleRTT and EstimatedRTT]

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT:

$$\text{DevRTT} = (1 - \beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$$

(typically, $\beta = 0.25$)

Then set timeout interval:

$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
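The two moving averages and the timeout formula can be put together as below. This is a sketch with two labeled assumptions that the slides do not specify: DevRTT is updated before EstimatedRTT (so it uses the previous estimate), and the initial values are the first sample and half of it, as in RFC 6298. The class name is hypothetical.

```python
class RttEstimator:
    """Hypothetical EWMA-based estimator for the TCP retransmission timeout."""

    def __init__(self, first_sample: float, alpha: float = 0.125,
                 beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2     # assumption: RFC 6298 style start

    def update(self, sample_rtt: float) -> None:
        """Fold one SampleRTT (not from a retransmission) into the estimates."""
        # Assumption: deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self) -> float:
        """TimeoutInterval = EstimatedRTT + 4 * DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    estimator = RttEstimator(first_sample=100.0)   # milliseconds
    estimator.update(200.0)
    print(estimator.estimated_rtt)        # prints 112.5
    print(estimator.dev_rtt)              # prints 62.5
    print(estimator.timeout_interval())   # prints 362.5
```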

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service: matching the send rate to the receiving app's drain rate

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer

= RcvWindow

= RcvBuffer - [LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow

guarantees receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
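The receiver's calculation and the sender's check can be written out directly. The sketch below uses made-up byte counts for illustration; the function names are hypothetical, and byte positions are plain integers with no wraparound.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    new_bytes: int, advertised_window: int) -> bool:
    """True if sending new_bytes keeps unACKed data within RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + new_bytes <= advertised_window


if __name__ == "__main__":
    window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(window)                                      # prints 2096
    print(sender_may_send(5000, 4000, 1000, window))   # prints True
    print(sender_may_send(5000, 4000, 1200, window))   # prints False
```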

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination


TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments

initialize TCP variables:

seq. #s

buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, because of the delayed duplicates problem a simple 2-way handshake does not suffice

Possible solutions:

A transport address can be used only once (impossible for client-server model)

Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's Solution: Three way handshake

Step 1: client host sends TCP SYN segment to server

specifies initial seq #

no data

Step 2: server host receives SYN, replies with SYNACK segment

server allocates buffers

specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data


Page 19: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

19

Transport Control Protocol

Enhanced GBN protocol

Segment structure

reliable data transfer and data transfer issues

flow control

connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection

MSS maximum segment size

connection-oriented handshaking (exchange

of control msgs) init‟s sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundariesrdquo

pipelined TCP congestion and flow

control set window size

send amp receive buffers

socket

door

TCP

send buffer

TCP

receive buffer

socket

door

segment

application

writes dataapplication

reads data

Transport Layer 38

20

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

Receive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IP‟s unreliable service

Pipelined segments

Cumulative acks

TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

21

TCP sender eventsdata rcvd from app

Create segment with seq

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer

Ack rcvd

If acknowledges previously unacked segments update what is known to

be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq ‟s and ACKsSeq ‟s

byte stream ldquonumberrdquo of first byte in segment‟s data

ACKs

seq of next byte expected from other side

cumulative ACK

Q how receiver handles out-of-order segments

A TCP spec doesn‟t say - up to implementor

Host A Host B

Usertypes

bdquoC‟

host receivesACK

host ACKsreceipt of

bdquoC‟

time

Transport Layer 42

22

TCP retransmission scenarios

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t

Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

timeS

eq=

92

tim

eou

t

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase= 120

Transport Layer 44

23

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 200ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Usertypes

bdquoC‟

host ACKsreceipt

of echoedbdquoC‟

host ACKsreceipt of

bdquoC‟

timesimple telnet scenario

Transport Layer 45

Host echoesback bdquoC‟

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data can‟t be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ackarrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Nagle‟s alg

24

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 200ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before

resending lost packet

Detect lost segments via duplicate ACKs Sender often sends

many segments back-to-back

If segment is lost there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend

segment before timer expires

Transport Layer 48

25

Host A

tim

eou

t

Host B

time

X

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT but RTT varies

too short premature timeout

unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo

average several recent measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

27

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT +

|SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

Discuss [Stevens

212 214 216

217]

28

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving app‟s drain rate

app process may be slow at reading from buffer

sender won‟t overflowreceiver‟s buffer by

transmitting too muchtoo fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKeddata to RcvWindow guarantees receive

buffer doesn‟t overflow

Discuss [Stevens

201 204 205

206]

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.
They initialize TCP variables:
sequence numbers
buffers and flow control info (e.g. RcvWindow)

As seen in the previous slide, because of the delayed duplicates problem a simple 2-way handshake does not suffice.

Possible solutions:
A transport address can be used only once (impossible for the client-server model).
Each connection is given an identifier by the originator (requires the transport layer to save an indefinite history).

TCP's solution: three-way handshake

Step 1: the client host sends a TCP SYN segment to the server.
It specifies the initial sequence number.
It carries no data.
Step 2: the server host receives the SYN and replies with a SYNACK segment.
The server allocates buffers.
It specifies the server's initial sequence number.
Step 3: the client receives the SYNACK and replies with an ACK segment, which may contain data.

TCP Connection Management (cont.)

Connection establishment

Step 1: the client sends a SYN segment to the server with its ISN (Initial Sequence Number).
Step 2: the server receives the SYN and replies with a SYN+ACK (acknowledging the client's ISN+1) that carries its own ISN.
Step 3: the client receives the SYN+ACK and replies with an ACK (acknowledging the server's ISN+1).

[Figure: client and server timelines for the three-way handshake]
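The three steps can be traced with a small Python sketch. It only models the sequence and acknowledgement numbers; the `Segment` type and the `three_way_handshake` function are hypothetical names introduced for illustration, and no real sockets or timers are involved.

```python
from dataclasses import dataclass
from typing import List

SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits wide and wrap around


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0        # sequence number carried by the segment
    ack_num: int = 0    # acknowledgement number (meaningful only if ack is set)


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments exchanged when a connection is opened."""
    # Step 1: client -> server, SYN carrying the client's ISN.
    syn = Segment(syn=True, seq=client_isn)
    # Step 2: server -> client, SYN+ACK acknowledging client ISN + 1.
    syn_ack = Segment(syn=True, ack=True, seq=server_isn,
                      ack_num=(syn.seq + 1) % SEQ_SPACE)
    # Step 3: client -> server, ACK acknowledging server ISN + 1.
    final_ack = Segment(ack=True, seq=syn_ack.ack_num,
                        ack_num=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, final_ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Because each side must echo the other side's ISN + 1, a delayed duplicate SYN from an old connection cannot complete the exchange on its own.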

TCP Connection Management (cont.)

Closing a connection

The client closes the socket: clientSocket.close()

Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: client and server timelines for connection close, with the labels close, close, timed wait and closed]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
It enters "timed wait", during which it will respond with an ACK to received FINs.
Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: client and server timelines, with the labels closing, closing, timed wait, closed and closed]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle; TCP server lifecycle]

[Tanenbaum 633]

Principles of Congestion Control

Congestion, informally: "too many sources sending too much data too fast for the network to handle"
different from flow control!
manifestations:
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem!

TCP congestion control: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
additive increase: increase CongWin by 1 MSS every RTT until loss is detected
multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size (8, 16, 24 Kbytes) versus time. Sawtooth behavior: probing for bandwidth]

TCP Congestion Control: details

The sender limits transmission:
$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
Let's assume that RcvWindow is not a constraint (for simplicity).
Roughly,
$\text{rate} = \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec
CongWin is dynamic, a function of perceived network congestion.

How does the sender perceive congestion?
loss event = timeout or 3 duplicate ACKs
The TCP sender reduces its rate (CongWin) after a loss event.

Three mechanisms:
AIMD
slow start
conservative after timeout events

TCP Slow Start

When the connection begins, CongWin = 1 MSS.
Example: MSS = 500 bytes and RTT = 200 msec
initial rate = 20 kbps
The available bandwidth may be >> MSS/RTT, so it is desirable to quickly ramp up to a respectable rate.
When the connection begins, increase the rate exponentially fast until the first loss event.

TCP Slow Start (more)

When the connection begins, increase the rate exponentially until the first loss event:
double CongWin every RTT
done by incrementing CongWin for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast.

[Figure: Host A and Host B timelines, one RTT marked, showing the number of segments sent per RTT growing over time]

Refinement: inferring loss

After 3 dup ACKs:
CongWin is cut in half
the window then grows linearly
(part of Fast Recovery)
But after a timeout event:
CongWin is instead set to 1 MSS
the window then grows exponentially (this is slow start, SS) to a threshold (ssthresh), then grows linearly

Philosophy: 3 dup ACKs indicate that the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario.

Refinement

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.

Implementation:
Variable Threshold (ssthresh)
At a loss event, Threshold is set to 1/2 of CongWin just before the loss event.

Summary: TCP Congestion Control

When CongWin is below Threshold, the sender is in the slow-start phase and the window grows exponentially.
When CongWin is above Threshold, the sender is in the congestion-avoidance phase and the window grows linearly.
When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control

| State | Event | TCP Sender Action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |
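The sender actions listed for each state and event can be written as a small state machine. The Python sketch below is an illustration only: the class name `CongestionControl`, the default MSS of 1000 bytes and the initial threshold are assumptions made for the example, and real TCP stacks add details (window inflation during fast recovery, byte counting) that are left out.

```python
class CongestionControl:
    """Simplified sender following the state/event/action rules above."""

    def __init__(self, mss: int = 1000, threshold: float = 8000):
        self.mss = mss                # assumed segment size in bytes
        self.cong_win = float(mss)    # CongWin starts at 1 MSS
        self.threshold = float(threshold)
        self.state = "SS"             # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                  # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Additive increase: about 1 MSS per RTT in total.
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_dup_ack(self) -> None:
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, float(self.mss))  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.dup_acks = 0
        self.threshold = self.cong_win / 2
        self.cong_win = float(self.mss)
        self.state = "SS"


cc = CongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()
print(cc.state, cc.cong_win)   # state and window after four new ACKs
```

Feeding the object a sequence of ACKs, duplicate ACKs and timeouts reproduces the sawtooth pattern of additive increase and multiplicative decrease.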

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.
Let W be the window size when loss occurs.
When the window is W, throughput is W/RTT.
Just after loss, the window drops to W/2 and throughput to W/(2·RTT).
Average throughput: 0.75 W/RTT
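The factor 0.75 can be checked with a one-line derivation. It assumes, as the slide does, that slow start is ignored and that between two loss events the window grows linearly from W/2 to W, so the average throughput is the midpoint of the two extremes:

```latex
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}
  = 0.75\,\frac{W}{\mathrm{RTT}}
```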

TCP Futures: TCP over "long, fat pipes"

Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate L:
$\text{throughput} = \frac{1.22\cdot\text{MSS}}{\text{RTT}\sqrt{L}}$
so $L = 2\cdot 10^{-10}$. Wow!
New versions of TCP are needed for high speeds.
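Both numbers in the example can be reproduced with a short calculation. The Python sketch below assumes the throughput formula 1.22 · MSS / (RTT · sqrt(L)) with MSS expressed in bits; the function names are hypothetical.

```python
def required_window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


window = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(f"W = {window:.0f} segments, L = {loss:.2e}")
```

The loss rate comes out at roughly 2 · 10^-10, i.e. about one lost segment in five billion, which is why the slide calls for new TCP variants on such paths.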

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
Additive increase gives a slope of 1 as throughput increases.
Multiplicative decrease decreases throughput proportionally.

[Figure: throughput plot with both axes running up to R (Connection 1 throughput labeled) and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

Fairness (more)

Fairness and UDP
Multimedia apps often do not use TCP: they do not want their rate throttled by congestion control.
Instead they use UDP: pump audio/video at a constant rate and tolerate packet loss.
Research area: TCP friendly

Fairness and parallel TCP connections
Nothing prevents an app from opening parallel connections between 2 hosts.
Web browsers do this.
Example: a link of rate R supporting 9 connections
a new app asks for 1 TCP connection and gets rate R/10
a new app asks for 11 TCP connections and gets about R/2!
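The arithmetic behind the example is easy to verify. The sketch below assumes idealized fairness, where every TCP connection on the link gets an equal share of R; the function name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of the link rate R obtained by an app that opens
    `new_connections` connections next to `existing_connections` others."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))    # share of R with a single connection
print(app_share(9, 11))   # share of R with eleven parallel connections
```

With 11 of the 20 connections, the new app obtains 11/20 of R, which the slide rounds to R/2.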

Chapter 3: Summary

principles behind transport layer services:
multiplexing, demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet:
UDP
TCP

Next:
Socket programming over TCP

After that:
leaving the network "edge" (application, transport layers)
into the network "core"

Connection-oriented demux

A TCP socket is identified by a 4-tuple:
source IP address
source port number
dest IP address
dest port number
The receiving host uses all four values to direct a segment to the appropriate socket.
A server host may support many simultaneous TCP sockets:
each socket is identified by its own 4-tuple
Web servers have a different socket for each connecting client:
non-persistent HTTP will have a different socket for each request
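Demultiplexing on the 4-tuple amounts to a table lookup. The Python sketch below models that table with a dictionary; the `Demultiplexer` class, the socket labels and the placeholder addresses "A", "B" and "C" are assumptions made for the example, not an operating-system API.

```python
from typing import Dict, Optional, Tuple

# (source IP, source port, dest IP, dest port)
FourTuple = Tuple[str, int, str, int]


class Demultiplexer:
    """Maps each established connection's 4-tuple to its socket."""

    def __init__(self) -> None:
        self._sockets: Dict[FourTuple, str] = {}

    def register(self, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int, socket_name: str) -> None:
        self._sockets[(src_ip, src_port, dst_ip, dst_port)] = socket_name

    def deliver(self, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int) -> Optional[str]:
        # All four values are used; a mismatch in any of them selects
        # a different socket (or none at all).
        return self._sockets.get((src_ip, src_port, dst_ip, dst_port))


demux = Demultiplexer()
demux.register("A", 9157, "C", 80, "socket-1")
demux.register("B", 9157, "C", 80, "socket-2")
demux.register("B", 5775, "C", 80, "socket-3")
print(demux.deliver("B", 9157, "C", 80))
```

Two clients may use the same source port, and one client may open several connections to port 80; the entries stay distinct because the full 4-tuple differs.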

Connection-oriented demux (cont.)

[Figure: clients with IP addresses A and B send segments to the server with IP address C on destination port 80. Segments shown: (S-IP A, D-IP C, SP 9157, DP 80), (S-IP B, D-IP C, SP 9157, DP 80) and (S-IP B, D-IP C, SP 5775, DP 80). Processes P1 to P6 are drawn on the hosts.]

Connection-oriented demux: Threaded Web Server

[Figure: the same three segments arriving at server C, with processes P1 to P4 drawn on the hosts.]

TCP segment structure

[Figure: TCP segment layout, 32 bits wide, top to bottom]
source port #, dest port #
sequence number
acknowledgement number
head len, not used, flags U A P R S F, Receive window
checksum, Urg data pointer
Options (variable length)
application data (variable length)

Field notes:
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection establishment (setup, teardown commands)
Receive window: # bytes the receiver is willing to accept
sequence and acknowledgement numbers: counting by bytes of data (not segments!)
checksum: Internet checksum (as in UDP)
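The fixed 20-byte part of this layout can be decoded with Python's standard `struct` module. The sketch below is a minimal illustration: the `TcpHeader` type and `parse_tcp_header` function are hypothetical names, options are not parsed, and only the six flag bits shown in the figure are extracted.

```python
import struct
from typing import Dict, NamedTuple


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int          # header length in bytes
    flags: Dict[str, bool]
    rcv_window: int
    checksum: int
    urg_ptr: int


# Bit masks of the six flags within the 16-bit "head len / flags" word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(data: bytes) -> TcpHeader:
    """Decode the first 20 bytes of a TCP segment (network byte order)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urg) = struct.unpack("!HHIIHHHH", data[:20])
    header_len = (off_flags >> 12) * 4     # head len is counted in 32-bit words
    flags = {name: bool(off_flags & bit) for name, bit in FLAG_BITS.items()}
    return TcpHeader(src, dst, seq, ack, header_len, flags, window, checksum, urg)


# Hand-built SYN segment header: ports 9157 -> 80, seq 1000, 5-word header.
raw = struct.pack("!HHIIHHHH", 9157, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
print(parse_tcp_header(raw))
```

The format string mirrors the rows of the figure: two 16-bit ports, two 32-bit numbers, then four 16-bit fields.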

TCP reliable data transfer

TCP creates a reliable service on top of IP's unreliable service.
Pipelined segments
Cumulative acks
TCP uses a single retransmission timer.
The sequence number for a segment is the byte-stream number of the first byte in the segment.

TCP sender events

data rcvd from app:
Create a segment with a seq #.
Start the timer if it is not already running (think of the timer as for the oldest unacked segment).
Expiration interval: TimeOutInterval

timeout:
Retransmit the segment that caused the timeout.
Restart the timer.

Ack rcvd:
If it acknowledges previously unacked segments:
update what is known to be acked
start the timer if there are outstanding segments

TCP seq. #'s and ACKs

Seq. #'s:
byte stream "number" of the first byte in the segment's data
ACKs:
seq # of the next byte expected from the other side
cumulative ACK
Q: how does the receiver handle out-of-order segments?
A: the TCP spec doesn't say; it is up to the implementor.

[Figure: Host A and Host B timelines: the user types 'C', the host ACKs receipt of 'C', the host receives the ACK]

TCP retransmission scenarios

[Figure: two Host A and Host B timelines, "lost ACK scenario" (a loss marked X followed by a timeout) and "premature timeout" (a Seq=92 timeout); SendBase labels of 100 and 120 appear along the timelines]

TCP retransmission scenarios (more)

[Figure: Host A and Host B timelines, "cumulative ACK scenario": a loss marked X, a timeout, and SendBase = 120]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), for a total of 160 bytes for sending and receiving 'C'.
If the receiver waits a while, it can piggyback the ACK on the data packet.
Delayed ack: wait up to 200 ms for the next segment. If there is no next segment, send the ACK.
Should the sender use delayed acks too? [Stevens figure 19.3]

[Figure: simple telnet scenario between Host A and Host B: the user types 'C'; the host ACKs receipt of 'C' and echoes back 'C'; the host ACKs receipt of the echoed 'C']

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggybacking, 2/120 -> only about 1.6% of the bits sent are data.
LANs are usually not congested, so it might be okay.
Small packets, termed tinygrams, over a congested WAN: bad news.

Nagle's algorithm:
New data can't be sent until outstanding data is acked.
Small amounts of data are collected and sent in a single segment when the ack arrives.
Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent.

[Stevens 19.4]
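The two rules of Nagle's algorithm stated above (hold new data while something is unacknowledged, then send everything collected in one segment when the ACK arrives) can be sketched as follows. This is a deliberately simplified model: the class name `NagleSender` is hypothetical, and the rule that a full-sized segment may be sent immediately is omitted.

```python
from typing import List


class NagleSender:
    """Toy sender that coalesces small writes while data is in flight."""

    def __init__(self) -> None:
        self.unacked = False         # is a segment currently outstanding?
        self.pending = b""           # small writes collected while waiting
        self.sent: List[bytes] = []  # segments handed to the network

    def write(self, data: bytes) -> None:
        if not self.unacked:
            self._send(data)         # nothing in flight: send right away
        else:
            self.pending += data     # otherwise collect until the ACK arrives

    def on_ack(self) -> None:
        self.unacked = False
        if self.pending:
            data, self.pending = self.pending, b""
            self._send(data)         # one segment for everything collected

    def _send(self, data: bytes) -> None:
        self.sent.append(data)
        self.unacked = True


sender = NagleSender()
for key in (b"h", b"e", b"l", b"l", b"o"):
    sender.write(key)                # keystrokes arrive faster than ACKs
sender.on_ack()
print(sender.sent)                   # first keystroke alone, the rest coalesced
```

The self-clocking property is visible here: the number of segments depends on how quickly ACKs return, not on how many times the application writes.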

TCP ACK generation [RFC 1122, RFC 2581]

| Event at Receiver | TCP Receiver action |
|---|---|
| Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 200 ms for next segment. If no next segment, send ACK |
| Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments |
| Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte |
| Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap |

Fast Retransmit

The time-out period is often relatively long:
long delay before resending a lost packet
Detect lost segments via duplicate ACKs:
The sender often sends many segments back-to-back.
If a segment is lost, there will likely be many duplicate ACKs.
If the sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost:
fast retransmit: resend the segment before the timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
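For readers who want to run the ACK-handling logic, here is a Python rendering of the same steps. It is a sketch under assumptions: `FastRetransmitSender` is a hypothetical name, the `outstanding` argument stands in for "there are currently not-yet-acknowledged segments", retransmissions are only recorded in a list, and sequence number wraparound is ignored.

```python
from typing import List


class FastRetransmitSender:
    def __init__(self, send_base: int) -> None:
        self.send_base = send_base           # SendBase: oldest unacked byte
        self.dup_ack_count = 0               # dup ACKs seen for send_base
        self.timer_running = False
        self.retransmitted: List[int] = []   # seq numbers resent early

    def on_ack(self, y: int, outstanding: bool) -> None:
        """Handle an ACK whose acknowledgement field is y."""
        if y > self.send_base:
            # New data acknowledged: advance the window, reset the counter.
            self.send_base = y
            self.dup_ack_count = 0
            self.timer_running = outstanding   # restart only if data is in flight
        else:
            # Duplicate ACK for already ACKed data.
            self.dup_ack_count += 1
            if self.dup_ack_count == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y


sender = FastRetransmitSender(send_base=100)
sender.on_ack(120, outstanding=True)
for _ in range(3):
    sender.on_ack(120, outstanding=True)
print(sender.retransmitted)   # the segment starting at byte 120 was resent
```

The segment is resent exactly once, on the third duplicate; further duplicates for the same value do not trigger another retransmission in this sketch.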

TCP Round Trip Time and Timeout

Q: how to set the TCP timeout value?
longer than RTT, but RTT varies
too short: premature timeout, unnecessary retransmissions
too long: slow reaction to segment loss

Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary; we want the estimated RTT "smoother"
average several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$

Exponential weighted moving average
influence of a past sample decreases exponentially fast
typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]

[Figure: Example RTT estimation, RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106), showing SampleRTT and EstimatedRTT]

TCP Round Trip Time and Timeout

Setting the timeout
EstimatedRTT plus "safety margin"
large variation in EstimatedRTT -> larger safety margin
First, estimate how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set the timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$

Discuss: [Stevens 21.2, 21.4, 21.6, 21.7]
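The three formulas (EstimatedRTT, DevRTT, TimeoutInterval) combine into a small estimator. The Python sketch below is illustrative: the class name `RttEstimator` is hypothetical, the initial values (EstimatedRTT set to the first sample, DevRTT to half of it) and the choice to update DevRTT before EstimatedRTT follow RFC 6298 and are assumptions, since the slides do not specify them.

```python
class RttEstimator:
    """Exponentially weighted RTT and deviation estimates."""

    def __init__(self, first_sample: float,
                 alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        # Assumed initialisation (RFC 6298 style), not given on the slides.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt: float) -> None:
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self) -> float:
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)      # milliseconds
for sample in (120.0, 95.0, 180.0, 110.0):
    estimator.update(sample)
    print(round(estimator.estimated_rtt, 2), round(estimator.timeout_interval(), 2))
```

A single outlier such as the 180 ms sample moves EstimatedRTT only slightly but widens the safety margin through DevRTT, which is the behaviour the slides motivate.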


Page 21: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

21

TCP sender eventsdata rcvd from app

Create segment with seq

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer

Ack rcvd

If acknowledges previously unacked segments update what is known to

be acked

start timer if there are outstanding segments

Transport Layer 41

TCP seq ‟s and ACKsSeq ‟s

byte stream ldquonumberrdquo of first byte in segment‟s data

ACKs

seq of next byte expected from other side

cumulative ACK

Q how receiver handles out-of-order segments

A TCP spec doesn‟t say - up to implementor

Host A Host B

Usertypes

bdquoC‟

host receivesACK

host ACKsreceipt of

bdquoC‟

time

Transport Layer 42

22

TCP retransmission scenarios

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t

Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

timeS

eq=

92

tim

eou

t

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase= 120

Transport Layer 44

23

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 200ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Usertypes

bdquoC‟

host ACKsreceipt

of echoedbdquoC‟

host ACKsreceipt of

bdquoC‟

timesimple telnet scenario

Transport Layer 45

Host echoesback bdquoC‟

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data can‟t be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ackarrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Nagle‟s alg

24

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 200ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before

resending lost packet

Detect lost segments via duplicate ACKs Sender often sends

many segments back-to-back

If segment is lost there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend

segment before timer expires

Transport Layer 48

25

Host A

tim

eou

t

Host B

time

X

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT but RTT varies

too short premature timeout

unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo

average several recent measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

27

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT +

|SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

Discuss [Stevens

212 214 216

217]

28

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving app‟s drain rate

app process may be slow at reading from buffer

sender won‟t overflowreceiver‟s buffer by

transmitting too muchtoo fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKeddata to RcvWindow guarantees receive

buffer doesn‟t overflow

Discuss [Stevens

201 204 205

206]

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables

seq s

buffers flow control info (eg RcvWindow)

As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice

Possible solutions A transport address can be used only

once (impossible for client-server model)

Each connection is given an identifier by

the originator (requires indefinite history to

be saved by the transport layer)

TCP‟s Solution

Three way handshake

Step 1 client host sends TCP SYN segment to server

specifies initial seq

no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers

specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 58

30

TCP Connection Management (cont)

Connection Establishment

Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN

Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)

client server

Transport Layer 59

TCP Connection Management (cont)

Closing a connection

client closes socket: clientSocket.close()

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

Step 3: client receives FIN, replies with ACK

Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed

Note: with small modification, can handle simultaneous FINs

[Figure: client / server timeline of connection close; both sides pass through closing, the client goes through timed wait, then both are closed]

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle] [Tanenbaum 633]

Principles of Congestion Control

Congestion, informally: "too many sources sending too much data too fast for the network to handle"

different from flow control

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size vs. time, y-axis ticks at 8, 16 and 24 Kbytes; saw-tooth behavior: probing for bandwidth]
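
The saw-tooth can be reproduced with a toy model. The loss model here is an assumption made for illustration only: a loss is declared whenever the window reaches a fixed cap, and the window is counted in whole MSS units.

```python
def aimd_trace(start_mss: int, loss_at_mss: int, rtts: int) -> list[int]:
    """CongWin (in MSS) at the start of each RTT under AIMD."""
    cong_win = start_mss
    trace = []
    for _ in range(rtts):
        trace.append(cong_win)
        if cong_win >= loss_at_mss:
            # multiplicative decrease: cut the window in half after loss
            cong_win = max(1, cong_win // 2)
        else:
            # additive increase: one MSS per RTT
            cong_win += 1
    return trace
```

With `aimd_trace(4, 8, 10)` the window climbs 4, 5, 6, 7, 8, drops back to 4, and repeats.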

TCP Congestion Control: details

sender limits transmission:

$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly:

$\text{rate} = \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate ACKs

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event
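
The 20 kbps figure follows from rate = CongWin / RTT with CongWin = 1 MSS. A quick check, using a hypothetical helper name:

```python
def rate_bps(cong_win_bytes: int, rtt_ms: int) -> float:
    """Sending rate in bits per second for one window per RTT."""
    return cong_win_bytes * 8 * 1000 / rtt_ms


initial_rate = rate_bps(500, 200)  # 500-byte MSS, 200 msec RTT
```

Here `initial_rate` is 20000 bits per second, i.e. 20 kbps.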

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline with one RTT marked on the time axis]
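
Why one increment per ACK doubles the window each RTT can be seen in a short sketch. It assumes one ACK per segment (no delayed ACKs) and no loss, and counts the window in MSS units; the function name is illustrative.

```python
def slow_start_rounds(rounds: int, mss: int = 1) -> list[int]:
    """CongWin at the start of each RTT during slow start."""
    cong_win = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cong_win)
        acks = cong_win // mss  # one ACK comes back per segment sent this RTT
        for _ in range(acks):
            cong_win += mss     # CongWin grows by 1 MSS for every ACK
    return sizes
```

`slow_start_rounds(5)` gives windows of 1, 2, 4, 8 and 16 MSS.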

Refinement: inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is SS) to a threshold (ssthresh), then grows linearly

Philosophy:

3 dup ACKs indicate the network is capable of delivering some segments

timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout

Implementation:

Variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP sender congestion control

State | Event | TCP Sender Action | Commentary

Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT

SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start

SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
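
The sender actions in the table can be written as a small state machine. This is a sketch under stated assumptions, not a real TCP implementation: the MSS value, the initial Threshold, and the class and method names are all made up for illustration.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; assumed value for illustration


@dataclass
class CongestionState:
    cong_win: float = MSS
    threshold: float = 64 * 1024  # assumed initial Threshold
    state: str = "SS"             # "SS" (slow start) or "CA" (congestion avoidance)
    dup_acks: int = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)

    def on_dup_ack(self) -> None:
        """Count duplicate ACKs; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

Starting from `CongestionState(cong_win=MSS, threshold=4 * MSS)`, four new ACKs move the sender into congestion avoidance at 5 MSS; three duplicate ACKs then halve the window, and a timeout resets it to 1 MSS in slow start.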

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?

Ignore slow start

Let W be the window size when loss occurs

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT
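
The 0.75 factor can be recovered with a short derivation, assuming the window grows linearly from W/2 back to W between loss events (the saw-tooth) and slow start is ignored:

```latex
\overline{\text{window}} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\overline{\text{throughput}} = \frac{\overline{\text{window}}}{RTT} = 0.75\,\frac{W}{RTT}
```

The average of a linear ramp is the midpoint of its endpoints, which is why only W/2 and W enter the result.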

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

$L = 2 \cdot 10^{-10}$ Wow

New versions of TCP for high-speed
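
Both numbers on this slide can be checked directly. The sketch below assumes the throughput formula 1.22 * MSS / (RTT * sqrt(L)) with MSS expressed in bits; the function names are illustrative.

```python
def required_window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


window = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
```

For 1500-byte segments, 100 ms RTT and 10 Gbps, `window` is about 83,333 segments and `loss` is about 2.1e-10, in line with the rounded values quoted above.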

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: throughput plane with Connection 1 throughput on one axis, both axes running to R, and the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
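
The convergence argument can be simulated. The model below is a deliberate simplification and an assumption of this sketch: both sessions see a loss in the same round whenever their combined rate exceeds the link capacity, and rates are unitless numbers.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows halve, which also halves the gap between them
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: both grow by the same amount, gap unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting far apart, for example `aimd_two_flows(1.0, 80.0, 100.0, 500)`, the two rates end up nearly equal, because every loss halves their difference while additive increase leaves it unchanged.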

Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP: do not want rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets R/2
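
The parallel-connection example is a matter of counting connections, assuming the bottleneck is shared equally per connection. The helper name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of the link rate R obtained by an app opening new connections."""
    return Fraction(new_connections, existing_connections + new_connections)
```

`app_share(9, 1)` is 1/10, while `app_share(9, 11)` is 11/20, slightly more than the R/2 quoted on the slide.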

Chapter 3 Summary

principles behind transport layer services

multiplexing demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet

UDP

TCP

After that

leaving the network "edge" (application, transport layers)

into the network "core"

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets:

each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client

non-persistent HTTP will have a different socket for each request
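
Demultiplexing on the 4-tuple amounts to a table lookup keyed by all four values. The sketch below is illustrative only: the socket labels are made up, and the hosts A, B, C and ports 9157, 5775, 80 are taken from the figure on the next slide.

```python
from typing import NamedTuple, Optional


class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def demux(sockets: dict, segment: FourTuple) -> Optional[str]:
    """Return the socket registered for this exact 4-tuple, or None."""
    return sockets.get(segment)


# Three connections to server C on port 80, each with its own socket.
sockets = {
    FourTuple("A", 9157, "C", 80): "socket-1",
    FourTuple("B", 9157, "C", 80): "socket-2",
    FourTuple("B", 5775, "C", 80): "socket-3",
}
```

Two clients may use the same source port (9157 here) and still reach different sockets, because the source IP address is part of the key.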

Connection-oriented demux (cont)

[Figure: clients A and B and server C, with processes P1 to P6. Segments shown: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80) and (S-IP B, SP 5775, D-IP C, DP 80)]

Connection-oriented demux: Threaded Web Server

[Figure: clients A and B and server C, with processes P1 to P4. Segments shown: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80) and (S-IP B, SP 5775, D-IP C, DP 80)]


TCP retransmission scenarios

[Figure: two Host A / Host B timelines, "lost ACK scenario" (loss marked X) and "premature timeout", each with a Seq=92 segment and its timeout interval; SendBase labels of 100 and 120 mark the sender's progress]

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline, "Cumulative ACK scenario", with a loss (X), the timeout interval, and SendBase = 120]

Interactive data flow

Overhead for each packet: 40 bytes (20 TCP header + 20 IP header), to a total of 160 bytes for sending and receiving 'C'

If the receiver waits a while, it can piggyback the ACK on the data packet

Delayed ack: wait up to 200ms for next segment. If no next segment, send ACK

Should sender use delayed acks too? [Stevens figure 193]

[Figure: simple telnet scenario, Host A / Host B timeline: user types 'C'; host ACKs receipt of 'C' and echoes back 'C'; host ACKs receipt of echoed 'C']

Nagle Algorithm [RFC 896]

Quantifying overhead: how many control bytes per data byte? With piggyback, 2/120 -> only 1.6% of the bits sent are data

LANs usually not congested, so it might be okay

Small packets, termed tinygrams, over congested WAN: bad news

Nagle's alg:

New data can't be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when the ack arrives

Self clocking: the faster the ack comes back, the faster data is sent. Slow links cause fewer segments to be sent

[Stevens 194]
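
The rule "collect small writes until the outstanding data is acked" can be sketched as follows. This is a simplified model, not the full RFC 896 behavior: a single ACK is assumed to acknowledge everything outstanding, and the class and attribute names are illustrative.

```python
class NagleSender:
    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""      # data written by the app but not yet sent
        self.unacked = False   # True while sent data awaits an ACK
        self.sent = []         # segments handed to the network, in order

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        self.unacked = False
        self._try_send()

    def _try_send(self) -> None:
        # Send if a full-sized segment is ready, or if nothing is outstanding.
        while self.buffer and (len(self.buffer) >= self.mss or not self.unacked):
            segment, self.buffer = self.buffer[:self.mss], self.buffer[self.mss:]
            self.sent.append(segment)
            self.unacked = True
```

Typing `a`, `b`, `c` in quick succession sends `a` immediately; `b` and `c` wait in the buffer and leave together as one segment when the ACK for `a` arrives.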

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver | TCP Receiver action

Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 200ms for next segment. If no next segment, send ACK

Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments

Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte

Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long:

long delay before resending lost packet

Detect lost segments via duplicate ACKs

Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data, it supposes that the segment after the ACKed data was lost:

fast retransmit: resend segment before timer expires

[Figure 337: Resending a segment after triple duplicate ACK. Host A / Host B timeline with a lost segment (X) and the timeout interval]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        // a duplicate ACK for already ACKed segment
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            // fast retransmit
            resend segment with sequence number y
        }
    }
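
A runnable version of the ACK handler might look like the sketch below. The class is hypothetical: timer handling is left as a comment, and "resending" is modelled by recording the sequence number in a list.

```python
class FastRetransmitSender:
    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []  # sequence numbers resent via fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged: advance SendBase, reset the duplicate count.
            self.send_base = y
            self.dup_acks = 0
            # A real sender would restart the timer here if segments remain unacked.
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)  # resend segment starting at y
```

After `on_ack(120)` followed by three more ACKs for 120, the segment with sequence number 120 is resent exactly once; a fourth duplicate does not trigger another resend.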

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT

but RTT varies

too short: premature timeout

unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary, want estimated RTT "smoother"

average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

[Retransmission example in Stevens 211]

[Figure: Example RTT estimation, RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT:

$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$

Discuss [Stevens 212 214 216 217]
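
The three formulas combine into a small estimator. Two choices below are assumptions of this sketch rather than something the slides specify: DevRTT starts at zero, and the deviation is measured against the EstimatedRTT from before the current sample is folded in.

```python
class RttEstimator:
    def __init__(self, first_sample_ms: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample_ms
        self.dev_rtt = 0.0  # assumed initial deviation

    def update(self, sample_ms: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        deviation = abs(sample_ms - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_ms
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting at 100 ms and then observing a 200 ms sample gives EstimatedRTT = 112.5 ms, DevRTT = 25 ms and TimeoutInterval = 212.5 ms, which shows how a single outlier widens the safety margin far more than it moves the average.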

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service: matching the send rate to the receiving app's drain rate

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast


Page 23: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

23

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving bdquoC‟

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 200ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Usertypes

bdquoC‟

host ACKsreceipt

of echoedbdquoC‟

host ACKsreceipt of

bdquoC‟

timesimple telnet scenario

Transport Layer 45

Host echoesback bdquoC‟

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data can‟t be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ackarrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Nagle‟s alg

24

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 200ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before

resending lost packet

Detect lost segments via duplicate ACKs Sender often sends

many segments back-to-back

If segment is lost there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data it supposes that segment after ACKeddata was lost fast retransmit resend

segment before timer expires

Transport Layer 48

25

Host A

tim

eou

t

Host B

time

X

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

26

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT but RTT varies

too short premature timeout

unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo

average several recent measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

27

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT +

|SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

Discuss [Stevens

212 214 216

217]

28

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving app‟s drain rate

app process may be slow at reading from buffer

sender won‟t overflowreceiver‟s buffer by

transmitting too muchtoo fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKeddata to RcvWindow guarantees receive

buffer doesn‟t overflow

Discuss [Stevens

201 204 205

206]

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables

seq s

buffers flow control info (eg RcvWindow)

As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice

Possible solutions A transport address can be used only

once (impossible for client-server model)

Each connection is given an identifier by

the originator (requires indefinite history to

be saved by the transport layer)

TCP‟s Solution

Three way handshake

Step 1 client host sends TCP SYN segment to server

specifies initial seq

no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers

specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 58

30

TCP Connection Management (cont)

Connection Establishment

Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN

Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)

client server

Transport Layer 59

TCP Connection Management (cont)

Closing a connection

client closes socketclientSocketclose()

Step 1 client end system sends TCP FIN control

segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client server

close

close

closed

tim

ed w

ait

Transport Layer 60

31

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo -will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client server

closing

closing

closedti

med w

ait

closed

Transport Layer 61

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 62

[Tanenbaum 633]

32

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

different from flow control

manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Transport Layer 63

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease cut CongWin in half after loss

timecongestio

n w

indow

siz

e

Saw toothbehavior probing

for bandwidth

Transport Layer 64

33

TCP Congestion Control details

sender limits transmissionLastByteSent-LastByteAcked

min(CongWinRcvWindow)

Let‟s assume that RcvWindowis not a constraint (for simplicity)

Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD

slow start

conservative after timeout events

rate =CongWin

RTTBytessec

Transport Layer 65

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be gtgt MSSRTT desirable to quickly ramp

up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 66

34

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every

RTT

done by incrementing CongWin for every ACK received

Summary initial rate is slow but ramps up exponentially fast

Host A

RT

T

Host B

time

Transport Layer 67

Refinement inferring loss

After 3 dup ACKs

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially (This is SS)

to a threshold (ssthresh) then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 68

35

Refinement

Q When should the exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

Implementation Variable Threshold (ssthresh)

At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 69

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

State: SS or CA
Event: Timeout
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
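The table can be read as a small state machine. The sketch below is one possible rendering, not a real TCP implementation: the class name, the MSS value and the initial Threshold are assumptions made for illustration, and a single duplicate-ACK counter stands in for the per-segment count.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; illustrative value, not taken from the slides


@dataclass
class Sender:
    cong_win: float = MSS
    threshold: float = 64 * 1024  # assumed initial Threshold
    state: str = "SS"             # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # additive increase: about 1 MSS per RTT in total
            self.cong_win += MSS * (MSS / self.cong_win)

    def on_dup_ack(self):
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

For example, a sender in CA with CongWin = 8 MSS that sees three duplicate ACKs ends up with Threshold = CongWin = 4 MSS and stays in CA; a later timeout sends it back to SS with CongWin = 1 MSS.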

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start

Let W be the window size when loss occurs

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT
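The 0.75 factor can be checked with a short derivation. It assumes the idealized sawtooth: between two losses the window grows linearly from W/2 to W, and RTT stays constant.

```latex
\[
\text{average window} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} = \frac{3}{4}\cdot\frac{W}{\text{RTT}} = 0.75\,\frac{W}{\text{RTT}}
\]
```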

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate: $\frac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$

so $L = 2 \cdot 10^{-10}$. Wow

New versions of TCP for high-speed
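Both numbers in the example can be recomputed from the formulas on this slide. The helper names below are hypothetical; the sketch takes the throughput formula 1.22 MSS / (RTT sqrt(L)) as given and solves it for L.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps (W = rate * RTT / MSS)."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


print(round(window_segments(10e9, 0.1, 1500)))   # about 83,333 segments
print(required_loss_rate(10e9, 0.1, 1500))       # about 2e-10
```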

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: throughput of one connection plotted against Connection 1 throughput, both axes running to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
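The convergence argument can be tried numerically. The sketch below is a simplified model, not TCP itself: it assumes both sessions have the same RTT, add one unit per round, and both halve their rate in any round where the combined rate exceeds the link capacity. The function name and units are hypothetical.

```python
def two_flows(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss seen by both: multiplicative decrease halves the gap too
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: the gap between the flows is unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = two_flows(80.0, 10.0, capacity=100.0, rounds=500)
print(abs(a - b))
```

Additive increase keeps the difference between the two rates constant, while every multiplicative decrease halves it, so the rates drift toward the equal-share line.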

Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP: do not want rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
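The two shares in the example follow from assuming the link is split equally per connection. A short sketch of that arithmetic, with a hypothetical helper name:

```python
from fractions import Fraction


def app_share(app_connections, other_connections):
    """Fraction of link rate R an app gets if every connection gets an equal share."""
    return Fraction(app_connections, app_connections + other_connections)


print(app_share(1, 9))    # 1/10 of R
print(app_share(11, 9))   # 11/20 of R, slightly more than R/2
```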

Chapter 3 Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet:

UDP

TCP

Next: Socket programming over TCP

After that: leaving the network "edge" (application, transport layers), into the network "core"

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets: each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client: non-persistent HTTP will have different socket for each request
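A minimal sketch of 4-tuple demultiplexing as a dictionary lookup. The class and socket names are hypothetical; the addresses and ports (hosts A, B, C, ports 9157, 5775, 80) are the ones used in the figures of this section.

```python
from typing import Dict, Optional, Tuple

# (source IP, source port, dest IP, dest port)
FourTuple = Tuple[str, int, str, int]


class Demux:
    """Maps each connection's 4-tuple to the socket that owns it."""

    def __init__(self) -> None:
        self.sockets: Dict[FourTuple, str] = {}

    def add(self, key: FourTuple, owner: str) -> None:
        self.sockets[key] = owner

    def deliver(self, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int) -> Optional[str]:
        # All four values are needed: two clients may use the same source port.
        return self.sockets.get((src_ip, src_port, dst_ip, dst_port))


table = Demux()
table.add(("A", 9157, "C", 80), "socket-1")
table.add(("B", 9157, "C", 80), "socket-2")
table.add(("B", 5775, "C", 80), "socket-3")
print(table.deliver("B", 9157, "C", 80))
```

Segments from A and B that share source port 9157 and destination port 80 still reach different sockets, because the source IP address is part of the key.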

Connection-oriented demux (cont)

[Figure: client A (process P1), client B (processes P1, P2, P4) and server C (processes P5, P6, P3). Segments shown: S-IP A, D-IP C, SP 9157, DP 80; S-IP B, D-IP C, SP 9157, DP 80; S-IP B, D-IP C, SP 5775, DP 80]

Connection-oriented demux: Threaded Web Server

[Figure: client A (process P1), client B (processes P1, P2) and server C (processes P4, P3). Segments shown: S-IP A, D-IP C, SP 9157, DP 80; S-IP B, D-IP C, SP 9157, DP 80; S-IP B, D-IP C, SP 5775, DP 80]


TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: Delayed ACK. Wait up to 200ms for next segment. If no next segment, send ACK

Event at receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: Arrival of out-of-order segment with higher-than-expected seq #. Gap detected
TCP receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte

Event at receiver: Arrival of segment that partially or completely fills gap
TCP receiver action: Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long: long delay before resending lost packet

Detect lost segments via duplicate ACKs

Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs

If sender receives 3 dup ACKs for the same data, it supposes that segment after ACKed data was lost

fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK (Host A / Host B timeline showing the timeout interval; the lost segment is marked X)]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
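The event handler above can be sketched as runnable code. This is an illustration under simplifying assumptions: the class name and return strings are hypothetical, a single duplicate-ACK counter is kept (reset whenever new data is ACKed), and the actual retransmission is only recorded in a list.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; records fast retransmissions."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []  # sequence numbers resent before any timeout

    def on_ack(self, y, unacked_remaining=True):
        if y > self.send_base:
            # New data acknowledged: advance SendBase, restart the timer if needed.
            self.send_base = y
            self.dup_acks = 0
            return "start timer" if unacked_remaining else "stop timer"
        # Duplicate ACK for an already ACKed segment.
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.retransmitted.append(y)
            return "fast retransmit"
        return "wait"


sender = FastRetransmitSender(send_base=100)
print([sender.on_ack(200) for _ in range(4)])
```

The first ACK for 200 advances SendBase; the next three are duplicates, and the third one triggers the retransmission of the segment starting at 200.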

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT, but RTT varies

too short: premature timeout, unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary, want estimated RTT "smoother"

average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value: $\alpha = 0.125$

[Retransmission example in Stevens 21.1]
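Unrolling the recurrence shows where the exponential decay comes from. The indices are introduced here only for illustration: $S_n$ is the n-th SampleRTT and $E_n$ the EstimatedRTT after it, with $E_0$ the starting estimate.

```latex
\[
E_n = (1-\alpha)\,E_{n-1} + \alpha\,S_n
    = \alpha \sum_{k=0}^{n-1} (1-\alpha)^{k}\, S_{n-k} \;+\; (1-\alpha)^{n} E_0
\]
```

A sample that is k measurements old carries weight $\alpha(1-\alpha)^k$; with $\alpha = 0.125$ each step back multiplies the weight by 0.875.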

Example RTT estimation:

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds); curves: SampleRTT and Estimated RTT]

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$

Discuss [Stevens 21.2, 21.4, 21.6, 21.7]
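The two moving averages and the timeout rule fit in a few lines. This is a sketch, not a reference implementation: the class name is hypothetical, and two choices are assumptions not stated on the slides, namely initializing DevRTT to half the first sample and updating DevRTT before EstimatedRTT (both borrowed from RFC 6298).

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout (times in ms)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumed initial value

    def update(self, sample_rtt):
        # Deviation uses the estimate from before this sample (assumed order).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)
print(est.update(120.0))
```

Samples taken from retransmitted segments should not be fed to `update`, in line with the "ignore retransmissions" rule.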

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service: matching the send rate to the receiving app's drain rate

app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow: guarantees receive buffer doesn't overflow

Discuss [Stevens 20.1, 20.4, 20.5, 20.6]
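A minimal sketch of the bookkeeping on both sides. The function names and the byte counts in the example are hypothetical; the formulas are the ones on this slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, window):
    """True if sending nbytes more keeps unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                               # 2096 bytes of spare room
print(can_send(1000, 5000, 4000, window))   # 2000 bytes in flight would fit
```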

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination


TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments

initialize TCP variables:

seq. #s

buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, due to the delayed duplicates problem, a simple 2-way handshake does not suffice

Possible solutions:

A transport address can be used only once (impossible for client-server model)

Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's Solution: Three way handshake

Step 1: client host sends TCP SYN segment to server

specifies initial seq #

no data

Step 2: server host receives SYN, replies with SYNACK segment

server allocates buffers

specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont)

Connection Establishment

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (server's ISN+1)

[Figure: client / server message timeline]
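The three steps can be written out as the sequence and acknowledgment numbers each segment carries. The function name and dictionary layout are hypothetical; sequence numbers are taken modulo 2^32, since TCP sequence numbers are 32 bits wide.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as simple dictionaries."""
    wrap = 2 ** 32  # TCP sequence numbers are 32-bit
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {"flags": "SYN+ACK", "seq": server_isn,
               "ack": (client_isn + 1) % wrap}
    ack = {"flags": "ACK", "seq": (client_isn + 1) % wrap,
           "ack": (server_isn + 1) % wrap}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Because each side must echo the other's ISN plus one, an old duplicate SYN that arrives late cannot complete a handshake on its own.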

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close()

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

[Figure: client / server timeline: close, close, timed wait, closed]

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK

Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed

Note: with small modification, can handle simultaneous FINs

[Figure: client / server timeline: closing, closing, timed wait, closed, closed]

TCP Connection Management (cont)

[Figure: TCP client lifecycle]

[Figure: TCP server lifecycle]

[Tanenbaum 6.3.3]

Principles of Congestion Control

Congestion:

informally: "too many sources sending too much data too fast for network to handle"

different from flow control!

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem!



Page 26: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

26

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT but RTT varies

too short premature timeout

unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo

average several recent measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average

influence of past sample decreases exponentially fast

typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

27

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT +

|SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

Discuss [Stevens

212 214 216

217]

28

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving app‟s drain rate

app process may be slow at reading from buffer

sender won‟t overflowreceiver‟s buffer by

transmitting too muchtoo fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKeddata to RcvWindow guarantees receive

buffer doesn‟t overflow

Discuss [Stevens

201 204 205

206]

Transport Layer 56

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination


TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
initialize TCP variables:
  seq. #s
  buffers, flow control info (e.g. RcvWindow)

As seen in the previous slide, because of the delayed duplicates problem a simple 2-way handshake does not suffice

Possible solutions:
  A transport address can be used only once (impossible for client-server model)
  Each connection is given an identifier by the originator (requires indefinite history to be saved by the transport layer)

TCP's solution: three-way handshake

Step 1: client host sends TCP SYN segment to server
  specifies initial seq. #
  no data

Step 2: server host receives SYN, replies with SYNACK segment
  server allocates buffers
  specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont)

Connection Establishment

Step 1: client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2: server receives SYN, replies with SYN+ACK (ACK = client's ISN+1) and its own ISN

Step 3: client receives SYN+ACK, replies with ACK (ACK = server's ISN+1)

[Figure: client/server timeline of the three segments]
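The sequence and acknowledgment numbers exchanged in the three steps can be traced with a small Python sketch. It only models the header fields named on the slide; the dictionary layout and the function name are hypothetical, and real stacks choose their ISNs differently from the plain random draw used here.

```python
import random

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the SYN, SYN+ACK and ACK segments as dictionaries."""
    if client_isn is None:
        client_isn = random.randrange(SEQ_SPACE)  # illustrative ISN choice
    if server_isn is None:
        server_isn = random.randrange(SEQ_SPACE)

    # Step 1: client sends SYN carrying its ISN.
    syn = {"flags": "SYN", "seq": client_isn}
    # Step 2: server acknowledges client's ISN + 1 and sends its own ISN.
    syn_ack = {
        "flags": "SYN+ACK",
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,
    }
    # Step 3: client acknowledges server's ISN + 1.
    ack = {
        "flags": "ACK",
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,
    }
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```

Because each side must echo the other's freshly chosen ISN plus one, an old duplicate SYN that surfaces later cannot complete the exchange by itself, which is the property the 2-way handshake lacks.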

TCP Connection Management (cont)

Closing a connection:

client closes socket: clientSocket.close()

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline: close on each side, then timed wait and closed on the client]

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK.
  Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with a small modification, can handle simultaneous FINs.

[Figure: client/server timeline: closing on each side, timed wait then closed on the client, closed on the server]
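The four closing steps can be walked through with a small transition table in Python. This is a simplified sketch: the state names follow the labels on the slide figures rather than the full TCP state machine, and the tables and the `step` and `run_close` helpers are hypothetical.

```python
# (current state, incoming event) -> (next state, segments to send)
CLIENT = {
    ("open", "app close"): ("closing", ["FIN"]),          # Step 1
    ("closing", "ACK"): ("closing", []),
    ("closing", "FIN"): ("timed wait", ["ACK"]),          # Step 3
    ("timed wait", "FIN"): ("timed wait", ["ACK"]),       # re-ACK a repeated FIN
    ("timed wait", "timer expires"): ("closed", []),
}
SERVER = {
    ("open", "FIN"): ("closing", ["ACK", "FIN"]),         # Step 2
    ("closing", "ACK"): ("closed", []),                   # Step 4
}


def step(table, state, event):
    """Look up the next state and the segments sent in response."""
    return table[(state, event)]


def run_close():
    """Play the four steps in order and return (log, client, server)."""
    log = []
    client, out = step(CLIENT, "open", "app close")
    log += [("client", seg) for seg in out]
    server, out = step(SERVER, "open", "FIN")
    log += [("server", seg) for seg in out]
    for seg in out:
        client, reply = step(CLIENT, client, seg)
        log += [("client", r) for r in reply]
    server, _ = step(SERVER, server, "ACK")
    client, _ = step(CLIENT, client, "timer expires")
    return log, client, server


if __name__ == "__main__":
    print(run_close())
```

The extra `("timed wait", "FIN")` entry shows why the client lingers: if its final ACK is lost and the server repeats the FIN, the client can still answer.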

TCP Connection Management (cont)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams]

[Tanenbaum 633]

Principles of Congestion Control

Congestion, informally: "too many sources sending too much data too fast for network to handle"

different from flow control!

manifestations:
  lost packets (buffer overflow at routers)
  long delays (queueing in router buffers)

a top-10 problem!

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase CongWin by 1 MSS every RTT until loss detected
  multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size vs. time, with ticks at 8, 16 and 24 Kbytes; sawtooth behavior: probing for bandwidth]
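The sawtooth can be reproduced with a few lines of Python. The sketch below counts the window in whole MSS units and takes the RTT rounds in which a loss is detected as an input; both simplifications, and the function name `aimd`, are assumptions made for illustration.

```python
def aimd(rounds, loss_at, start=8):
    """Return CongWin (in MSS units) at the start of each RTT.

    loss_at: set of RTT indices in which a loss is detected (hypothetical input).
    """
    cong_win = start
    history = []
    for rtt in range(rounds):
        history.append(cong_win)
        if rtt in loss_at:
            cong_win = max(1, cong_win // 2)  # multiplicative decrease
        else:
            cong_win += 1                     # additive increase: +1 MSS per RTT
    return history


if __name__ == "__main__":
    print(aimd(rounds=12, loss_at={3, 9}))
```

Plotting the returned list against the round index gives the slow climb and sudden drop described on the slide.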

TCP Congestion Control details

sender limits transmission:
  LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)

Let's assume that RcvWindow is not a constraint (for simplicity)

Roughly:
  rate = CongWin / RTT bytes/sec

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
  loss event = timeout or 3 duplicate ACKs
  TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  AIMD
  slow start
  conservative after timeout events

TCP Slow Start

When connection begins, CongWin = 1 MSS
  Example: MSS = 500 bytes & RTT = 200 msec
  initial rate = 20 kbps

available bandwidth may be >> MSS/RTT
  desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:
  double CongWin every RTT
  done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline of segments exchanged, one RTT per round]
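Both the 20 kbps figure and the per-ACK doubling can be verified with a short Python sketch. It assumes one ACK per segment (no delayed ACKs) and no loss, and counts the window in MSS units; the function names are hypothetical.

```python
def initial_rate_bps(mss_bytes, rtt_seconds):
    """Rate when CongWin = 1 MSS: one segment per round trip, in bits/sec."""
    return mss_bytes * 8 / rtt_seconds


def slow_start(rtts):
    """Return CongWin (in MSS units) after each RTT, starting from 1 MSS."""
    cong_win = 1
    history = [cong_win]
    for _ in range(rtts):
        acks = cong_win          # assumption: every segment sent is ACKed
        for _ in range(acks):
            cong_win += 1        # +1 MSS for every ACK received
        history.append(cong_win)
    return history


if __name__ == "__main__":
    print(initial_rate_bps(500, 0.2))
    print(slow_start(5))
```

Adding one MSS per ACK looks linear, but the number of ACKs per round equals the window, so the window doubles each RTT.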

Refinement: inferring loss

After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly
  part of Fast Recovery

But after timeout event:
  CongWin instead set to 1 MSS
  window then grows exponentially (this is SS) to a threshold (ssthresh), then grows linearly

Philosophy:
  3 dup ACKs indicates network capable of delivering some segments
  timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
  Variable Threshold (ssthresh)
  At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS * (MSS / CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
TCP Sender Action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: Timeout
TCP Sender Action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
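The sender actions in the table translate almost line for line into a small Python class. This is a sketch of the simplified model on the slide, not of any real TCP stack: the class name `TcpSender`, the initial Threshold, and resetting the duplicate-ACK counter on a new ACK are assumptions.

```python
class TcpSender:
    """Congestion-control state following the slide's event table."""

    def __init__(self, mss=1000, threshold=8000):
        self.mss = mss
        self.cong_win = float(mss)          # connection starts at 1 MSS
        self.threshold = float(threshold)   # assumption: initial value not on slide
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0                   # assumption: counter resets here
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # CongWin will not drop below 1 MSS.
            self.cong_win = max(float(self.mss), self.threshold)
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = float(self.mss)
        self.state = "SS"
        self.dup_acks = 0


if __name__ == "__main__":
    sender = TcpSender(mss=1000, threshold=4000)
    for _ in range(4):
        sender.on_new_ack()
    print(sender.state, sender.cong_win)
    sender.on_timeout()
    print(sender.state, sender.cong_win, sender.threshold)
```

Keeping each table row as one method makes it easy to compare the code with the slide and to see that only the timeout path returns the sender to slow start.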

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is W/RTT.

Just after loss, window drops to W/2, throughput to W/(2·RTT).

Average throughput: 0.75 W/RTT

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

throughput = 1.22 · MSS / (RTT · sqrt(L))

-> L = 2 · 10^-10   Wow!

New versions of TCP for high-speed
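The numbers in the throughput discussion can be reproduced with a short Python sketch. The function names are hypothetical; the formulas are the sawtooth average of 0.75 W/RTT (the midpoint between W/2 and W) and the loss-rate relation 1.22 · MSS / (RTT · sqrt(L)) solved for L.

```python
def average_throughput(w_bytes, rtt_s):
    """Window oscillates linearly between W/2 and W, so the mean is 0.75 W."""
    return 0.75 * w_bytes / rtt_s


def window_segments(target_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain target_bps."""
    return target_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(target_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = 8 * mss_bytes
    return (1.22 * mss_bits / (rtt_s * target_bps)) ** 2


if __name__ == "__main__":
    print(int(window_segments(10e9, 0.1, 1500)))
    print(required_loss_rate(10e9, 0.1, 1500))
```

For the slide's example the loss rate comes out at roughly 2 in 10 billion segments, which is why standard TCP struggles to fill such links.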

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: throughput of one connection plotted against Connection 1 throughput, both axes from 0 to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
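The convergence argument can be tried out numerically. The Python sketch below assumes both sessions have the same RTT and both detect loss in the same round whenever their combined rate exceeds the link capacity; these simplifications and the function name are assumptions for illustration.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one link; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow by 1, the gap stays the same.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    # Hypothetical start: very unequal shares of a link with capacity 100.
    print(two_flow_aimd(1, 80, capacity=100, rounds=500))
```

Additive increase never changes the difference between the two rates and every loss halves it, so the pair drifts toward the equal-share line.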

Fairness (more)

Fairness and UDP
  Multimedia apps often do not use TCP: do not want rate throttled by congestion control
  Instead use UDP: pump audio/video at constant rate, tolerate packet loss
  Research area: TCP friendly

Fairness and parallel TCP connections
  nothing prevents app from opening parallel connections between 2 hosts
  Web browsers do this
  Example: link of rate R supporting 9 connections
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
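The parallel-connection example is a one-line calculation, sketched here in Python under the assumption that the link is split evenly per connection. The function name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(app_connections, other_connections):
    """Fraction of link rate R an app gets with an equal per-connection split."""
    total = app_connections + other_connections
    return Fraction(app_connections, total)


if __name__ == "__main__":
    print(app_share(1, 9))    # one new connection next to 9 existing ones
    print(app_share(11, 9))   # eleven new connections next to 9 existing ones
```

With 11 connections the exact share is 11/20 of R, slightly more than the R/2 quoted on the slide, so fairness per connection does not give fairness per application.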

Chapter 3 Summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control

instantiation and implementation in the Internet:
  UDP
  TCP

After that:
  leaving the network "edge" (application, transport layers)
  into the network "core"

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:
  source IP address
  source port number
  dest IP address
  dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets:
  each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client
  non-persistent HTTP will have different socket for each request
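Demultiplexing on the 4-tuple amounts to a dictionary lookup, as the following Python sketch shows. The class name `Demultiplexer`, the socket labels, and the use of the letters A, B and C as stand-in IP addresses are hypothetical; the port numbers are the ones used in the slide figures.

```python
class Demultiplexer:
    """Maps a TCP 4-tuple to the socket that should receive the segment."""

    def __init__(self):
        self.sockets = {}

    def register(self, src_ip, src_port, dst_ip, dst_port, sock):
        self.sockets[(src_ip, src_port, dst_ip, dst_port)] = sock

    def deliver(self, src_ip, src_port, dst_ip, dst_port):
        # All four values are used; returns None if no socket matches.
        return self.sockets.get((src_ip, src_port, dst_ip, dst_port))


if __name__ == "__main__":
    demux = Demultiplexer()
    # Three connections to the same server address and port 80.
    demux.register("A", 9157, "C", 80, "socket-1")
    demux.register("B", 9157, "C", 80, "socket-2")
    demux.register("B", 5775, "C", 80, "socket-3")
    print(demux.deliver("B", 9157, "C", 80))
```

Hosts A and B can both use source port 9157 without clashing, because the source IP address is part of the key.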

Connection-oriented demux (cont)

[Figure: clients at IP A and IP B send segments to the server at IP C. Segments shown: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80), (S-IP B, SP 5775, D-IP C, DP 80). Labels P1 to P6 mark the client and server processes.]

Connection-oriented demux: Threaded Web Server

[Figure: clients at IP A and IP B send segments to the server at IP C. Segments shown: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80), (S-IP B, SP 5775, D-IP C, DP 80). Labels P1 to P4 mark the client and server processes.]

Page 29: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

29

Delayed Duplicates Problem

A user asks for a connection

Due to congestion the packet is caught in a traffic jam

The user asks again for the connection

Destination accepts 2nd connection request

User sends info to dest

Info gets caught in a traffic jam

User sends info again

Dest receives the info

Connection is closed by both parties

The original connection request and user info find their way to the destination

Transport Layer 57

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables

seq s

buffers flow control info (eg RcvWindow)

As seen in the previous slide due to the delayed duplicates problem a simple 2-way handshake is not suffice

Possible solutions A transport address can be used only

once (impossible for client-server model)

Each connection is given an identifier by

the originator (requires indefinite history to

be saved by the transport layer)

TCP‟s Solution

Three way handshake

Step 1 client host sends TCP SYN segment to server

specifies initial seq

no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers

specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 58

30

TCP Connection Management (cont)

Connection Establishment

Step 1 client sends SYN segment to server with its ISN (Initial Sequence Number)

Step 2 server receives SYN replies with SYN+ACK (client‟s ISN+1) and its own ISN

Step 3 client receives SYN+ACK replies with ACK (server‟s ISN+1)

client server

Transport Layer 59

TCP Connection Management (cont.)

Closing a connection:

client closes socket: clientSocket.close()

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

Step 3: client receives FIN, replies with ACK.

Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline of the close. Labels: close (client), close (server), closing, timed wait (client), closed (both sides)]
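The client side of the four closing steps can be sketched as a tiny state machine. The class name, method names and state labels below are hypothetical and chosen for readability; the sketch only covers the in-order case described above, not simultaneous FINs.

```python
class ClosingClient:
    """Client side of the FIN/ACK exchange (illustrative sketch, not a real TCP stack)."""

    def __init__(self):
        self.state = "ESTABLISHED"

    def close(self):
        # Step 1: the application closes the socket, so a FIN is sent.
        self.state = "FIN_SENT"
        return "FIN"

    def on_segment(self, flag):
        """Handle a segment from the server; return the reply to send, if any."""
        if flag == "ACK" and self.state == "FIN_SENT":
            # Step 2 (first half): the server acknowledged our FIN.
            self.state = "FIN_ACKED"
            return None
        if flag == "FIN" and self.state in ("FIN_ACKED", "TIMED_WAIT"):
            # Step 3: reply with ACK and enter (or stay in) timed wait,
            # so a retransmitted FIN is still answered.
            self.state = "TIMED_WAIT"
            return "ACK"
        return None

    def on_timer_expired(self):
        # Once the timed wait ends, the client side is closed.
        if self.state == "TIMED_WAIT":
            self.state = "CLOSED"


client = ClosingClient()
print(client.close(), client.on_segment("ACK"), client.on_segment("FIN"), client.state)
```

The timed wait exists because the client's final ACK may be lost; the server would then resend its FIN, and someone must still be there to acknowledge it.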

TCP Connection Management (cont.)

[Figure: TCP client lifecycle and TCP server lifecycle] [Tanenbaum 633]

Principles of Congestion Control

Congestion, informally: "too many sources sending too much data too fast for the network to handle"

different from flow control!

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem!

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase CongWin by 1 MSS every RTT until loss detected

multiplicative decrease: cut CongWin in half after loss

[Figure: congestion window size vs. time, with ticks at 8, 16 and 24 Kbytes. Saw-tooth behavior: probing for bandwidth]
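The saw-tooth can be reproduced with a few lines of Python. This is a minimal sketch in which the window is counted in MSS units and the RTTs at which a loss is detected are supplied by hand; the function name, the starting window and the loss schedule are assumptions made for the example.

```python
def aimd_trace(rounds, loss_rounds, mss=1, start=8):
    """Return CongWin (in MSS units) at the start of each RTT under AIMD."""
    cong_win = start
    trace = []
    for rtt in range(rounds):
        trace.append(cong_win)
        if rtt in loss_rounds:
            # Multiplicative decrease: cut the window in half, never below 1 MSS.
            cong_win = max(mss, cong_win / 2)
        else:
            # Additive increase: one more MSS per RTT.
            cong_win += mss
    return trace


trace = aimd_trace(rounds=12, loss_rounds={4, 9})
print(trace)
```

Plotting the returned list against the RTT index gives the saw-tooth shape: slow linear climbs interrupted by sudden halvings.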

TCP Congestion Control: details

sender limits transmission:

$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$

Let's assume that RcvWindow is not a constraint (for simplicity).

Roughly,

$\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}$ bytes/sec

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate ACKs

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events
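The sending limit and the rough rate estimate translate directly into code. The sketch below assumes the limit is "bytes in flight at most min(CongWin, RcvWindow)"; the function names and the sample numbers are illustrative only.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if sending nbytes more keeps the in-flight data within the window limit."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)


def approx_rate_bytes_per_sec(cong_win_bytes, rtt_sec):
    """Rough sending rate: one congestion window per round-trip time."""
    return cong_win_bytes / rtt_sec


# 500 bytes are unacknowledged; the smaller window (CongWin = 1000) is the limit.
print(can_send(1000, 500, cong_win=1000, rcv_window=4000, nbytes=500))
print(approx_rate_bytes_per_sec(cong_win_bytes=1000, rtt_sec=0.5))
```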

TCP Slow Start

When connection begins, CongWin = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be >> MSS/RTT

desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A and Host B segment exchange over time, with one RTT marked]
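The 20 kbps figure and the doubling per RTT can be checked with a short sketch. It assumes that every segment in the window is acknowledged separately and that no loss occurs; the function name is made up for the example.

```python
def slow_start_rates_kbps(mss_bytes, rtt_sec, rtts):
    """Sending rate in kbps for each of the first `rtts` round trips of slow start."""
    cong_win = mss_bytes  # CongWin starts at 1 MSS
    rates = []
    for _ in range(rtts):
        rates.append(cong_win * 8 / rtt_sec / 1000)
        acks = cong_win // mss_bytes      # one ACK per segment sent this RTT
        cong_win += acks * mss_bytes      # +1 MSS per ACK doubles the window
    return rates


# MSS = 500 bytes, RTT = 200 msec, as in the example above.
print(slow_start_rates_kbps(mss_bytes=500, rtt_sec=0.2, rtts=5))
```

The first entry is 500 bytes * 8 bits / 0.2 s = 20 kbps, and each later entry is twice the previous one.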

Refinement: inferring loss

After 3 dup ACKs:

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event:

CongWin instead set to 1 MSS

window then grows exponentially (this is slow start) to a threshold (ssthresh), then grows linearly

Philosophy:

3 dup ACKs indicate that the network is capable of delivering some segments; a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout.

Implementation:

Variable Threshold (ssthresh)

At loss event, Threshold is set to 1/2 of CongWin just before loss event

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control

| State | Event | TCP Sender Action | Commentary |
| --- | --- | --- | --- |
| Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |
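The sender actions listed above map naturally onto an event-driven class. The following is a minimal sketch, not a faithful TCP implementation: the class and method names are hypothetical, the initial Threshold is an arbitrary assumption (the slides do not give one), and windows are kept as floating-point byte counts.

```python
class TcpSender:
    """Congestion-control state for one sender, following the SS/CA rules above."""

    def __init__(self, mss=1.0, threshold=8.0):
        self.mss = mss
        self.cong_win = mss         # connection starts with CongWin = 1 MSS
        self.threshold = threshold  # initial value is an assumption of this sketch
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # About +1 MSS per RTT once a whole window has been acknowledged.
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_dup_ack(self):
        """Duplicate ACK: count it, and react to the third one as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.mss, self.threshold)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        """Timeout: halve the threshold and fall back to slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


sender = TcpSender(mss=1.0, threshold=4.0)
for _ in range(4):
    sender.on_new_ack()
print(sender.state, sender.cong_win)
```

Feeding the object a sequence of ACK, duplicate-ACK and timeout events and recording `cong_win` after each one gives a trace that can be compared against the summary rules.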

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is $W/\text{RTT}$

Just after loss, window drops to $W/2$, throughput to $W/(2\,\text{RTT})$

Average throughput: $0.75\,W/\text{RTT}$
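The 0.75 factor follows from the window climbing linearly from W/2 back to W between losses, so its average is (W/2 + W)/2 = 3W/4. A small numeric check, assuming W is an even number of segments and the window grows by one segment per RTT (the function name is illustrative):

```python
def average_window(w):
    """Mean window over one saw-tooth cycle that ramps from w/2 up to w."""
    windows = range(w // 2, w + 1)  # one sample per RTT: w/2, w/2 + 1, ..., w
    return sum(windows) / len(windows)


for w in (10, 100, 1000):
    print(w, average_window(w), average_window(w) / w)
```

Dividing the average window by RTT gives the average throughput of 0.75 W/RTT stated above.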

TCP Futures: TCP over "long, fat pipes"

Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate:

$\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

$L = 2 \cdot 10^{-10}$ Wow

New versions of TCP for high-speed
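Both numbers in this example can be recomputed in a few lines. The sketch assumes the usual loss-rate relation throughput = 1.22 * MSS / (RTT * sqrt(L)) and simply inverts it for L; the function names are made up for the illustration.

```python
def required_window_segments(throughput_bps, rtt_sec, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_sec / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_sec, mss_bytes):
    """Loss rate L obtained by inverting throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * throughput_bps)) ** 2


target_bps = 10e9   # 10 Gbps
rtt = 0.1           # 100 ms
mss = 1500          # bytes

print(required_window_segments(target_bps, rtt, mss))
print(required_loss_rate(target_bps, rtt, mss))
```

The second value is on the order of 2 * 10^-10, i.e. roughly one lost segment in five billion, which is why the slide reacts with "Wow".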

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: throughput plane with both axes running up to R (one axis labeled "Connection 1 throughput") and the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
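The convergence argument can be tried out numerically. In this sketch two flows share a link of capacity R; both add one unit per step, and both halve whenever their sum exceeds R. Synchronized losses, equal RTTs and the starting rates are simplifying assumptions of the example, and the function name is illustrative.

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Run synchronized AIMD for two flows and return their final rates."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss: both flows decrease multiplicatively, which halves their gap.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both increase additively, which leaves their gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


final_1, final_2 = two_flow_aimd(x1=1.0, x2=70.0, capacity=100.0, steps=500)
print(final_1, final_2)
```

Since additive increase keeps the difference between the two rates constant and every halving cuts it in two, the flows drift toward the equal-share line regardless of where they start.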

Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP: they do not want their rate throttled by congestion control

Instead use UDP: pump audio/video at constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

nothing prevents an app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets roughly R/2 (11 of the 20 connections)
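The arithmetic behind the parallel-connection example, assuming the link is split equally per connection (the function name is chosen for the illustration):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of the link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))    # one new connection among ten
print(app_share(9, 11))   # eleven new connections among twenty
```

Per-connection fairness therefore does not give per-application fairness: the app with 11 connections takes 11/20 of R, slightly more than half.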

Chapter 3: Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet:

UDP

TCP

After that:

leaving the network "edge" (application, transport layers)

into the network "core"

Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets:

each socket identified by its own 4-tuple

Web servers have different sockets for each connecting client

non-persistent HTTP will have different socket for each request
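Demultiplexing on the 4-tuple amounts to a table lookup. The sketch below models the receiving host's socket table as a dictionary; the host names A, B and C and the port numbers 9157, 5775 and 80 mirror the labels used in the demux figures, while the table contents and the function name are hypothetical.

```python
# Hypothetical socket table on server C: one entry per connection 4-tuple
# (source IP, source port, dest IP, dest port).
socket_table = {
    ("A", 9157, "C", 80): "socket for client A",
    ("B", 9157, "C", 80): "socket for client B, connection 1",
    ("B", 5775, "C", 80): "socket for client B, connection 2",
}


def demux(src_ip, src_port, dst_ip, dst_port):
    """Return the socket a segment belongs to, or None if no connection matches."""
    return socket_table.get((src_ip, src_port, dst_ip, dst_port))


# Same source port and same destination, but different source IPs: different sockets.
print(demux("A", 9157, "C", 80))
print(demux("B", 9157, "C", 80))
print(demux("B", 5775, "C", 80))
```

All three segments are addressed to port 80 on the same server, yet each reaches its own socket because at least one of the four values differs.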

Connection-oriented demux (cont.)

[Figure: clients with IP A and IP B, server with IP C; process labels P1 through P6. Segments shown: S-IP A, SP 9157, D-IP C, DP 80; S-IP B, SP 9157, D-IP C, DP 80; S-IP B, SP 5775, D-IP C, DP 80]

Connection-oriented demux: Threaded Web Server

[Figure: clients with IP A and IP B, server with IP C; process labels P1 through P4. Segments shown: S-IP A, SP 9157, D-IP C, DP 80; S-IP B, SP 9157, D-IP C, DP 80; S-IP B, SP 5775, D-IP C, DP 80]

Page 33: 3rd Edition: Chapter 3hbinsky/intro comp comm/3_Transport_Layer.pdf · passes to network layer rcv side: reassembles segments into messages, passes to app layer more than one transport

33

TCP Congestion Control details

sender limits transmissionLastByteSent-LastByteAcked

min(CongWinRcvWindow)

Let‟s assume that RcvWindowis not a constraint (for simplicity)

Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD

slow start

conservative after timeout events

rate =CongWin

RTTBytessec

Transport Layer 65

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps

available bandwidth may be gtgt MSSRTT desirable to quickly ramp

up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 66

34

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every

RTT

done by incrementing CongWin for every ACK received

Summary initial rate is slow but ramps up exponentially fast

Host A

RT

T

Host B

time

Transport Layer 67

Refinement inferring loss

After 3 dup ACKs

CongWin is cut in half

window then grows linearly

Part of Fast Recovery

But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially (This is SS)

to a threshold (ssthresh) then grows linearly

3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 68

35

Refinement

Q When should the exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

Implementation Variable Threshold (ssthresh)

At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 69

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
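The four summary rules can be traced round by round. The sketch below works in whole MSS units at one-RTT granularity and makes several simplifying assumptions that are not in the slides: halving uses integer division with a floor of 1 MSS, slow-start doubling is capped at Threshold, and a window equal to Threshold is treated as congestion avoidance. The function name and the event labels are hypothetical.

```python
def step(cong_win: int, threshold: int, event: str):
    """Advance one RTT. Windows are in MSS units.

    event is "ok" (no loss), "3dup" (triple duplicate ACK) or "timeout".
    Returns the new (cong_win, threshold) pair.
    """
    if event == "timeout":
        threshold = max(cong_win // 2, 1)
        cong_win = 1                      # restart from 1 MSS
    elif event == "3dup":
        threshold = max(cong_win // 2, 1)
        cong_win = threshold              # cut the window in half
    elif cong_win < threshold:
        cong_win = min(cong_win * 2, threshold)  # slow start
    else:
        cong_win += 1                     # congestion avoidance
    return cong_win, threshold


cong_win, threshold = 1, 8
trace = []
events = ["ok"] * 5 + ["3dup", "ok", "ok", "timeout", "ok", "ok"]
for event in events:
    cong_win, threshold = step(cong_win, threshold, event)
    trace.append(cong_win)
print(trace)
```

The printed trace shows the exponential ramp up to the threshold, linear growth after it, a halving on the triple duplicate ACK, and a restart from 1 MSS on the timeout.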


TCP sender congestion control

| State | Event | TCP Sender Action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |
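The table's rows map naturally onto event handlers. The class below is a minimal sketch of that mapping, not a real TCP implementation: the class and method names are hypothetical, window sizes are in bytes, the starting Threshold of 4000 bytes is an arbitrary illustrative value, and resetting the duplicate ACK counter after a loss event is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TcpSender:
    mss: int = 500             # bytes
    cong_win: float = 500.0    # starts at 1 MSS
    threshold: float = 4000.0  # arbitrary starting value for illustration
    state: str = "SS"          # "SS" or "CA"
    dup_acks: int = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Additive increase: about 1 MSS per RTT in total.
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self) -> None:
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # CongWin will not drop below 1 MSS.
            self.cong_win = max(self.threshold, self.mss)
            self.state = "CA"
            self.dup_acks = 0  # assumption: counter restarts after the loss

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


sender = TcpSender()
for _ in range(8):
    sender.on_new_ack()
print(sender.state, sender.cong_win)
```

With these starting values the eighth ACK pushes CongWin past Threshold, so the sender switches from slow start to congestion avoidance.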


TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT
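The 0.75 factor comes from averaging the two extremes, assuming the window climbs linearly from W/2 back to W between loss events:

```latex
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,\text{RTT}} + \frac{W}{\text{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\text{RTT}}
```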


TCP Futures: TCP over "long, fat pipes"

Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

Solving for L gives $L = 2 \cdot 10^{-10}$. Wow!

New versions of TCP are needed for high-speed links
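Both numbers in this example can be checked with a short calculation. The sketch assumes the loss-rate relation throughput = 1.22 · MSS / (RTT · sqrt(L)) and 1 Gbps = 10^9 bits/s; the function names are hypothetical.

```python
def required_window_segments(throughput_bps: float, rtt_s: float,
                             mss_bytes: int) -> float:
    """Segments that must be in flight to fill the pipe: rate * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float,
                       mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


window = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(int(window))    # whole in-flight segments
print(f"{loss:.2e}")  # tolerable loss rate
```

The result is roughly one lost segment in five billion, which is why the slide reacts with "Wow".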


TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]


Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1 as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: throughput of one connection plotted against Connection 1 throughput, each axis running up to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
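The convergence argument can be checked numerically. The sketch below is a simplified model, not a TCP simulation: it assumes both sessions have the same RTT, add one unit of rate per round, and both halve in the same round whenever their combined rate exceeds the link capacity. The function name and the starting rates are hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Run synchronized AIMD for two flows sharing one bottleneck."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: multiplicative decrease halves both rates,
            # which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: additive increase keeps the gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = aimd_two_flows(1.0, 80.0, 100.0, 200)
print(round(x1, 2), round(x2, 2))
```

Because the increase step preserves the difference between the two rates and each decrease halves it, the rates drift toward the equal-share line regardless of where they start.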


Fairness (more)

Fairness and UDP

Multimedia apps often do not use TCP: they do not want their rate throttled by congestion control

Instead they use UDP: pump audio/video at a constant rate, tolerate packet loss

Research area: TCP friendly

Fairness and parallel TCP connections

Nothing prevents an app from opening parallel connections between 2 hosts

Web browsers do this

Example: link of rate R supporting 9 connections

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets about R/2
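The arithmetic behind the parallel-connection example is a simple proportion. The sketch below assumes every TCP connection on the link ends up with an equal share, as the fairness goal states; the function name is hypothetical.

```python
from fractions import Fraction


def app_share(existing: int, opened_by_app: int) -> Fraction:
    """Fraction of the link rate R that the new app receives when each
    TCP connection gets an equal share of the bottleneck."""
    return Fraction(opened_by_app, existing + opened_by_app)


print(app_share(9, 1))   # one new connection among ten
print(app_share(9, 11))  # eleven new connections among twenty
```

With 11 connections the app holds 11 of the 20 connections on the link, slightly more than half of R.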


Chapter 3 Summary

principles behind transport layer services

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet

UDP

TCP

After that:

leaving the network "edge" (application, transport layers)

into the network "core"


Next

Socket programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple:

source IP address

source port number

dest IP address

dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets: each socket is identified by its own 4-tuple

Web servers have different sockets for each connecting client; non-persistent HTTP will have a different socket for each request
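Demultiplexing by 4-tuple amounts to a table lookup keyed on all four values. The sketch below models that lookup only; it does not open real sockets. The class names and socket labels are hypothetical, while the host letters and port numbers are borrowed from the demux figure in these slides.

```python
from typing import Dict, NamedTuple, Optional


class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Demultiplexer:
    """Maps each connection's 4-tuple to the socket that owns it."""

    def __init__(self) -> None:
        self._sockets: Dict[FourTuple, str] = {}

    def register(self, key: FourTuple, socket_name: str) -> None:
        self._sockets[key] = socket_name

    def deliver(self, key: FourTuple) -> Optional[str]:
        """Return the socket for an arriving segment, or None if no match."""
        return self._sockets.get(key)


demux = Demultiplexer()
demux.register(FourTuple("A", 9157, "C", 80), "socket-1")
demux.register(FourTuple("B", 9157, "C", 80), "socket-2")
demux.register(FourTuple("B", 5775, "C", 80), "socket-3")

# Same source port as client A, but a different source IP: different socket.
print(demux.deliver(FourTuple("B", 9157, "C", 80)))
```

All three segments carry destination port 80, so the destination port alone could not tell the connections apart; the full 4-tuple can.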


Connection-oriented demux (cont)

[Figure: client IP A, client IP B and server IP C, with process labels P1 to P6. Segments sent to the server: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80), (S-IP B, SP 5775, D-IP C, DP 80)]


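Connection-oriented demux: Threaded Web Server

[Figure: client IP A, client IP B and a threaded web server at IP C, with process labels P1 to P4. Segments sent to the server: (S-IP A, SP 9157, D-IP C, DP 80), (S-IP B, SP 9157, D-IP C, DP 80), (S-IP B, SP 5775, D-IP C, DP 80)]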






