cs 4284 systems capstone

71
CS 4284 Systems Capstone Godmar Back Networking

Upload: rico

Post on 24-Feb-2016

23 views

Category:

Documents


0 download

DESCRIPTION

CS 4284 Systems Capstone. Networking. Godmar Back. TCP. Study of TCP: Outline. segment structure reliable data transfer flow control connection management [ Network Address Translation ] [ Principles of congestion control ] TCP congestion control. full duplex data: - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CS 4284 Systems Capstone

CS 4284Systems Capstone

Godmar Back

Networking

TCP

CS 4284 Spring 2013

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transferbull flow controlbull connection management

[ Network Address Translation ][ Principles of congestion control ]

bull TCP congestion control

CS 4284 Spring 2013

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T CPsend buffer

TC Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

CS 4284 Spring 2013

TCP Segment Structuresource port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement

numberreceive windowurg data pnterchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection

establishment(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

CS 4284 Spring 2013

TCP Reliable Data Transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segments

bull Cumulative acksbull TCP uses single

retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate

acksndash ignore flow control

congestion control

CS 4284 Spring 2013

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles

out-of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

CS 4284 Spring 2013

TCP Sender Eventsdata rcvd from appbull create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to

be ackedndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP sender

(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so

SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 2: CS 4284 Systems Capstone

TCP

CS 4284 Spring 2013

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transferbull flow controlbull connection management

[ Network Address Translation ][ Principles of congestion control ]

bull TCP congestion control

CS 4284 Spring 2013

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T CPsend buffer

TC Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

CS 4284 Spring 2013

TCP Segment Structuresource port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement

numberreceive windowurg data pnterchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection

establishment(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

CS 4284 Spring 2013

TCP Reliable Data Transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segments

bull Cumulative acksbull TCP uses single

retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate

acksndash ignore flow control

congestion control

CS 4284 Spring 2013

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles

out-of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

CS 4284 Spring 2013

TCP Sender Eventsdata rcvd from appbull create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to

be ackedndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP sender

(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so

SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 3: CS 4284 Systems Capstone

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transferbull flow controlbull connection management

[ Network Address Translation ][ Principles of congestion control ]

bull TCP congestion control

CS 4284 Spring 2013

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T CPsend buffer

TC Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

CS 4284 Spring 2013

TCP Segment Structuresource port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement

numberreceive windowurg data pnterchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection

establishment(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

CS 4284 Spring 2013

TCP Reliable Data Transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segments

bull Cumulative acksbull TCP uses single

retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate

acksndash ignore flow control

congestion control

CS 4284 Spring 2013

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles

out-of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

CS 4284 Spring 2013

TCP Sender Eventsdata rcvd from appbull create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to

be ackedndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP sender

(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so

SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 4: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow in

same connectionndash MSS maximum segment

sizebull connection-oriented

ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not overwhelm

receiver

bull point-to-pointndash one sender one receiver

bull reliable in-order byte streamndash no ldquomessage boundariesrdquo

bull pipelinedndash TCP congestion and flow

control set window sizebull send amp receive buffers

socketdoor

T CPsend buffer

TC Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

CS 4284 Spring 2013

TCP Segment Structuresource port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement

numberreceive windowurg data pnterchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection

establishment(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

CS 4284 Spring 2013

TCP Reliable Data Transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segments

bull Cumulative acksbull TCP uses single

retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate

acksndash ignore flow control

congestion control

CS 4284 Spring 2013

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles

out-of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

CS 4284 Spring 2013

TCP Sender Eventsdata rcvd from appbull create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to

be ackedndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP sender

(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so

SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 5: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Segment Structuresource port dest port

32 bits

applicationdata

(variable length)

sequence numberacknowledgement

numberreceive windowurg data pnterchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection

establishment(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

CS 4284 Spring 2013

TCP Reliable Data Transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segments

bull Cumulative acksbull TCP uses single

retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate

acksndash ignore flow control

congestion control

CS 4284 Spring 2013

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles

out-of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

CS 4284 Spring 2013

TCP Sender Eventsdata rcvd from appbull create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to

be ackedndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP sender

(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so

SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 6: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Reliable Data Transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segments

bull Cumulative acksbull TCP uses single

retransmission timer

bull Retransmissions are triggered byndash timeout eventsndash duplicate acks

bull Initially consider simplified TCP senderndash ignore duplicate

acksndash ignore flow control

congestion control

CS 4284 Spring 2013

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles

out-of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

CS 4284 Spring 2013

TCP Sender Eventsdata rcvd from appbull create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to

be ackedndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP sender

(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so

SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 7: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next byte

expected from other side

ndash cumulative ACKQ how receiver handles

out-of-order segmentsndash A TCP spec doesnrsquot

say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

CS 4284 Spring 2013

TCP Sender Eventsdata rcvd from appbull create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to

be ackedndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP sender

(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so

SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 8: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Sender Eventsdata rcvd from appbull create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeoutbull retransmit segment that

caused timeoutbull restart timer ack rcvdbull If acknowledges

previously unacked segmentsndash update what is known to

be ackedndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP sender

(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so

SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 9: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP sender

(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so

SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 10: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP retransmission scenariosHost A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

SendBase= 100

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92 ti

meo

ut

ACK=120

Seq=

92 ti

meo

ut

SendBase= 120

SendBase= 120

Sendbase= 100

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 11: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

losstimeo

ut

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 12: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 13: CS 4284 Systems Capstone

CS 4284 Spring 2013

Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in

small chunks)ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash

then send all at oncebull You can turn this off via

ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 14: CS 4284 Systems Capstone

CS 4284 Spring 2013

Delayed ACK vs Naglebull These two mechanisms have nothing to do with

each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable

delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender

ndash Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 15: CS 4284 Systems Capstone

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 16: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull longer than RTTndash but RTT varies

bull too short premature timeoutndash unnecessary

retransmissions

bull too long slow reaction to segment loss

Q how to estimate RTT

bull SampleRTT measured time from segment transmission until ACK receipt

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several

recent measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 17: CS 4284 Systems Capstone

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 18: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving averagebull influence of past sample decreases

exponentially fastbull typical value = 0125

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 19: CS 4284 Systems Capstone

CS 4284 Spring 2013

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 20: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates

from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 21: CS 4284 Systems Capstone

CS 4284 Spring 2013

RTT Measurement amp Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original

segment or early for retransmitted segmentbull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 22: CS 4284 Systems Capstone

CS 4284 Spring 2013

Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted

segmentsndash Ok but what if current timeout value is too small for network

delayndash In this case would keep timing out but couldnrsquot adjust timeout

according to formulabull Solution If measurement canrsquot be made use exponential

backoff until new measurement can be madendash By factor of 2 with limit

bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 23: CS 4284 Systems Capstone

CS 4284 Spring 2013

Fast Retransmitbull Time-out period often relatively long

ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs

ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs

bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 24: CS 4284 Systems Capstone

CS 4284 Spring 2013

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 25: CS 4284 Systems Capstone

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 26: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Flow Controlbull receive side of

TCP connection has a receive buffer

bull speed-matching service matching the send rate to the receiving apprsquos drain rate

bull app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 27: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards out-of-order segments)

bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindowndash guarantees

receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 28: CS 4284 Systems Capstone

CS 4284 Spring 2013

Flow Control Persistence Timerbull Suppose sender is blocked cause receiver

application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size

RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck

ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 29: CS 4284 Systems Capstone

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 30: CS 4284 Systems Capstone

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 31: CS 4284 Systems Capstone

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 32: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control info

(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr

hellip)

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 33: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 34: CS 4284 Systems Capstone

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 35: CS 4284 Systems Capstone

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbersbull Must also guard against sequence number prediction attack

ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 36: CS 4284 Systems Capstone

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence number a host B is going to use next

bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions

based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 37: CS 4284 Systems Capstone

CS 4284 Spring 2013

When SYNs Attackbull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

bull Solutionndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not allocate buffers

ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 38: CS 4284 Systems Capstone

Sequence Number Summarybull Goals Set 1

ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can

eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial

sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 39: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Connection Management (cont)Closing a connection

client closes socket close(s)

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

timed

wai

t

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 40: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

timed

wai

tclosed

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 41: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP ConnectionFSM

The heavy solid line is the normal path for a client

The heavy dashed line is the normal path for a server

The light lines are unusual events

Each transition is labeled by the event causing it and the action resulting from it separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 42: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 43: CS 4284 Systems Capstone

CS 4284 Spring 2013

Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if

packets (FIN ACK) can be lostndash No Famous two-army problem

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 44: CS 4284 Systems Capstone

CS 4284 Spring 2013

Summarybull TCP segments acknowledgements amp

retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management

schemendash SYN attackndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 45: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Miscellaneousbull MSS Maximum Segment Size Option

ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to

allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT

measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 46: CS 4284 Systems Capstone

CS 4284 Spring 2013

Study of TCP Outlinebull segment structurebull reliable data transfer

ndash delayed ACKsndash Naglersquos algorithm

bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 47: CS 4284 Systems Capstone

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 48: CS 4284 Systems Capstone

CS 4284 Spring 2013

Principles of Congestion Control

Congestionbull informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull different from flow controlbull manifestations

ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 49: CS 4284 Systems Capstone

CS 4284 Spring 2013

Causescosts of congestion scenario 1 bull two senders two

receiversbull one router infinite

buffers bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared output link buffers

Host Alin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 50: CS 4284 Systems Capstone

CS 4284 Spring 2013

Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet

after timeout

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 51: CS 4284 Systems Capstone

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2lin

l out

a c

R2

R2lin

l out

R4

R2

R2lin

l out

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 52: CS 4284 Systems Capstone

CS 4284 Spring 2013

Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit

Q what happens as lin and lrsquoin increase

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 53: CS 4284 Systems Capstone

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission

capacity used for that packet was wastedbull ultimately leads to congestion collapse

Host A

Host B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 54: CS 4284 Systems Capstone

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even if no packet loss occurs

bull If packet loss occurs needed retransmission require offered load to be greater than goodput

bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 55: CS 4284 Systems Capstone

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion control

bull no explicit feedback from network

bull congestion inferred from end-system observed loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systemsndash single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 56: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Congestion Controlbull end-end control (no network

assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput

bull CongWin is dynamic function of perceived network congestion

How does sender notice congestion

bull loss event = timeout or 3 duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is primary cause of loss)

Three mechanismsbull AIMDbull slow startbull fast recovery

rate = CongWin

RTT Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 57: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 58: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Slow Startbull When connection

begins CongWin = 1 MSSndash Example MSS =

500 bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly

ramp up to respectable rate

bull When connection begins increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 59: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Slow Start (more)bull When connection begins

increase rate exponentially until first loss eventndash double CongWin every

RTTndash done by incrementing CongWin for every ACK received

bull Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 60: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Tahoe vs RenoQ When should the

exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss

eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly

(this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 61: CS 4284 Systems Capstone

CS 4284 Spring 2013

Timeouts vs 3-dup ACKsbull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout eventndash CongWin instead set

to 1 MSS ndash window then grows

exponentiallyndash to a threshold then

grows linearly

bull 3 dup ACKs indicates network capable of delivering some segments

bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 62: CS 4284 Systems Capstone

CS 4284 Spring 2013

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 63: CS 4284 Systems Capstone

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 64: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Sender Congestion ControlEvent State TCP Sender Action Commentary

ACK receipt for previously unacked data

Slow Start (SS)

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA

Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data

CongestionAvoidance (CA)

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK

SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 65: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput

to W2RTT bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 66: CS 4284 Systems Capstone

CS 4284 Spring 2013

TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT

want 10 Gbps throughputbull Requires window size W = 83333 in-flight

segmentsbull Throughput in terms of loss rate

bull L = 210-10 Very lowbull Require almost perfect link

LRTTMSS221

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 67: CS 4284 Systems Capstone

Equation-Based Control

bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)bull Instead equation-based control uses an

equation to compute sending rate based on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 68: CS 4284 Systems Capstone

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 69: CS 4284 Systems Capstone

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk

TCP connection 1

bottleneckrouter

capacity RTCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 70: CS 4284 Systems Capstone

CS 4284 Spring 2013

Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConn

e ctio

n 2

thro

u ghp

ut

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)
Page 71: CS 4284 Systems Capstone

CS 4284 Spring 2013

Fairness (more)Fairness and UDPbull Multimedia apps often

do not use TCPndash do not want rate throttled

by congestion controlbull Instead use UDP

ndash pump audiovideo at constant rate tolerate packet loss

bull TCP friendliness

Fairness and parallel TCP connections

bull nothing prevents app from opening parallel connections between 2 hosts

bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets

rate R10ndash new app asks for 9 TCPs gets

R2bull Thatrsquos what ldquodownload

acceleratorsrdquo do

  • CS 4284 Systems Capstone
  • TCP
  • Study of TCP Outline
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Reliable Data Transfer
  • TCP Seq rsquos and ACKs
  • TCP Sender Events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Naglersquos algorithm
  • Delayed ACK vs Nagle
  • TCP (2)
  • TCP Round Trip Time and Timeout
  • RTT Distributions
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • RTT Measurement amp Retransmissions
  • Karnrsquos Algorithm
  • Fast Retransmit
  • Fast retransmit algorithm
  • TCP (3)
  • TCP Flow Control
  • TCP Flow Control (contrsquod)
  • Flow Control Persistence Timer
  • Flow Control Silly Window Syndrome
  • Study of TCP Outline (2)
  • TCP (4)
  • TCP Connection Management
  • TCP 3-way handshake
  • 3-way Handshake amp Delayed Dups
  • Sequence Number Reuse
  • When Sequence Numbers Attack
  • When SYNs Attack
  • Sequence Number Summary
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection FSM
  • TCP Connection Management (contrsquod)
  • Closing a Connection
  • Summary
  • TCP Miscellaneous
  • Study of TCP Outline (3)
  • TCP (5)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Reasons for Congestion Control
  • Congestion Control Approaches
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • TCP Slow Start (more)
  • TCP Tahoe vs Reno
  • Timeouts vs 3-dup ACKs
  • Summary TCP Congestion Control
  • Summary TCP Congestion Control (2)
  • TCP Sender Congestion Control
  • TCP Throughput Idealized
  • TCP Throughput amp Loss
  • Equation-Based Control
  • TCP Fairness
  • TCP Fairness (2)
  • Why is TCP fair
  • Fairness (more)