Computer Networks: Architecture and Protocols
CS 4450
Lecture 22: More TCP Congestion Control
Rachit Agarwal
Recap: Transmission Control Protocol (TCP)
• Reliable, in-order delivery
  • Ensures byte stream (eventually) arrives intact
  • In the presence of corruption, delays, reordering, loss
• Connection oriented
  • Explicit set-up and tear-down of TCP session
• Full-duplex stream-of-bytes service
  • Sends and receives stream of bytes
• Flow control
  • Ensures the sender does not overwhelm the receiver
• Congestion control
  • Dynamic adaptation to network path’s capacity
Recap: Basic Components of TCP
• Segments, sequence numbers, ACKs
  • TCP uses byte sequence numbers to identify payloads
  • ACKs refer to sequence numbers
  • Window sizes expressed in terms of # of bytes
• Retransmissions
  • Can’t be correct without retransmitting lost/corrupted data
  • TCP retransmits based on timeouts and duplicate ACKs
  • Timeouts based on estimate of RTT
• Flow control
• Congestion control
Recap: Loss with Cumulative ACKs
• Sender sends packets with 100 B and seqnos
  • 100, 200, 300, 400, 500, 600, 700, 800, 900
• Assume 5th packet (seqno 500) is lost, but no others
• Stream of ACKs will be
  • 200, 300, 400, 500, 500, 500, 500, 500
• Duplicate ACKs are a sign of an isolated loss
  • The lack of ACK progress means 500 hasn’t been delivered
  • Stream of ACKs means some packets are being delivered
• Therefore, could trigger resend upon receiving k duplicate ACKs
  • Large k -> fewer retransmissions, more latency, lower rate (why?)
  • Small k -> more retransmissions, lower latency
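The ACK stream in this example can be reproduced with a short sketch. The function name `cumulative_acks` and its parameters are illustrative assumptions; the receiver is modeled as ACKing every arriving packet with the next byte it expects, and lost packets simply never arrive.

```python
def cumulative_acks(seqnos, payload_bytes, lost):
    """Return the ACK stream a cumulative-ACK receiver would send.

    Each ACK carries the next byte the receiver expects. Packets whose
    sequence number is in `lost` never arrive, so they trigger no ACK.
    """
    received = set()
    next_expected = seqnos[0]
    acks = []
    for seq in seqnos:
        if seq in lost:
            continue
        received.add(seq)
        # Advance past every in-order packet the receiver now holds.
        while next_expected in received:
            next_expected += payload_bytes
        acks.append(next_expected)
    return acks


seqnos = list(range(100, 1000, 100))  # 100, 200, ..., 900
acks = cumulative_acks(seqnos, 100, lost={500})
print(acks)  # [200, 300, 400, 500, 500, 500, 500, 500]
```

The first ACK of 500 is the normal response to packet 400; the four that follow are the duplicates a sender could count toward its threshold k.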
Recap: Setting the Timeout Value (RTO)
[Figure: two timing diagrams comparing the timeout with the RTT]
• Timeout too long -> inefficient
• Timeout too short -> duplicate packets
Recap: Flow Control (Sliding Window)
• Advertised Window: W
  • Can send W bytes beyond the next expected byte
• Receiver uses W to prevent sender from overflowing buffer
• Limits number of bytes sender can have in flight
Recap: TCP congestion control: high-level idea
• End hosts adjust sending rate
• Based on implicit feedback from the network
  • Implicit: router drops packets because its buffer overflows, not because it’s trying to send a message
• Hosts probe network to test level of congestion
  • Speed up when no congestion (i.e., no packet drops)
  • Slow down when congestion (i.e., packet drops)
• How to do this efficiently?
  • Extend TCP’s existing window-based protocol…
  • Adapt the window size in response to congestion
Recap: Not All Losses the Same
• Duplicate ACKs: isolated loss
  • Still getting ACKs
• Timeout: possible disaster
  • Not enough duplicate ACKs
  • Must have suffered several losses
Recap: Additive Increase, Multiplicative Decrease (AIMD)
• Additive increase
  • On success of last window of data, increase by one MSS
  • If W packets in a row have been ACKed, increase W by one
  • i.e., +1/W per ACK
• Multiplicative decrease
  • On loss of packets detected by dupACKs, halve the congestion window
  • Special case: on timeout, reduce congestion window to one MSS
Recap: AIMD
• ACK: CWND -> CWND + 1/CWND
  • When CWND is measured in MSS
  • Note: after a full window, CWND increases by 1 MSS
  • Thus, CWND increases by 1 MSS per RTT
• 3rd dupACK: CWND -> CWND/2
• Special case of timeout: CWND -> 1 MSS
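A small sketch can check the claim that +1/CWND per ACK adds up to about one MSS per RTT. The function names are illustrative, and the sketch assumes exactly floor(CWND) ACKs arrive in one RTT, one per segment sent.

```python
def additive_increase_one_rtt(cwnd):
    """Apply CWND -> CWND + 1/CWND once per ACK for one window of ACKs.

    CWND is measured in MSS. Assumption for illustration: exactly
    int(cwnd) ACKs arrive during the RTT, one per segment sent.
    """
    acks_this_rtt = int(cwnd)
    for _ in range(acks_this_rtt):
        cwnd += 1.0 / cwnd
    return cwnd


def multiplicative_decrease(cwnd):
    """Third duplicate ACK: halve the window, but never go below 1 MSS."""
    return max(1.0, cwnd / 2.0)


start = 10.0
after_one_rtt = additive_increase_one_rtt(start)
print(after_one_rtt)                           # slightly under 11
print(multiplicative_decrease(after_one_rtt))  # roughly half of that
```

The result lands slightly under CWND + 1 because the divisor grows with every ACK; the slides round this to "1 MSS per RTT".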
Recap: AIMD Starts Too Slowly
[Figure: window vs. time t]
• It could take a long time to get started!
• Need to start with a small CWND to avoid overloading the network
Recap: “Slow Start” Phase
• Start with a small congestion window
  • Initially, CWND is 1 MSS
  • So, initial sending rate is MSS/RTT
• That could be pretty wasteful
  • Might be much less than the actual bandwidth
  • Linear increase takes a long time to accelerate
• Slow-start phase (actually “fast start”)
  • Sender starts at a slow rate (hence the name)
  • …but increases exponentially until first loss
Recap: Slow Start and the TCP Sawtooth (no timeouts)
[Figure: window vs. time t, beginning with exponential “slow start”]
Timeouts
Loss Detected by Timeout
• Sender starts a timer that runs for RTO seconds
• Restart timer whenever ACK for new data arrives
• If timer expires
  • Set SSTHRESH <- CWND/2 (“Slow Start Threshold”)
  • Set CWND <- 1 (MSS)
  • Retransmit first lost packet
  • Execute Slow Start until CWND > SSTHRESH
  • After which switch to Additive Increase
TCP Time Diagram (with timeouts)
[Figure: window vs. time t. Labels: Fast Retransmission; Timeout; SSThresh set to here]
• Slow start in operation until it reached half of previous CWND, i.e., SSThresh
Summary of Increase
• “Slow start”: increase CWND by 1 (MSS) for each ACK
  • A factor of 2 per RTT
• Leave slow-start regime when either:
  • CWND > SSTHRESH
  • Packet drop detected by dupACKs
• Enter AIMD regime
  • Increase by 1 (MSS) for each window’s worth of ACKed data
Summary of Decrease
• Cut CWND in half on loss detected by dupACKs
  • Fast retransmit to avoid overreacting
• Cut CWND all the way to 1 (MSS) on timeout
  • Set ssthresh to CWND/2
• Never drop CWND below 1 (MSS)
  • Our correctness condition: always try to make progress
TCP Congestion Control Details
Implementation
• State at sender
  • CWND (initialized to a small constant)
  • ssthresh (initialized to a large constant)
  • dupACKcount
  • Timer, as before
• Events at sender
  • ACK (new data)
  • dupACK (duplicate ACK for old data)
  • Timeout
• What about receiver? Just send ACKs upon arrival
  • Assuming RWND > CWND
Event: ACK (new data)
• If in slow start
  • CWND += 1
• CWND packets per RTT
• Hence after one RTT with no drops: CWND = 2 x CWND
Event: ACK (new data)
• If CWND <= ssthresh (Slow Start Phase)
  • CWND += 1
• Else (Congestion Avoidance Phase, additive increase)
  • CWND = CWND + 1/CWND
• CWND packets per RTT
• Hence after one RTT with no drops: CWND = CWND + 1
Event: Timeout
• On Timeout
  • ssthresh <- CWND/2
  • CWND <- 1
Event: dupACK
• dupACKcount++
• If dupACKcount = 3 /* fast retransmit */
  • ssthresh <- CWND/2
  • CWND <- CWND/2
• Remains in congestion avoidance after fast retransmission
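The three sender events can be collected into one small state machine. This is a sketch of the window bookkeeping only, not a real TCP stack. The class name `RenoLikeSender`, the default `initial_ssthresh` of 64 MSS, the reset of the duplicate-ACK counter on new data, and the 1 MSS floor on ssthresh are all assumptions added for illustration.

```python
class RenoLikeSender:
    """Congestion-window bookkeeping for the ACK, dupACK and Timeout events.

    All windows are in units of MSS. No packets are actually sent.
    """

    def __init__(self, initial_cwnd=1.0, initial_ssthresh=64.0):
        self.cwnd = initial_cwnd          # "a small constant"
        self.ssthresh = initial_ssthresh  # "a large constant" (value assumed)
        self.dup_ack_count = 0

    def on_new_ack(self):
        """ACK for new data: grow the window."""
        self.dup_ack_count = 0  # assumption: new data resets the dupACK counter
        if self.cwnd <= self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 MSS per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: +1/CWND per ACK

    def on_dup_ack(self):
        """Duplicate ACK: returns True when the caller should fast-retransmit."""
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = max(1.0, self.cwnd / 2.0)
            self.cwnd = self.ssthresh     # CWND <- CWND/2
            return True
        return False

    def on_timeout(self):
        """Timer expiry: remember half the window, then restart from 1 MSS."""
        self.ssthresh = max(1.0, self.cwnd / 2.0)
        self.cwnd = 1.0
        self.dup_ack_count = 0


sender = RenoLikeSender(initial_cwnd=1.0, initial_ssthresh=4.0)
for _ in range(5):
    sender.on_new_ack()
print(sender.cwnd)  # four slow-start steps, then one additive step
```

Because the slide's test is CWND <= ssthresh, and a fast retransmit leaves CWND equal to ssthresh, the first new ACK afterwards still adds a full MSS before additive increase takes over.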
Time Diagram
[Figure: window vs. time t. Labels: Fast Retransmission; Timeout; SSThresh set to here]
• Slow start in operation until it reached half of previous CWND, i.e., SSThresh
• Slow-start restart: Go back to CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
TCP Flavors
• TCP Tahoe
  • CWND = 1 on triple dupACK
• TCP Reno
  • CWND = 1 on timeout
  • CWND = CWND/2 on triple dupACK
• TCP-newReno
  • TCP-Reno + improved fast recovery
• TCP-SACK
  • Incorporates selective acknowledgements
Our default assumption
Any Questions?
TCP and fairness guarantees
Consider A Simple Model
• Flows ask for an amount of bandwidth r_i
  • In reality, this request is implicit (the amount they send)
• The link gives them an amount a_i
  • Again, this is implicit (by how much is forwarded)
• a_i <= r_i
• There is some total capacity C
• Sum a_i <= C
Fairness
• When all flows want the same rate, fair is easy
  • Fair share = C/N
  • C = capacity of link
  • N = number of flows
• Note:
  • This is fair share per link. This is not a global fair share
• When not all flows have the same demand?
  • What happens here?
Example 1
• Requests: r_i, Allocations: a_i
• C = 20
  • Requests: r1 = 6, r2 = 5, r3 = 4
• Solution
  • a1 = 6, a2 = 5, a3 = 4
• When bandwidth is plentiful, everyone gets their request
• This is the easy case
Example 2
• Requests: r_i, Allocations: a_i
• C = 12
  • Requests: r1 = 6, r2 = 5, r3 = 4
• One solution
  • a1 = 4, a2 = 4, a3 = 4
  • Everyone gets the same
• Why not proportional to their demands?
  • a_i = (12/15) r_i
  • Asking for more gets you more!
  • Not incentive compatible (i.e., cheating works!)
  • You can’t have that and invite innovation!
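The incentive problem with proportional sharing shows up in a quick calculation. In this sketch the function name `proportional_allocation` and the inflated request of 10 for flow 3 are hypothetical, chosen only to illustrate "asking for more gets you more".

```python
def proportional_allocation(requests, capacity):
    """Scale every request by capacity / total demand when the link is oversubscribed."""
    total = sum(requests)
    if total <= capacity:
        return [float(r) for r in requests]  # plentiful bandwidth: grant everything
    return [capacity * r / total for r in requests]


honest = proportional_allocation([6, 5, 4], 12)     # a_i = (12/15) r_i
inflated = proportional_allocation([6, 5, 10], 12)  # flow 3 overstates its demand
print(honest)
print(inflated)
```

Flow 3's share grows when it overstates its request, at the expense of flows 1 and 2, which is exactly the behavior the slide calls not incentive compatible.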
Example 3
• Requests: r_i, Allocations: a_i
• C = 14
  • Requests: r1 = 6, r2 = 5, r3 = 4
• a3 = 4 (can’t give more than a flow wants)
• Remaining bandwidth is 10, with demands 6 and 5
• From previous example, if both want more than their share, they both get half
• a1 = a2 = 5
Max-Min Fairness
• Given a set of bandwidth demands r_i and total bandwidth C, max-min bandwidth allocations are a_i = min(f, r_i)
• Where f is the unique value such that Sum(a_i) = C, or set f to be infinite if no such value exists
• This is what round-robin service gives
  • If all packets are MTU
• Property:
  • If you don't get full demand, no one gets more than you
Computing Max-Min Fairness
• Assume demands are in increasing order…
• If C/N <= r1, then a_i = C/N for all i
• Else, a1 = r1, set C = C - a1 and N = N - 1
• Repeat
• Intuition: all flows requesting less than fair share get their request. Remaining flows divide equally
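The procedure above translates almost line for line into code. In this sketch the function name `max_min_allocation` is an assumption, and the sort is done internally so that callers need not pre-order the demands.

```python
def max_min_allocation(requests, capacity):
    """Max-min fair allocations, returned in the same order as `requests`."""
    allocations = [0.0] * len(requests)
    # Visit flows from smallest to largest demand, remembering original positions.
    order = sorted(range(len(requests)), key=lambda i: requests[i])
    remaining_capacity = float(capacity)
    remaining_flows = len(requests)
    for position, i in enumerate(order):
        fair_share = remaining_capacity / remaining_flows
        if fair_share <= requests[i]:
            # Every remaining flow wants at least the fair share: split equally.
            for j in order[position:]:
                allocations[j] = fair_share
            break
        # This flow wants less than its share: grant it in full, then repeat.
        allocations[i] = float(requests[i])
        remaining_capacity -= requests[i]
        remaining_flows -= 1
    return allocations


print(max_min_allocation([6, 5, 4], 20))  # [6.0, 5.0, 4.0]
print(max_min_allocation([6, 5, 4], 12))  # [4.0, 4.0, 4.0]
print(max_min_allocation([6, 5, 4], 14))  # [5.0, 5.0, 4.0]
```

The three calls correspond to Examples 1, 2 and 3; the final fair share in each run plays the role of f in a_i = min(f, r_i).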
Example
• Assume link speed C is 10 Mbps
• Have three flows:
  • Flow 1 is sending at a rate 8 Mbps
  • Flow 2 is sending at a rate 6 Mbps
  • Flow 3 is sending at a rate 2 Mbps
• How much bandwidth should each get?
  • According to max-min fairness?
• Work this out, talk to your neighbors
Example
• Requests: r_i, Allocations: a_i
• Requests: r1 = 8, r2 = 6, r3 = 2
• C = 10, N = 3, C/N = 3.33
  • Can serve all for r3
  • Remove r3 from the accounting: C = C - r3 = 8, N = 2
• C/2 = 4
  • Can’t service all for r1 or r2
  • So hold them to the remaining fair share: f = 4
• f = 4: min(8, 4) = 4, min(6, 4) = 4, min(2, 4) = 2
[Figure: demands 8, 6, 2 against link capacity 10; allocations 4, 4, 2]
Max-Min Fairness
• Max-min fairness is the natural per-link fairness
• Only one that is
  • Symmetric
  • Incentive compatible (asking for more doesn’t help)
Reality of Congestion Control
Congestion control is a resource allocation problem involving many flows, many links and complicated global dynamics
[Figure: example network with link capacities 1 Gbps, 600 Mbps, and 2 Gbps]
Classical result:
In a stable state (no dynamics; all flows are infinitely long; no failures; etc.)
TCP guarantees max-min fairness
Any Questions?
The Many Failings of TCP Congestion Control
1. Fills up queues (large queueing delays)
2. Every segment not ACKed is a loss (non-congestion related losses)
3. Produces irregular saw-tooth behavior
4. Biased against long RTTs (unfair)
5. Not designed for short flows
6. Easy to cheat
(1) TCP Fills Up Queues
• TCP only slows down when queues fill up
  • High queueing delays
• Means that it is not optimized for latency
  • What is it optimized for then?
  • Answer: Fairness (discussion in next few slides)
• And many packets are dropped when buffer fills
• Alternative 1: Use small buffers
  • Is this a good idea?
  • Answer: No, bursty traffic will lead to reduced utilization
• Alternative 2: Random Early Drop (RED)
  • Drop packets on purpose before queue is full
  • A very clever idea
Random Early Drop (or Detection)
• Measure average queue size A with exponential weighting
  • Average: Allows for short bursts of packets without over-reacting
• Drop probability is a function of A
  • No drops if A is very small
  • Low drop rate for moderate A’s
  • Drop everything if A is too big
• Drop probability applied to incoming packets
• Intuition: link is fully utilized well before buffer is full
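One possible shape for the drop decision is sketched below. The slides do not give thresholds or a formula, so the class name `RedQueue`, the parameters `min_th`, `max_th`, `max_p` and `weight`, their default values, and the linear ramp between the thresholds are all assumptions for illustration.

```python
import random


class RedQueue:
    """Drop decision driven by an exponentially weighted average queue size A."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, weight=0.002):
        # All four defaults are illustrative assumptions, not values from the lecture.
        self.min_th = min_th  # below this average: never drop
        self.max_th = max_th  # at or above this average: drop everything
        self.max_p = max_p    # drop probability just below max_th
        self.weight = weight  # how much each new sample moves the average
        self.avg = 0.0        # the average queue size A

    def update_average(self, queue_len):
        """Exponential weighting: short bursts barely move the average."""
        self.avg = (1.0 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0  # A is very small: no drops
        if self.avg >= self.max_th:
            return 1.0  # A is too big: drop everything
        # Moderate A: low drop rate, rising linearly (assumed shape).
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len, rng=random):
        """Apply the drop probability to one incoming packet."""
        self.update_average(queue_len)
        return rng.random() < self.drop_probability()


queue = RedQueue()
print(queue.should_drop(queue_len=40))  # one burst sample barely moves the average
```

With a small `weight`, a single large sample leaves A well below `min_th`, which is how the average "allows for short bursts of packets without over-reacting".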
Advantages of RED
• Keeps queues smaller, while allowing bursts
  • Just using small buffers in routers can’t do the latter
• Reduces synchronization between flows
  • Not all flows are dropping packets at once
  • Increases/decreases are more gentle
• Problem
  • Turns out that RED does not guarantee fairness
(2) Non-Congestion-Related Losses?
• For instance, RED drops packets intentionally
  • TCP would think the network is congested
• Can use Explicit Congestion Notification (ECN)
  • Bit in IP packet header (actually two)
  • TCP receiver returns this bit in ACK
  • When RED router would drop, it sets bit instead
  • Congestion semantics of bit exactly like that of drop
• Advantages
  • Doesn’t confuse corruption with congestion
(3) Sawtooth Behavior Uneven
• TCP throughput is “choppy”
  • Repeated swings between W/2 and W
• Some apps would prefer sending at a steady rate
  • E.g., streaming apps
• A solution: “Equation-based congestion control”
  • Ditch TCP’s increase/decrease rules and just follow the equation:
  • [Matthew Mathis, 1997] TCP Throughput = (MSS / RTT) * sqrt(3 / (2p))
  • Where p is drop rate
• Measure drop percentage p and set rate accordingly
• Following the TCP equation ensures we’re TCP friendly
  • I.e., use no more than TCP does in similar setting
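To make the equation concrete, it can be evaluated directly. This sketch assumes MSS in bytes, RTT in seconds and p as a fraction between 0 and 1; the function name and the sample values (1460 B, 100 ms, 1% loss) are illustrative.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    """TCP Throughput = (MSS / RTT) * sqrt(3 / (2p)), returned in bits per second."""
    if not 0.0 < loss_rate <= 1.0:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_seconds <= 0.0:
        raise ValueError("rtt_seconds must be positive")
    mss_bits = mss_bytes * 8
    return (mss_bits / rtt_seconds) * math.sqrt(3.0 / (2.0 * loss_rate))


base = mathis_throughput_bps(1460, 0.100, 0.01)
longer_rtt = mathis_throughput_bps(1460, 0.200, 0.01)
more_loss = mathis_throughput_bps(1460, 0.100, 0.04)
print(base, longer_rtt, more_loss)
```

Doubling the RTT halves the rate, and quadrupling the drop rate also halves it, since throughput scales with 1/RTT and 1/sqrt(p).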
Any Questions?
(4) Bias Against Long RTTs
• Flows get throughput inversely proportional to RTT
  • TCP unfair in the face of heterogeneous RTTs!
  • [Matthew Mathis, 1997] TCP Throughput = (MSS / RTT) * sqrt(3 / (2p))
  • Where p is drop rate
• Flows with long RTT will achieve lower throughput
[Figure: flows A1-B1 and A2-B2 share a bottleneck link; RTTs of 100 ms and 200 ms]
Possible Solutions
• Make additive constant proportional to RTT
• But people don’t really care about this…
(5) How Do Short Flows Fare?
• Internet traffic:
  • Elephant and mice flows
  • Elephant flows carry most bytes (>95%), but are very few (<5%)
  • Mice flows carry very few bytes, but most flows are mice
    • 50% of flows have < 1500 B to send (1 MTU)
    • 80% of flows have < 100 KB to send
• Problem with TCP?
  • Mice flows do not have enough packets for duplicate ACKs!!
  • Drop ≈ Timeout (unnecessarily high latency)
  • These are precisely the flows for which latency matters!!!
• Another problem:
  • Starting with small window size leads to high latency
Possible Solutions?
• Larger initial window?
  • Google proposed moving from ~4 KB to ~15 KB
  • Covers ~90% of HTTP Web
  • Decreases delay by 5%
• Many recent research papers on the timeout problem
  • Require network support
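A rough way to see why the initial window matters for mice flows is to count round trips in pure slow start. This sketch makes several assumptions: a 1460-byte MSS, no loss, no receiver window limit, no handshake cost, and initial windows of 3 and 10 segments standing in for the ~4 KB and ~15 KB figures.

```python
import math


def rtts_to_deliver(flow_bytes, initial_window_segments, mss_bytes=1460):
    """Count RTTs needed to send a flow when CWND doubles every RTT."""
    segments_left = math.ceil(flow_bytes / mss_bytes)
    cwnd = initial_window_segments
    rtts = 0
    while segments_left > 0:
        segments_left -= cwnd  # send one full window this RTT
        cwnd *= 2              # slow start doubles the window
        rtts += 1
    return rtts


for size in (1500, 15_000, 100_000):
    small = rtts_to_deliver(size, initial_window_segments=3)
    large = rtts_to_deliver(size, initial_window_segments=10)
    print(size, small, large)
```

Under these assumptions a 100 KB flow needs 5 round trips with the smaller window and 3 with the larger one, while a flow that fits in a single window sees no difference.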
(6) Cheating
• TCP was designed assuming a cooperative world
• No attempt was made to prevent cheating
• Many ways to cheat, will present three
Cheating #1: ACK-splitting (receiver)
• TCP Rule: grow window by one MSS for each valid ACK received
• Send M (distinct) ACKs for one MSS
• Growth factor proportional to M
[Figure: timeline over one RTT. Sender: Data 1:1461. Receiver: ACK 486, ACK 973, ACK 1461. Sender: Data 1461:2921, Data 2921:4381, Data 4381:5841, Data 5841:7301]
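The effect of ACK-splitting on window growth can be sketched with a toy model. It assumes the sender is in slow start, adds one MSS per ACK without checking how many bytes the ACK covers, and sees no loss; the function name and parameters are illustrative.

```python
def slow_start_window_after(rtts, acks_per_segment=1, initial_cwnd=1):
    """Window (in MSS) after `rtts` round trips of loss-free slow start.

    Each segment sent in an RTT comes back as `acks_per_segment` ACKs,
    and the sender adds one MSS for every ACK it receives.
    """
    cwnd = initial_cwnd
    for _ in range(rtts):
        acks_received = cwnd * acks_per_segment
        cwnd += acks_received
    return cwnd


print(slow_start_window_after(4))                      # honest receiver
print(slow_start_window_after(4, acks_per_segment=3))  # receiver splits each ACK in three
```

An honest receiver doubles the window every RTT, while a receiver sending M ACKs per segment multiplies it by (1 + M), so after four round trips the windows are 16 and 256 segments for M = 3.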
Cheating #2: Increasing CWND Faster (source)
• TCP Rule: increase window by one MSS for each valid ACK received
• Increase window by M per ACK
• Growth factor proportional to M
Cheating #3: Open Many Connections (source/receiver)
• Assume
  • A starts 10 connections to B
  • D starts 1 connection to E
  • Each connection gets about the same throughput
• Then A gets 10 times more throughput than D
[Figure: connections A-B and D-E; labels x and y]
Cheating
• Either sender or receiver can independently cheat!
• Why hasn’t Internet suffered congestion collapse yet?
  • Individuals don’t hack TCP (not worth it)
  • Companies need to avoid TCP wars
• How can we prevent cheating
  • Verify TCP implementations
  • Controlling end points is hopeless
• Nobody cares, really
Any Questions?
How Do You Solve These Problems?
• Bias against long RTTs
• Slow to ramp up (for short-flows)
• Cheating
• Need for uniformity