Reliable Stream Transport Service (TCP)

Upload: abra-collier

Post on 02-Jan-2016

TRANSCRIPT

  • Reliable Stream Transport Service (TCP), Chapter 12

  • We've looked at unreliable connectionless packet delivery service and the IP protocol that defines it
      • Now we will examine reliable stream delivery and the Transmission Control Protocol that defines it
      • TCP is presented as a part of TCP/IP
          • Is an independent, general-purpose protocol
          • Can be adapted for use with other delivery systems

  • Need for Stream Delivery
      • At low levels, have unreliable packets
          • Lost, destroyed, discarded, duplicated, delayed
          • Size constraints affect efficient transfer
      • Applications need to send lots of data
          • Unreliability is tedious and annoying
          • Programmers must worry about errors
      • Goal of network protocol research
          • General-purpose reliable stream delivery method

  • Properties of the Service
      • Interface between applications and TCP/IP has five characteristic features:
      • Stream Orientation
          • Sender provides stream of bits divided into bytes
          • Receiver is passed the exact same sequence
      • Virtual Circuit Connection
          • Service provides illusion of a dedicated circuit
          • Call setup from one application to the other
          • Two OSs talk and settle details
          • Continue to communicate during transfer
          • If error, detect and report to applications

  • Buffered Transfer
      • Application sends stream in whatever size it wants
          • May be as small as a single octet
      • Protocol software wants efficient transfer
          • Small blocks of data: buffer until there is enough for a datagram
          • Large blocks of data: break into smaller pieces
      • Push mechanism
          • For when transfer needs to happen before buffer is full
          • Application invokes a push
          • Data generated until then is sent immediately
          • At receiving end, it is delivered without delay
      • Protocol software may divide stream in unexpected ways

  • Unstructured Stream
      • Applications cannot mark record boundaries
      • Must agree that stream service will be unstructured

  • Full Duplex Connection
      • Connections allow concurrent transfer both ways
      • Appears as two independent streams in opposite directions
      • Can terminate one direction without affecting the other
      • Control information can be piggybacked on data

  • Providing Reliability
      • Want reliable transfer out of an unreliable packet delivery system
      • Most reliable protocols use a single technique
          • Positive acknowledgement with retransmission
          • Recipient must send an ACK message as it gets data
          • Sender keeps a record of each packet sent
          • If the timer expires before the ACK arrives, sender retransmits the packet

  • Figure 12.1

  • Can also have duplicate packets
      • Network delays may cause premature retransmission
      • Both packets and ACKs can be duplicated
      • Usually solved by assigning sequence numbers
          • Receiver must remember which sequence numbers it has received
          • ACKs include the sequence numbers as well
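
Positive acknowledgement with retransmission, plus sequence numbers to discard duplicates, can be sketched as a small simulation. This is a minimal illustration, not TCP itself: the function name `stop_and_wait` and the `drop_data` / `drop_acks` parameters (which transmissions lose their packet or ACK) are hypothetical, and a timeout is modeled simply as "try again".

```python
def stop_and_wait(packets, drop_data=(), drop_acks=()):
    """Simulate positive acknowledgement with retransmission.

    drop_data / drop_acks hold 1-based transmission counts whose data
    packet or ACK is lost; either loss ends in a timeout and a resend.
    """
    delivered = []      # what the receiver hands to the application
    expected = 0        # next sequence number the receiver wants
    transmissions = 0
    for seq, payload in enumerate(packets):
        acked = False
        while not acked:
            transmissions += 1
            if transmissions in drop_data:
                continue                  # packet lost: timer expires, resend
            if seq == expected:           # new data: accept it
                delivered.append(payload)
                expected += 1
            # a duplicate is discarded here but is still acknowledged
            if transmissions in drop_acks:
                continue                  # ACK lost: timer expires, resend
            acked = True
    return delivered, transmissions
```

With `drop_data={2}` and `drop_acks={4}`, three packets need five transmissions, and the receiver still delivers each payload exactly once because the duplicate created by the lost ACK carries a sequence number it has already seen.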

  • Sliding Windows
      • Sending one packet and waiting for ACK wastes time
          • Full duplex circuit; have lots of idle time
      • Sliding window technique used
          • More complex form of positive acknowledgement and retransmission
          • Uses bandwidth more efficiently
          • Sender transmits multiple packets before waiting for an ACK

  • Number of unacknowledged packets limited by window size
      • Performance depends upon window size
          • Size of 1: same as simple positive acknowledgement protocol
          • Increase size with goal of sending packets as fast as the network can handle
      • Conceptually, separate timer for each packet
          • Only unacked packets are retransmitted
      • Receiver has a similar window

  • TCP
      • Is a communication protocol
          • NOT a piece of software
          • TCP is the standard
          • Various TCP software implements the standard
      • Standard includes:
          • Format of data and acknowledgements
          • Procedures for reliability
          • How to distinguish multiple destinations on a machine
          • Error recovery procedures
          • Initiating and closing a TCP stream transfer

  • Standard does not include:
      • Details of application/TCP interface
          • Does not discuss exact procedures to invoke for operations
          • Left unspecified for flexibility
          • TCP usually implemented in OS
          • Can use whatever interface the given OS provides
          • Single specification for a variety of machines
      • TCP assumes little about underlying system
          • Can be used with a variety of packet delivery systems (including IP)
          • Dialup lines; LAN; high-speed fiber; low-speed WAN

  • Ports, Connections, & Endpoints
      • TCP resides above IP in the layering scheme

    Application
    Reliable Stream (TCP) | User Datagram (UDP)
    Internet (IP)
    Network Interface

  • Multiple applications can communicate concurrently
      • Multiplexes and demultiplexes incoming messages
      • Uses port numbers (like UDP discussion)
      • TCP ports more complex
          • Uses the connection abstraction
          • Objects are virtual circuits, not ports
          • Connections identified by a pair of endpoints
      • Endpoint is a pair of integers: (host, port)
          • host is the IP address of a host
          • port is a TCP port on that host

  • Pair of endpoints defines connection
      • (128.9.0.32, 1184) and (128.10.2.3, 53)
      • A single TCP port can be shared by multiple connections on the same machine
      • (128.2.254.139, 1012) and (128.10.2.3, 53)
      • No ambiguity
          • Incoming messages associated with connection, not port
          • Both endpoints used to identify appropriate connection
      • Makes things easier for programmers
          • Can provide concurrent service without unique ports
      • Example: Email
          • Multiple computers can send mail concurrently
          • Accepting program needs only one TCP port
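
A minimal sketch of demultiplexing by endpoint pair, using the addresses from the slide. The table `connections` and the function `demultiplex` are illustrative names only; a real TCP implementation keeps this state inside the operating system.

```python
# Each connection is keyed by (local endpoint, remote endpoint),
# so two connections can share the same local port 53.
SERVER = ("128.10.2.3", 53)

connections = {
    (SERVER, ("128.9.0.32", 1184)): "connection A",
    (SERVER, ("128.2.254.139", 1012)): "connection B",
}

def demultiplex(dst, src):
    """Return the connection for an arriving segment, or None."""
    return connections.get((dst, src))
```

Both entries use port 53 on 128.10.2.3, yet a lookup is unambiguous because the remote endpoint is part of the key.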

  • Passive & Active Opens
      • TCP is connection-oriented
          • Both endpoints must agree to participate
      • Passive open
          • Application at one end tells OS it will accept connection
          • OS assigns a TCP port number for its end
      • Active open
          • Done by application wishing to connect
          • Tells OS to establish a connection
      • Two TCP modules communicate
          • Establish and verify the connection; then pass data

  • Segments, Streams, & Sequence Numbers
      • TCP views the data stream in segments
          • Segment contains sequence of octets
          • Usually each segment travels in one IP datagram
      • Two important problems:
          • Efficient transmission
              • Good use of available network
          • Flow control
              • End-to-end problem
              • Cannot overflow the receiver's buffer

  • Special sliding window protocol used
      • Solves both problems
      • Octets of the data stream are numbered sequentially
          • 1st pointer: boundary between sent and ACKed vs sent and not ACKed
          • 2nd pointer: end of window
          • 3rd pointer: boundary between sent and unsent

    [Figure: octets 1 through 11 of the stream with the current window marked; the pointer labels appear left to right in the order 1, 3, 2]

  • Receiver maintains a similar window
      • Full duplex: software at each end maintains 2 windows
      • Also allows window size to vary over time
          • Each ACK has a window advertisement
          • Tells how many more octets the receiver is willing to accept
      • Increased advertisement:
          • Sender can increase size of sliding window, send more
      • Decreased advertisement:
          • Sender decreases size of sliding window, stops at boundary
          • Extreme case: receiver sends advertisement of zero, which stops all transmission
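
The three pointers and the effect of a window advertisement can be sketched as a small class. The names `SendWindow`, `acked`, `next_to_send`, and `usable` are illustrative and do not come from the TCP standard; positions are octet numbers in the stream.

```python
from dataclasses import dataclass

@dataclass
class SendWindow:
    acked: int         # pointer 1: octets below this are sent and ACKed
    next_to_send: int  # pointer 3: boundary between sent and unsent
    window: int        # current window size in octets

    @property
    def end(self):
        """Pointer 2: first octet beyond the window."""
        return self.acked + self.window

    def usable(self):
        """Octets that may still be sent without waiting for an ACK."""
        return max(0, self.end - self.next_to_send)

    def on_ack(self, ack_no, advertisement):
        """Slide the window and adopt the receiver's advertisement."""
        self.acked = max(self.acked, ack_no)
        self.window = advertisement
```

An advertisement of zero makes `usable()` return 0, which models the extreme case where the receiver stops all transmission until it advertises space again.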

  • This provides flow control
      • Essential in internet environment
      • Two independent flow problems:
          • End-to-end
              • Minicomputer communicating with mainframe
          • Intermediate systems
              • Routers need to control flow, too
              • Overloaded router condition is congestion
      • No explicit congestion control mechanism; uses sliding window
          • Good TCP implementation can detect & recover
          • Poor implementation can make it worse

  • TCP Segment Format
      • Unit of transfer between TCP software is the segment
          • Establish connections
          • Transfer data
          • Send ACKs
              • May piggyback on a segment carrying data
          • Advertise window size
          • Close connections

  • Figure 12.7

  • Code Bits field reveals type of segment

    Bit (left to right)    Meaning if bit set to 1
    URG                    Urgent pointer field is valid
    ACK                    Acknowledgement field is valid
    PSH                    Segment requests a push
    RST                    Reset the connection
    SYN                    Synchronize sequence numbers
    FIN                    Sender has reached end of its byte stream
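
A short sketch of decoding the six code bits, assuming they are held in the low six bits of an integer with URG as the leftmost (0x20) and FIN as the rightmost (0x01). The helper name `decode_code_bits` is hypothetical.

```python
CODE_BITS = ["URG", "ACK", "PSH", "RST", "SYN", "FIN"]  # left to right

def decode_code_bits(value):
    """Return the names of the code bits set in a 6-bit field."""
    return [name for i, name in enumerate(CODE_BITS) if value & (0x20 >> i)]
```

For example, the value 0x12 has the ACK and SYN bits set, which is the combination carried by the second segment of a connection handshake.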

  • Out of Band Data
      • Out of band: data sent without waiting for octets already in the stream to be consumed by the receiver
          • Ex: to interrupt or abort a program
      • Uses URG bit and URGENT POINTER field
      • This data is consumed first, regardless of stream position

  • Maximum Segment Size Option
      • Not all segments will be of same size
          • But, must agree on a maximum size
      • Uses OPTIONS field
          • Can specify MSS (maximum segment size)
          • If on same network, may use size such that resulting datagrams match network MTU
          • If not, will attempt to discover the minimum MTU along the path
          • Or use 536 (default datagram size of 576, minus IP & TCP headers)

  • Choosing good MSS is difficult
      • Too large or too small are both bad
      • Too small: network utilization is low
          • Segments in datagram; datagram in frame
          • At least 40 octets of headers
          • Small amount of data gives poor utilization
      • Too large: large IP datagrams
          • Probably get fragmented somewhere
          • Cannot ACK partial segment
          • Must receive all fragments
          • More fragments increases probability of losing one
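
The arithmetic behind the default MSS and the cost of small segments can be checked in a few lines. This sketch assumes 20-octet IP and TCP headers with no options and ignores frame headers; the function names are illustrative.

```python
IP_HEADER = 20   # octets, no IP options
TCP_HEADER = 20  # octets, no TCP options

def default_mss(datagram_size=576):
    """Default datagram size minus IP and TCP headers."""
    return datagram_size - IP_HEADER - TCP_HEADER

def utilization(data_octets):
    """Fraction of each datagram that is application data."""
    return data_octets / (data_octets + IP_HEADER + TCP_HEADER)
```

A segment carrying one octet of data uses 1/41 of the datagram for data (about 2.4 percent), while a 536-octet segment uses 536/576 (about 93 percent).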

  • In theory, best MSS is when IP datagrams are as large as possible without being fragmented
      • Difficult to figure out:
          • Most implementations do not have a mechanism for doing so
          • Routes can change dynamically
              • This may change the MTU of the path
          • Optimum size depends on lower-level headers
              • Segment size must be reduced to account for IP options

  • Window Scaling Option
      • WINDOW field is 16 bits
          • Limits max window size to 64 Kbytes
          • OK in early networks
          • Need more for networks with large delay
      • Option allows a larger size
          • Do not need to know details

  • Timestamp Option
      • Used to:
          • Help compute delay on underlying network
          • Handle wraparound of sequence numbers
      • Process:
          • Sender: places timestamp from its clock in message
          • Receiver: copies timestamp field into ACK
          • Allows sender to compute elapsed time

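  • TCP Checksum
      • CHECKSUM contains 16-bit integer
      • Uses a pseudo header like UDP
      • Purpose is just the same
          • Verify segment has reached correct destination

    Pseudo header layout (bit positions 0, 8, 16, 31):
    Source IP Address (bits 0-31)
    Destination IP Address (bits 0-31)
    Zero (bits 0-7) | Protocol (bits 8-15) | TCP Length (bits 16-31)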
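
A minimal sketch of computing the checksum over the pseudo header plus the segment. The slides do not spell out the arithmetic; this uses the standard 16-bit one's complement Internet checksum and protocol number 6 for TCP. The function names are illustrative, and the segment is assumed to have its CHECKSUM field set to zero while the value is being computed.

```python
import socket
import struct

def ones_complement_sum(data):
    """16-bit one's complement sum of data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def tcp_checksum(src_ip, dst_ip, segment):
    """Checksum over pseudo header (src, dst, zero, protocol, length) + segment."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF
```

A receiver can run the same function on the segment as received, with the checksum field filled in; a result of 0 means the check passed. Because the IP addresses are part of the sum, a segment delivered to the wrong destination fails the check.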

  • ACKs & Retransmission
      • Hard to refer to datagrams or segments
          • Variable-length segments
          • Retransmitted segments may have more data than original
      • Instead, use position in stream
          • Based on sequence numbers

  • Cumulative acknowledgement scheme
      • Receiver collects arriving data octets
          • Reconstructs stream of sender
          • May have to reorder segments due to out-of-order delivery
      • Will have reconstructed zero or more octets
          • May have other stream pieces present but out of order
      • Receiver ACKs longest contiguous prefix
          • ACK specifies the next octet expected to be received
      • Advantages:
          • ACKs easy to generate and unambiguous
          • Lost ACKs may not force retransmission
      • Disadvantage:
          • Only sends info about a single position in the stream

  • Lack of information is inefficient
      • Imagine window that spans 5000 octets
          • Starts with position 101 in the stream
          • Sender has sent all data in five segments
      • Suppose first segment got lost
          • Receiver sends ACK as each segment arrives
          • All ACKs specify octet 101 as next expected
          • No way to tell sender that all the other data is there
      • Sender has two choices upon timeout:
          • Send all five segments over
          • Send only first segment, then wait for ACK to do anything else
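
The 5000-octet example can be reproduced with a small function that computes the cumulative ACK from the segments a receiver holds. This is a sketch: the name `cumulative_ack` and the representation of a segment as a `(start, length)` pair are assumptions made for illustration, with five equal segments of 1000 octets.

```python
def cumulative_ack(next_expected, segments):
    """Return the next octet expected, given received (start, length) segments."""
    for start, length in sorted(segments):
        if start > next_expected:
            break  # gap: later data is held but cannot be acknowledged
        next_expected = max(next_expected, start + length)
    return next_expected

# Segments 2 through 5 arrived; the first (octets 101-1100) was lost.
held = [(1101, 1000), (2101, 1000), (3101, 1000), (4101, 1000)]
```

`cumulative_ack(101, held)` returns 101 no matter how many later segments arrive, and once the first segment is retransmitted the ACK jumps straight to 5101.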

  • Timeout and Retransmission
      • TCP has a timer for each segment
          • If timer goes off before ACK is received, retransmit
      • Different algorithm than other protocols
          • Due to internet environment
          • Cannot know how quickly ACKs should come
          • May span one or many networks
          • May encounter router delays
          • Must accommodate vast time differences

  • Figure 12.10

  • Adaptive Retransmission Algorithm
      • Used to accommodate varying delays
          • Monitors performance of each connection
          • Deduces reasonable values for timeouts
          • As performance changes, timeout value revised
      • Must collect data for the algorithm
          • Records time each segment sent & when ACK arrives
          • Computes elapsed time (sample round trip time)
          • Get new sample; adjust average round trip time for the connection
      • RTT stored as weighted average (usually)
          • New round trip samples change the average slowly

  • Example:

    RTT = (a * Old_RTT) + ((1 - a) * New_Round_Trip_Sample)
    where a is the constant weighting factor; 0 < a < 1

    Choosing a value close to 1:
      • Weighted average only changes by a small amount
      • Immune to changes that last a short time
    Choosing a value close to 0:
      • Weighted average responds quickly to changes in delay
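
The effect of the weighting factor is easy to see numerically. A minimal sketch, with the function name `update_rtt` chosen for illustration and times in milliseconds:

```python
def update_rtt(old_rtt, sample, a):
    """Weighted average of the old estimate and a new round trip sample."""
    return a * old_rtt + (1 - a) * sample

slow = update_rtt(100, 200, a=0.9)  # close to 1: estimate barely moves
fast = update_rtt(100, 200, a=0.1)  # close to 0: estimate follows the sample
```

Starting from an estimate of 100 ms and a single 200 ms sample, `a = 0.9` yields about 110 ms while `a = 0.1` yields about 190 ms.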

  • Timeout value is a function of the current RTT
      • Early implementations used constant weighting factor, B (B > 1)
          • Timeout = B * RTT
      • Choosing a value for B is hard
          • Close to 1:
              • Timeout close to current RTT
              • Detects packet loss quickly
              • Any small delay may cause unnecessary retransmissions
          • Original specification recommended B = 2
      • Will look at better techniques for timeout

  • Measuring Round Trip Samples
      • Measuring round trip sample seems trivial
          • But, TCP uses cumulative acknowledgement
          • ACK refers to data received, not datagram that carried it
      • Consider a retransmission:
          • Form segment; put in datagram; send; timer expires
          • Send again in second datagram
          • Get ACK: for which datagram?
      • Called acknowledgement ambiguity

  • Assume ACK belongs to earliest datagram
      • Makes estimated round trip time grow
      • Incorrect if the original datagram was really lost
      • If many lost, estimate grows arbitrarily large

  • Assume ACK belongs to latest datagram
      • Suppose retransmission is sent just before the ACK for the original arrives
      • Decreases the timeout time
      • Makes things worse; more retransmissions
      • Estimate will eventually stabilize
          • RTT will be slightly less than 1/2 of the correct value
          • Every segment sent twice even though no loss occurs

  • Karn's Algorithm
      • If associating the ACK with the earliest or the most recent transmission are both wrong, what to do?
          • Do not update the RTT estimate on retransmitted segments
          • Idea known as Karn's Algorithm
          • Avoids ambiguous acknowledgement problem
      • Simplistic implementation can be a problem
          • Get sharp increase in delay; do some retransmissions
          • Ignore ACKs for retransmissions; no new estimate

  • Must also use a timer backoff strategy
      • Compute initial timeout with round trip estimate
      • If timer expires and causes retransmission, increase the timeout (within a bound)
          • Most implementations multiply timeout by 2
      • Next segment timed with new timeout
      • Continues backoff until a segment is sent without retransmission
          • Computes new round trip estimate
          • Resets timeout accordingly
      • Shown to work well even with high packet loss
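
Karn's algorithm combined with timer backoff can be sketched as follows. The class name `KarnTimer`, the weighting factor `a = 0.875`, the factor `b = 2`, and the 64-second bound are assumptions chosen for illustration; the slides only state that the timeout is doubled within a bound.

```python
class KarnTimer:
    def __init__(self, rtt, a=0.875, b=2.0, max_timeout=64.0):
        self.rtt = rtt
        self.a, self.b, self.max_timeout = a, b, max_timeout
        self.timeout = b * rtt  # initial timeout from the round trip estimate

    def on_timeout(self):
        """Timer expired: back off the timeout, leave the estimate alone."""
        self.timeout = min(self.timeout * 2, self.max_timeout)

    def on_ack(self, sample, retransmitted):
        """Update the estimate only for segments that were sent once."""
        if retransmitted:
            return  # ambiguous ACK: ignore the sample, keep backed-off timeout
        self.rtt = self.a * self.rtt + (1 - self.a) * sample
        self.timeout = self.b * self.rtt
```

The backed-off timeout stays in force until an ACK arrives for a segment that was not retransmitted; only then is a new estimate computed and the timeout reset from it.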

  • High Variance in Delay
      • Computations do not respond well to wide range of variation in delay
          • Variation in RTT proportional to 1/(1 - network load)
      • Original TCP standard estimated RTT as shown earlier
          • Limiting B to 2 can adapt to loads of at most 30%
      • 1989 spec requires estimates of both average RTT and variance
          • Must use variance in place of constant B

  • Approximations are computationally easy

    DIFF = SAMPLE - Old_RTT
    Smoothed_RTT = Old_RTT + d * DIFF
    DEV = Old_DEV + p * (|DIFF| - Old_DEV)
    Timeout = Smoothed_RTT + e * DEV

    Where:
      • DEV is the estimated mean deviation
      • d is a fraction between 0 and 1; controls effect on weighted average
      • p is a fraction between 0 and 1; controls effect on mean deviation
      • e is a factor controlling how much deviation affects the round trip timeout

    (Research suggests d and p be inverse powers of 2; scale by 2^n, use integer arithmetic, and: d = 1/(2^3), p = 1/(2^2), n = 3, and e = 4)
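
The four update equations translate directly into code. This sketch uses floating point for readability rather than the scaled integer arithmetic mentioned above; the function name `update_timeout` is illustrative, and the defaults follow the suggested values d = 1/8, p = 1/4, e = 4.

```python
def update_timeout(old_rtt, old_dev, sample, d=1/8, p=1/4, e=4):
    """Return (smoothed RTT, mean deviation, timeout) after one sample."""
    diff = sample - old_rtt
    smoothed_rtt = old_rtt + d * diff
    dev = old_dev + p * (abs(diff) - old_dev)
    timeout = smoothed_rtt + e * dev
    return smoothed_rtt, dev, timeout
```

With an estimate of 100 ms, a deviation of 10 ms, and a new sample of 140 ms, the result is a smoothed RTT of 105 ms, a deviation of 17.5 ms, and a timeout of 175 ms. The timeout grows mainly because the deviation grew, which is the point of replacing the constant B.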

  • Figure 12.11

  • Figure 12.12

  • Response to Congestion
      • TCP software must deal with congestion
          • Severe delay caused by an overload of datagrams
      • Congestion occurs at routers
          • Routers have finite storage
          • When they run out of storage, they start dropping datagrams
      • Endpoints do not know where congestion is
          • Just see increased delay
          • Get timeouts; send more datagrams (retransmissions)
          • May cause congestion collapse

  • TCP must reduce transmission rate
      • ICMP source quench messages inform hosts of congestion
      • TCP needs to help
          • Want to automatically reduce transmission rates when congestion occurs
      • TCP standard recommends two techniques
          • Slow-start
          • Multiplicative Decrease

  • Multiplicative Decrease
      • TCP must already use receiver's window size
      • Keep another window size to use during congestion
          • Called congestion window
      • At any time, the allowed window is:
          • min(receiver_advertisement, congestion_window)
          • During non-congestion, both are the same
      • To estimate congestion window size, TCP assumes most datagram loss comes from congestion
      • Upon segment loss:
          • Reduce congestion window by half (to a minimum of one segment)
          • For segments still in window, back off timer exponentially
          • Done for every loss; quickly clears router traffic

  • Slow-start
      • How to recover when congestion ends?
          • If do the reverse (double the congestion window): unstable
      • Use slow-start recovery
          • When starting traffic on a connection or after congestion
          • Start window at size of a single segment
          • Increase by one segment every time an ACK arrives
          • Avoids swamping
          • Not so slow actually: log2(N) round trips until can send N segments
      • One other restriction: congestion avoidance phase
          • When congestion window reaches one half of its original size, increase by 1 segment only if all segments in the window have been ACKed
      • Overall, known as Additive Increase Multiplicative Decrease (AIMD)
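
How the congestion window evolves can be sketched with a simplified model measured in whole segments. The class below is an illustration under stated assumptions, not the behavior of any particular TCP implementation: after a loss it remembers half the window in use as a threshold, restarts at one segment, grows by one segment per ACK up to the threshold, and beyond it grows by one segment only after a full window of ACKs. The names `CongestionControl`, `threshold`, and `acked_in_window` are hypothetical.

```python
class CongestionControl:
    def __init__(self):
        self.cwnd = 1              # congestion window, in segments
        self.threshold = None      # half the window in use when loss occurred
        self.acked_in_window = 0

    def allowed_window(self, receiver_advertisement):
        return min(receiver_advertisement, self.cwnd)

    def on_loss(self):
        self.threshold = max(self.cwnd // 2, 1)  # multiplicative decrease
        self.cwnd = 1                            # restart with slow-start
        self.acked_in_window = 0

    def on_ack(self):
        if self.threshold is None or self.cwnd < self.threshold:
            self.cwnd += 1                       # slow-start: +1 per ACK
        else:
            self.acked_in_window += 1            # congestion avoidance
            if self.acked_in_window >= self.cwnd:
                self.cwnd += 1                   # +1 per fully ACKed window
                self.acked_in_window = 0
```

Because every ACK adds a segment during slow-start, the window doubles each round trip, which is why reaching N segments takes about log2(N) round trips.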

  • Techniques powerful when combined
      • Slow-start increase
      • Multiplicative decrease
      • Additive increase
      • Measurement of variation
      • Exponential timer backoff
      • Together they improve TCP performance dramatically
          • Add very little computational overhead
          • Performance improves by factors of 2 to 10

  • Fast Recovery & Other Modifications
      • Heuristic used where loss is infrequent
      • Uses info from cumulative ACK scheme
      • Can resend data before timer expires
      • Do not need to know details

  • Explicit Feedback Mechanisms
      • Most TCP versions use implicit techniques:
          • Timeout and duplicate ACKs to detect loss
          • Changes in RTT to detect congestion
      • Two explicit techniques have been proposed
          • Selective Acknowledgement (SACK)
          • Explicit Congestion Notification (ECN)

  • SACK
      • Can specify exactly which data has been received and which is missing
          • Sender knows which segment(s) to retransmit
      • TCP provides two options for SACK
          • Do not need to know details
      • Does not replace cumulative ACK mechanism
          • Nor is it mandatory

  • ECN
      • Used to notify TCP about congestion
      • As a TCP segment goes through routers:
          • Two bits in IP header used to record congestion
          • When segment arrives, receiver knows
          • Sender needs to know; receiver uses ACK to tell it
      • IP header bits:
          • Taken from TOS field
      • TCP header bits:
          • Taken from reserved area

  • Congestion, Tail Drop, and TCP
      • Protocols are layered
          • Layers operate in isolation
      • TCP at source/destination cannot interact with lower-layer elements along the path
          • TCP does not know condition of network
          • TCP does not notify lower layers before transferring data
      • Policies used by routers can affect TCP
          • Both a single connection and the aggregate of all connections

  • Example:
      • Router delays some datagrams more than others
      • TCP backs off retransmission timer
      • If delay exceeds timer, TCP assumes congestion
      • Layers are defined independently, but they interact
          • Thus, try to define mechanisms in one layer to work well with protocols in others

  • Important interaction between TCP and IP
      • Router is overrun and begins to drop datagrams
      • Early router software used tail-drop policy
          • If input queue is full when datagram arrives, drop it
      • Interesting effect on TCP
          • If segments are from a single TCP connection:
              • TCP enters slow-start until it begins receiving ACKs
          • If segments are from multiple TCP connections:
              • All N instances of TCP enter slow-start at same time
              • Causes global synchronization

  • Random Early Detection
      • Routers need to avoid global synchronization
          • Use scheme to avoid tail-drop when possible
          • Called Random Early Detection (RED) (or Random Early Discard or Random Early Drop)
      • Uses two markers in queue: Tmin and Tmax
      • Three rules:
          • If queue contains fewer than Tmin datagrams, add new one
          • If queue contains more than Tmax datagrams, discard new one
          • If queue contains between Tmin and Tmax datagrams, randomly discard the datagram with probability p

  • Randomness keeps from waiting for overflow
      • Router slowly and randomly drops datagrams as congestion increases
      • Keeps from putting all TCP connections in slow-start
      • Key is in choice of the thresholds and p
          • Tmin must be large enough to utilize output link
          • Tmax must be larger than typical increase in queue size during round trip time
          • Discard probability is most complex choice
              • Do not use a constant; compute for each datagram
              • Can vary probability linearly from 0 (queue size at Tmin) to 1 (queue size at Tmax)

  • Linear scheme forms the basis of probability p
      • Must avoid overreacting to bursty traffic
          • If short burst
              • Do not drop datagrams because queue will not overflow
              • But, cannot postpone discard indefinitely
          • Long burst
              • Will overflow queue and start tail-drop
      • Use weighted average technique
          • Do not use actual queue size at any instant
          • Compute weighted average queue size
          • Update each time a datagram arrives
          • Avg = (1 - g) * Old_avg + g * Current_queue_size
          • where g is a value between 0 and 1
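
The three RED rules, the linear discard probability, and the weighted average queue size can be sketched together. The function names, the default `g = 0.25`, and the injectable `rng` argument (used to make the random choice testable) are assumptions for illustration; queue sizes here are counted in datagrams.

```python
import random

def update_average(old_avg, queue_size, g=0.25):
    """Weighted average queue size, updated on each datagram arrival."""
    return (1 - g) * old_avg + g * queue_size

def drop_probability(avg, t_min, t_max):
    """0 at or below Tmin, 1 at or above Tmax, linear in between."""
    if avg <= t_min:
        return 0.0
    if avg >= t_max:
        return 1.0
    return (avg - t_min) / (t_max - t_min)

def should_drop(avg, t_min, t_max, rng=random.random):
    """Decide whether to discard the arriving datagram."""
    return rng() < drop_probability(avg, t_min, t_max)
```

Using the average instead of the instantaneous queue size means a short burst raises the drop probability only slightly, while a sustained burst pushes the average toward Tmax and the probability toward 1.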

  • Some details glossed over
      • Computations very efficient if:
          • Choose constants as powers of 2
          • Use integer arithmetic
      • Measurement of queue size
          • Time required to forward datagram proportional to size
          • Measure queue size in octets versus datagrams
          • Affects type of traffic dropped
      • Discard probability proportional to amount of data
          • Not based on number of segments
          • Smaller datagrams: less probability of being dropped
          • Good for ACKs, remote login traffic, etc.
      • Analysis and simulation show RED works

  • Establishing a TCP Connection
      • Use a 3-way handshake
          • Is both necessary and sufficient for correct synchronization
          • Also uses rule that additional requests for connection are ignored if connection already established
          • Can initiate connection from both ends simultaneously

  • Figure 12.13 The sequence of messages in a three-way handshake. Time proceeds down the page; diagonal lines represent segments sent between sites. SYN segments carry initial sequence number information.

  • Initial Sequence Numbers
      • 3-way handshake accomplishes 2 functions
          • Guarantees both sides ready to transfer data
          • Sets up agreement on initial sequence numbers
      • Each machine can choose initial number at random
          • Cannot start at 1 each time
      • Numbers set in three messages
          • First machine: sends x
          • Second machine: records x, sends y and ACKs x
          • First machine: ACKs y
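
The exchange of initial sequence numbers can be sketched as three messages. In TCP a SYN occupies one position in the sequence space, so x is acknowledged with the value x + 1 (the next octet expected) and y with y + 1. The dictionary layout and the function name `three_way_handshake` are illustrative, not a wire format.

```python
import random

def three_way_handshake(rng=random):
    """Return the three handshake messages for random initial numbers x, y."""
    x = rng.randrange(2**32)  # first machine's initial sequence number
    y = rng.randrange(2**32)  # second machine's initial sequence number
    syn = {"flags": {"SYN"}, "seq": x}
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": y, "ack": (x + 1) % 2**32}
    ack = {"flags": {"ACK"}, "seq": (x + 1) % 2**32, "ack": (y + 1) % 2**32}
    return [syn, syn_ack, ack]
```

After the third message each side has recorded the other's initial number and confirmed that its own was received, which is the agreement the handshake is meant to establish.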

  • Possible to send data with handshake segments
      • Included with the initial sequence numbers
      • TCP software must buffer until handshake done
      • Once connection established, can release the data to the application program quickly

  • Closing a TCP Connection
      • Close operation used to terminate gracefully
      • Connections are full duplex
          • When application tells TCP it is done, TCP closes the connection in one direction
      • Sending TCP sends remaining data
          • Waits for receiver ACK
          • Sends segment with FIN bit set
      • Receiver ACKs the FIN segment and informs its application that data is done

  • Can still send data in opposite direction
      • When both directions closed, TCP deletes its record of the connection
      • Modified 3-way handshake is used to close

  • Figure 12.14 The modified three-way handshake used to close connections. The site that receives the first FIN segment acknowledges it immediately, and then delays before sending the second FIN segment.

  • TCP Connection Reset
      • Close operation used for normal shutdown
      • Sometimes abnormal conditions arise
          • Force the connection to be broken
      • TCP has a reset for such conditions
          • One side sends segment with RST bit set
          • Other side responds immediately by aborting connection
          • TCP informs application that connection was reset
          • Transfer in both directions ceases immediately

  • TCP State Machine
      • Operation of TCP can be explained with a theoretical model called a finite state machine
          • Circles represent states
          • Arrows represent transitions between them

  • Figure 12.15

  • AB

  • Forcing Data Delivery
      • Data stream usually buffered
          • Accumulate enough octets for efficient transfer
      • May need to send data before a lot has accumulated
          • Example: interactive terminal keystrokes
      • Push operation forces delivery of octets
          • Also sets PSH bit in segment code field
          • Causes delivery of data to destination application

  • Reserved TCP Port Numbers
      • Combines static and dynamic port binding
          • Like UDP
      • Many of the port numbers are the same for services accessible by both TCP and UDP
          • See Figure 12.16

  • Figure 12.16

  • TCP Performance
      • TCP is a complex protocol
          • Handles wide variety of underlying technologies
      • Generality does not hinder TCP performance
      • Research done at Berkeley
          • Shows that same TCP that gives efficient internet operation can sustain 8 Mbps throughput between two stations on 10 Mbps Ethernet
      • Cray Research: TCP throughput approaching a gigabit per second

  • Silly Window Syndrome
      • TCP can have a serious performance problem
          • Caused when sender & receiver operate at different speeds
      • If receiver reads data one octet at a time
          • Sender quickly fills buffer
          • Must wait for window advertisement
          • Gets advertisement for one octet
          • Results in many small segments
          • Inefficient use of bandwidth and lots of overhead

  • If sender sends data one octet at a time
      • Ends up with same problem
      • Known as silly window syndrome
      • Early TCP implementations exhibited the problem
          • Each ACK advertises small amount of space
          • Causes each segment to carry a small amount of data

  • Avoiding Silly Window Syndrome
      • TCP specs include heuristics to avoid SWS
          • On sender, avoids sending small data amounts
          • On receiver, avoids sending small advertisements
      • TCP software should contain both

  • Receive-side silly window avoidance
      • Receiver maintains currently available window
      • Delays advertising until it can advance window a significant amount
          • Minimum of 1/2 of the receiver's buffer, or
          • Number of octets in a maximum-sized segment
      • Summary of technique:
          • Before sending an updated advertisement after advertising a zero window
          • Wait for space: 50% of total buffer or a maximum-sized segment
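
The receive-side rule reduces to a single comparison. A minimal sketch, assuming the receiver has already advertised a zero window and is deciding what to advertise next; the function name `advertise` and its parameters are illustrative.

```python
def advertise(free_space, buffer_size, mss):
    """Window to advertise after a zero window: 0 until enough space opens."""
    threshold = min(buffer_size // 2, mss)  # half the buffer or one full segment
    return free_space if free_space >= threshold else 0
```

With an 8192-octet buffer and a 1460-octet MSS, freeing 100 octets still advertises zero, so the sender is never invited to transmit a tiny segment.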

  • Two approaches for implementation
      • ACK each arriving segment, but do not advertise until allowed
      • Delay sending ACK if window too small to advertise
      • Standard recommends using delayed ACKs
      • Advantage: delayed ACKs decrease traffic, increase throughput
          • One ACK for all data received during delay
          • May get outgoing data segment to piggyback on
          • If data read quickly, ACK and advertisement can go in one segment
      • Disadvantage: may get retransmissions if delay too long
          • Bad round trip time estimates
          • Cannot delay more than 500 ms
          • Recommend receiver ACK every other data segment

  • Send-side silly window avoidance
      • Goal is to avoid sending small segments
      • Use clumping
          • Delay sending until there is a reasonable amount of data
      • How long should TCP wait?
          • Too long: application has large delays
          • Cannot know when application will send more data
          • Not long enough: get small segments
          • Fixed delay not optimal for all applications
      • Uses an adaptive algorithm
          • Delay depends on current internet performance

  • Does not compute delays
      • Uses arrival of ACK to trigger transmission of additional packets
      • Heuristic:
          • Application generates more data to send
          • Buffer if previous data sent but not ACKed
          • Wait until there is enough for a maximum-sized segment
          • If still waiting when ACK arrives, send all data in buffer
          • Apply rule even when push operation requested
      • If application fast compared to network
          • Successive segments have many octets
      • If application slow compared to network
          • Small segments get sent without long delay

  • Known as the Nagle algorithm
      • Elegant due to little computational overhead
      • Adapts to arbitrary combinations of:
          • network delay
          • maximum segment size
          • application speed
      • But does not lower throughput in normal cases
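
The send-side heuristic can be sketched as a small class. This is a simplified model: a single ACK is assumed to acknowledge everything outstanding, the receiver's window is ignored, and the names `NagleSender`, `write`, `on_ack`, and `sent` are hypothetical.

```python
class NagleSender:
    def __init__(self, mss):
        self.mss = mss
        self.buffer = b""
        self.unacked = False  # True while sent data awaits an ACK
        self.sent = []        # segments handed to the network, in order

    def _transmit(self, data):
        self.sent.append(data)
        self.unacked = True

    def write(self, data):
        """Application hands over more data to send."""
        self.buffer += data
        while len(self.buffer) >= self.mss:      # full segments go out at once
            self._transmit(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
        if self.buffer and not self.unacked:     # nothing outstanding: send now
            self._transmit(self.buffer)
            self.buffer = b""

    def on_ack(self):
        """ACK arrived: send whatever accumulated while waiting."""
        self.unacked = False
        if self.buffer:
            self._transmit(self.buffer)
            self.buffer = b""
```

No timer appears anywhere in the class: a slow network delays ACKs and so lets more data clump together, while a fast network returns ACKs quickly and small segments leave without a long wait.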

  • Summary
      • TCP defines reliable stream delivery service
          • Full duplex connection
          • Exchange large volumes of data efficiently
          • Sliding window gives efficient network use
      • Few assumptions about underlying network
          • Flexible for wide variety of delivery systems
      • Has flow control
          • Flexible for systems with differing speeds

  • Basic unit of transfer is a segment
      • Pass data or control information
      • Permits piggyback of ACKs
      • Flow control
          • Implemented by receiver advertisements
      • Urgent facility supports out-of-band messages
      • Push mechanism forces delivery

  • TCP standard specifies
      • Exponential backoff for retransmission timers
      • Congestion avoidance algorithms
          • Slow-start
          • Multiplicative decrease
          • Additive increase
      • Uses heuristics to avoid small packets
      • Recommends using RED versus tail-drop
          • Avoids TCP synchronization
          • Improves throughput