transport layer - universitetet i oslo · 3 inf3190 - data communication sliding window / static...
Transport layer
TCP Congestion Control
INF3190 - Data Communication
TCP Congestion Control
Slow Start
• Until the Congestion Window size’s threshold is reached, send two new segments for each one that is ACK’d
Additive Increase (AI)
• Allow one more segment per RTT when all were ACK’d in the previous RTT
Multiplicative Decrease (MD)
• Halve the Congestion Window size’s threshold when at least one segment was lost in the previous RTT
Retransmission Timeout
• A segment hasn’t been ACK’d within the RTT and a bit?
• MD and Slow Start
Fast Retransmit
• Received 3 ACKs for the same segment?
• MD but no Slow Start - start at the new Congestion Window size
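The rules above can be sketched as a per-RTT update of the congestion window. This is a minimal illustration, not a TCP implementation: the function name next_cwnd, the event names, and the units (whole segments) are assumptions made for the example. It also assumes that the new threshold is half of the current window with a floor of 2 segments, and that Slow Start is capped at the threshold.

```python
def next_cwnd(cwnd, ssthresh, event):
    """Return (cwnd, ssthresh) after one RTT, counted in whole segments.

    event: "ack" (everything ACK'd), "timeout" or "fast_retransmit".
    """
    if event == "ack":
        if cwnd < ssthresh:
            # Slow Start: two new segments per ACK doubles the window per RTT.
            # Assumption: growth is capped at the threshold.
            return min(2 * cwnd, ssthresh), ssthresh
        # Additive Increase: one more segment per RTT.
        return cwnd + 1, ssthresh
    # Multiplicative Decrease. Assumption: threshold = half the window, min 2.
    new_thresh = max(cwnd // 2, 2)
    if event == "timeout":
        return 1, new_thresh           # MD and Slow Start
    if event == "fast_retransmit":
        return new_thresh, new_thresh  # MD but no Slow Start
    raise ValueError(f"unknown event: {event}")


def trace(events, cwnd=1, ssthresh=8):
    """Apply the events in order and collect the window after each RTT."""
    windows = []
    for event in events:
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, event)
        windows.append(cwnd)
    return windows
```

Starting from one segment with a threshold of 8, five loss-free RTTs give 2, 4, 8, 9, 10; a fast retransmit then drops the window to 5, while a timeout restarts it at 1.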
QUESTION: What unfortunate assumption does TCP make?
Transport layer
Flow Control: Generic approaches
Flow Control on Transport Layer
Transport Layer’s flow control mechanisms
• Sliding window / static buffer allocation
• Sliding window / no buffer allocation
• Credit mechanism / dynamic buffer allocation
[Figure: layer stack, top to bottom: Application Layer, Transport Layer, Network Layer, Data Link Layer, Physical Layer]
Fast sender shall not flood slow receiver
Link layer
• Few connections (one per host)
• Short distances
Transport layer
• Potentially many connections (several per application)
• Potentially long distances
Sliding Window / Static Buffer Allocation
Flow control
• Sliding window - mechanism with window size w
Buffer reservation
• Receiver reserves 2*w buffers per duplex connection
Characteristics
• Receiver can accept all packets
• Buffer requirement proportional to number of connections
• Poor buffer utilization for low throughput connections
i.e.
⇒ Good for traffic with high throughput (e.g. data transfer)
⇒ Poor for bursty traffic with low throughput (e.g. interactive applications)
Sliding Window / No Buffer Allocation
Flow control
• Sliding window (or no flow control)
Buffer reservation
• Receivers do not reserve buffers
• Buffers allocated out of shared buffer space upon arrival of packets
• Packets are discarded if there are no buffers available
• Sender maintains packet buffer until ACK arrives from receiver
Characteristics
• Optimized storage utilization
• Possibly high rate of ignored packets during high traffic load
i.e.
⇒ Good for bursty traffic with low throughput
⇒ Poor for traffic with high throughput
Credit Mechanism
Flow control
• Credit mechanism
Buffer reservation
• Receiver allocates buffers dynamically for the connections
• Allocation depends on the actual situation
Principle
• Sender requests required buffer amount
• Receiver reserves as many buffers as the actual situation permits
• Receiver returns ACKs and buffer-credits separately
  ACK: confirmation only (does not imply buffer release)
  CREDIT: buffer allocation
• Sender is blocked when all credits are used up
Dynamic adjustment to
• Buffer situation
• Number of open connections
• Type of connections
  high throughput: many buffers
  low throughput: few buffers
Credit Mechanism
Sender A and receiver B; time runs downwards.
Direction  Message            Sender A                                 Receiver B
A → B      <req 8 buffers>    A wants 8 buffers
A ← B      <cred=4>                                                    B grants messages 0-3 only
A → B      <seq=0, data=m0>   A has 3 buffers left
A → B      <seq=1, data=m1>   A has 2 buffers left
A → B      <seq=2, data=m2>   Message lost but A thinks it has 1 left  Message lost
A ← B      <ack=1, cred=3>                                             B acknowledges 0 and 1, permits 2-4
A → B      <seq=3, data=m3>   A has 1 buffer left
A → B      <seq=4, data=m4>   A has 0 buffers left, must stop
A → B      <seq=2, data=m2>   A times out and retransmits
A ← B      <ack=4, cred=0>    A still blocked                          Everything acknowledged but no free buffers
A ← B      <ack=4, cred=1>    A may now send next msg.
A ← B      <ack=4, cred=2>                                             B found a new buffer somewhere
A → B      <seq=5, data=m5>   A has 1 buffer left
A → B      <seq=6, data=m6>   A is now blocked again
A ← B      <ack=6, cred=0>    A still blocked
A ← B      <ack=6, cred=4>
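The sender's bookkeeping in this exchange can be replayed with a small sketch. The class CreditSender and the helper replay are hypothetical names introduced for illustration. The sketch assumes one reading of the trace: a credit is counted from the acknowledged message onwards, so <ack=1, cred=3> permits messages 2-4. The retransmission of m2 is left out because it does not consume new credit.

```python
class CreditSender:
    """Sender A: may send new messages only within the credit granted by B."""

    def __init__(self):
        self.ack = -1      # highest message number acknowledged by B
        self.cred = 0      # credit from B, counted from message ack + 1
        self.next_seq = 0  # number of the next new message

    def available(self):
        # B permits messages ack+1 .. ack+cred; next_seq-1 was the last one sent.
        return max(self.ack + self.cred + 1 - self.next_seq, 0)

    def send(self):
        if self.available() == 0:
            raise RuntimeError("sender is blocked: all credits are used up")
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_control(self, cred, ack=None):
        """Handle <cred=...> or <ack=..., cred=...> from B."""
        if ack is not None:
            self.ack = ack
        self.cred = cred


def replay(steps):
    """Run the steps and return A's remaining buffers after each one."""
    sender = CreditSender()
    left = []
    for step in steps:
        if step[0] == "send":
            sender.send()
        else:                       # ("ctl", ack, cred)
            sender.on_control(cred=step[2], ack=step[1])
        left.append(sender.available())
    return left


# The exchange above, without the retransmission of m2.
STEPS = [("ctl", None, 4), ("send",), ("send",), ("send",),
         ("ctl", 1, 3), ("send",), ("send",),
         ("ctl", 4, 0), ("ctl", 4, 1), ("ctl", 4, 2),
         ("send",), ("send",), ("ctl", 6, 0), ("ctl", 6, 4)]
```

Replaying STEPS yields 4, 3, 2, 1, 2, 1, 0, 0, 1, 2, 1, 0, 0, 4 remaining buffers, which matches the annotations on A's side.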
Transport layer
Flow Control: TCP
TCP Flow Control
Variable window sizes (credit mechanism)
• Implementation
  The actual window size is also transmitted with each acknowledgement
  Permits dynamic adjustment to existing buffer
[Figure: IP header (Version, IHL, Type of service with PRE and ToS, Total length, Identification, D, M, Fragment offset, Time to live, Protocol, Header checksum, Source address, Destination Address, Options) followed by the TCP header (Source port, Destination port, Sequence number, Piggyback acknowledgement, THL, unused, flags U A P R S F, Window, Checksum, Urgent pointer, Options (0 or more 32 bit words)) and Data. Sequence number and Piggyback acknowledgement are marked "Sequence number and acknowledgements"; Window is marked "Credit".]
THL – TCP header length
U: URG – urgent
A: ACK – acknowledgement
P: PSH – push
R: RST – reset
S: SYN – sync
F: FIN – finalize
TCP Flow Control
• SEQUENCE NUMBER
  • Number of transmitted bytes (each byte of the “message” is numbered)
• ACKNOWLEDGEMENT
  • Byte number referring to the acknowledgement, specifies next byte expected
• FLAGS
  • ACK
    • AckNo is valid (if ACK=0 then no acknowledgment in segment)
• WINDOW SIZE
  • Buffer size in bytes
  • Used for variable-sized sliding window
  • Window size field indicates #bytes sender may transmit beginning with byte acknowledged
  • Window size = 0 is valid
    • Bytes up to ACK-1 received but receiver does not want more data at the moment
TCP Flow Control
“Sliding window” mechanism
• Sequence number space is limited
  • No buffers longer than 2^32 possible
  • No credit larger than 2^16 possible
• Acknowledgement and sequence number
  Acknowledgments refer to byte positions
  • Not to whole segment
  Sequence numbers refer to the byte position of a TCP connection
• Positive acknowledgement
  Cumulative acknowledgements
  • Byte position in the data stream up to which all data has been received correctly
  • Reduction of overhead
  • More tolerant of lost acknowledgements
• Window Scale Option - RFC 1323
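A cumulative acknowledgement as described above can be computed from the byte ranges that have arrived. The function name cumulative_ack and the (seq, length) representation are assumptions made for this sketch; it ignores sequence number wrap-around at 2^32.

```python
def cumulative_ack(received, start=0):
    """Return the next byte expected, i.e. the ACK number.

    received: iterable of (seq, length) byte ranges that arrived correctly,
              in any order.
    start:    first byte position of the stream.
    """
    ack = start
    for seq, length in sorted(received):
        if seq > ack:
            break  # gap in the byte stream: later data cannot be acknowledged
        ack = max(ack, seq + length)
    return ack
```

Because the ACK number covers everything before it, a later acknowledgement also confirms data whose own acknowledgement was lost.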
TCP Flow Control
Sender and receiver with a 4K receive buffer; time runs downwards.
Sender                                 Segment                Receiver buffer
Application does a 2K write            → 2K / SEQ=0           2K used
                                       ← ACK=2048 WIN=2048
Application does a 2K write            → 2K / SEQ=2048        full
Sender is blocked                      ← ACK=4096 WIN=0
                                                              Application reads 2K; 2K used
Sender could send 2K but application   ← ACK=4096 WIN=2048
does only a 1K write                   → 1K / SEQ=4096        2K + 1K used
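The ACK and WIN values in this exchange follow directly from the receiver's buffer occupancy. A minimal sketch of that bookkeeping, assuming in-order arrival only; the class Receiver and its method names are made up for the example.

```python
class Receiver:
    """Receiver side of the exchange: advertises its free buffer as WIN."""

    def __init__(self, buffer_size=4096):
        self.buffer_size = buffer_size
        self.buffered = 0    # bytes received but not yet read by the application
        self.next_byte = 0   # next byte expected (the ACK number)

    def on_segment(self, seq, length):
        """Accept an in-order segment that fits; return (ACK, WIN)."""
        free = self.buffer_size - self.buffered
        if seq == self.next_byte and length <= free:
            self.buffered += length
            self.next_byte += length
        return self.next_byte, self.buffer_size - self.buffered

    def app_read(self, n):
        """The application reads up to n bytes; return the update (ACK, WIN)."""
        self.buffered -= min(n, self.buffered)
        return self.next_byte, self.buffer_size - self.buffered
```

Feeding it the two 2K segments, the 2K read and the final 1K segment reproduces the values shown: (2048, 2048), (4096, 0), (4096, 2048), and then (5120, 1024) for the last segment.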
TCP Flow Control: Special Cases
Optimization for low throughput rate
• Problem
  Telnet (and ssh) connection to interactive editor reacting on every keystroke
  1 character typed requires up to 162 bytes (41 + 40 + 41 + 40)
  • Data: 20 bytes TCP header, 20 bytes IP header, 1 byte payload
  • ACK: 20 bytes TCP header, 20 bytes IP header
  • Echo: 20 bytes TCP header, 20 bytes IP header, 1 byte payload
  • ACK: 20 bytes TCP header, 20 bytes IP header
• Approach often used
  Delay acknowledgment and window update by 500 ms (hoping for more data)
Nagle’s algorithm, 1984
• Algorithm
  Send first byte immediately
  Keep on buffering bytes until first byte has been acknowledged
  Then send all buffered characters in one TCP segment and start buffering again
• Comment
  Effect at e.g. X-windows: jerky pointer movements
  Nagle can be good and bad - choose depending on your needs
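The three steps of Nagle's algorithm as listed above can be sketched as follows. NagleSender is a hypothetical class for illustration; the list sent stands in for the network. The sketch is a simplification: it leaves out the rule in real TCP stacks that a full segment's worth of buffered data is sent without waiting.

```python
class NagleSender:
    """Buffers small writes while an earlier segment is still unacknowledged."""

    def __init__(self):
        self.buffer = b""     # bytes waiting for the outstanding ACK
        self.in_flight = False
        self.sent = []        # segments handed to the network, for inspection

    def write(self, data: bytes):
        if not self.in_flight:
            self._transmit(data)   # nothing outstanding: send immediately
        else:
            self.buffer += data    # otherwise keep on buffering

    def on_ack(self):
        self.in_flight = False
        if self.buffer:
            data, self.buffer = self.buffer, b""
            self._transmit(data)   # all buffered bytes go out in one segment

    def _transmit(self, data: bytes):
        self.sent.append(data)
        self.in_flight = True
```

Typing "hello" with an ACK arriving after the third and after the fifth keystroke produces the segments b"h", b"el" and b"lo" instead of five one-byte segments.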
TCP Flow Control: Special Cases
Silly window syndrome (Clark, 1982)
• Problem
  Data on sending side arrives in large blocks
  But receiving side reads data one byte at a time only
• Clark’s solution
  Prevent receiver from sending window update (new credit) for 1 byte
  Certain amount of space must be available in order to send window update
  • min(X,Y)
    X = maximum segment size announced during connection establishment
    Y = buffer / 2
[Figure: silly window cycle. Receive buffer full (sender window size = 0) → application reads one byte → room for one byte → window update segment sent (header) → new byte arrives (header + 1 byte) → receive buffer full]
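Clark's rule reduces to a single comparison on the receiver. A minimal sketch, with a hypothetical function name and all sizes in bytes:

```python
def should_send_window_update(free_space, mss, buffer_size):
    """Advertise new credit only once free space reaches min(X, Y).

    X = maximum segment size announced during connection establishment
    Y = buffer / 2
    """
    return free_space >= min(mss, buffer_size / 2)
```

With a 4096-byte buffer and an MSS of 1460 bytes, reading a single byte does not trigger a window update; the receiver stays silent until 1460 bytes are free.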
Transport Protocols: SCTP
Stream Control Transmission Protocol (SCTP)
• RFC2960, IETF Standards Track
• RFC2719, Architectural Framework for Signaling Transport
• SCTP Unreliable Data Mode Extension (draft-ietf-tsvwg-usctp-00.txt)
Initial goal
• Signaling protocol for SS7 transport over IP networks
  Protocol of the telephony world for IP telephony
• Players
  Motorola, Cisco, Siemens, Nortel Networks, Ericsson, Telcordia, UCLA, ACIRI
Goal now: A new universal transport protocol
• Competition or even replacement for TCP
SCTP Motivation
• TCP too limited for some applications
  • E.g., transport signaling from PSTN networks (SS7) over IP-based networks
• Examples
  • TCP: Strict order-of-transmission delivery of data with multiple streams
    ⇒ Partial order within a stream of multiplexed streams sufficient
  • Stream-orientation of TCP inconvenient
    ⇒ Application must set record markings
    ⇒ Better: message-orientation
  • TCP cannot deal with multi-homing
    • I.e., one server with several IP addresses
  • TCP is vulnerable to DoS attacks
    • E.g., SYN flooding
    • Many connect requests (SYN), but SYN/ACK response never acknowledged
SCTP Features
Features
• Connection-oriented
• Message-oriented
• Fully ordered, unordered, partially ordered delivery
• Reliable (with extension also: unreliable, partially reliable)
• Multi-homed
Deadlines
• Applications can give deadline for packet sending
• Once sent, delivery is guaranteed (retransmission)
Delivery
• Sender can specify unordered delivery per packet
SCTP Features
Reliability
• Sender and receiver window are computed
  Per stream
  In bytes
  Changed per stream as in TCP
• Acknowledgement using selective ACK
  SACK chunks
  Are piggybacked
  Contain ranges of packets
• Retransmission
  Not before 4th missing report
  Always before new packets
Unreliable transport – V1
• Proposed extension
• Max. number of retransmissions
  Specified per stream
  0 is legal
• Ordered and unreliable is possible
Unreliable transport – V2
• Proposed extension
• Sender demands ACKs
  Receiver must ACK
  Even if packets were not received
Association and Streams
Connection-oriented
  Concept of ‘association’
Bi-directional
  No one-way teardown of connections
Generalization of TCP-connections
  Each association endpoint can have several IP addresses (multi-homing)
  Each association can contain several streams (multi-streaming)
Stream: sequence of user messages to be delivered in order (up to 2^16 per direction)
⇒ in contrast to the notion of ‘stream’ of TCP
[Figure: an association between Endpoint 1 and Endpoint 2; each endpoint has its associated IP addresses (a primary address and secondary addresses); the association carries Stream A, Stream B and Stream C]
Association and Streams
Reliable data transfer
• Confirmed, no duplicates, error-free
Strictly ordered delivery optional
• Strict order: keep order within and among streams of an association
• Partial order: keep order within a stream of an association
Effects
• Strict order: data transmission stalled if one stream is stalled
• Partial order: transmission for non-stalled streams can continue
Example: HTTP with multiple embedded files (images)
  Order of arrival of image data not relevant
[Figure: an association with Stream A, Stream B and Stream C; numbered messages of each stream and their arrival at the receiver under strict order and under partial order]
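The difference between strict and partial order shows up as soon as one message is delayed. The sketch below is an illustration only: deliver is a hypothetical function, messages are (stream, sequence number) pairs, and per-stream numbering is assumed to start at 1.

```python
def deliver(arrivals, sent_order, strict):
    """Return the messages handed to the application, in delivery order.

    arrivals:   messages in the order they reach the receiver
    sent_order: all messages in the order of transmission
    strict:     True  = keep order within and among streams
                False = keep order within each stream only
    """
    delivered, held = [], set()
    position = 0    # strict order: index of the next message in sent_order
    next_ssn = {}   # partial order: next sequence number per stream
    for msg in arrivals:
        held.add(msg)
        if strict:
            while position < len(sent_order) and sent_order[position] in held:
                held.discard(sent_order[position])
                delivered.append(sent_order[position])
                position += 1
        else:
            stream = msg[0]
            ssn = next_ssn.get(stream, 1)
            while (stream, ssn) in held:
                held.discard((stream, ssn))
                delivered.append((stream, ssn))
                ssn += 1
            next_ssn[stream] = ssn
    return delivered


SENT = [("A", 1), ("B", 1), ("A", 2), ("B", 2), ("C", 1)]
ARRIVED = [("B", 1), ("A", 2), ("B", 2), ("C", 1)]   # ("A", 1) is delayed
```

With ("A", 1) missing, strict order delivers nothing at all, while partial order still delivers the messages of streams B and C and holds back only ("A", 2).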
Further SCTP Concepts
• Message segmentation according to path-MTU
  • Path-MTU
    • Maximum transfer unit supported on the path between the endpoints
  • Path-MTU discovery mechanism as specified in RFC 1191
• Test whether the communication partner is alive: Heartbeats
• Flow control and congestion control similar to TCP (Selective Ack,...)
  • Coexistence with TCP
• Security means
  • 32-bit checksum (Adler-32, CRC-32 under discussion)
  • 4-way handshake using cookies against DoS attacks
4–way Handshake
• No half-open states as in TCP
  • No state information kept at the station receiving the ‘INIT’ message
• No vulnerability for SYN flooding
  • State information established only after the third step, the ‘COOKIE ECHO’ message
• To increase efficiency
  • User data can be sent already with the ‘COOKIE ECHO’ and ‘COOKIE ACK’ messages
Message sequence; time runs downwards.
Initiator state  Initiator action         Message                    Responder action  Responder state
Closed                                                                                 Closed
Cookie wait      Start timer              → INIT
                                          ← INIT ACK (with cookie)   Generate cookie
Cookie sent      Stop timer, start timer  → COOKIE ECHO              Verify cookie     Established
Established      Stop timer               ← COOKIE ACK
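The reason the responder can stay stateless until the third message is that the cookie itself carries the state, protected by a keyed checksum. The sketch below illustrates the idea only: make_cookie, verify_cookie, the cookie layout and the choice of HMAC-SHA256 are assumptions for the example and not the SCTP wire format. A real cookie would also carry a timestamp to limit its lifetime.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(32)   # responder's private key, never sent to the peer
MAC_LEN = 32              # length of an HMAC-SHA256 digest in bytes


def make_cookie(peer_addr: str, init_tag: int) -> bytes:
    """On INIT: pack the would-be state into a signed cookie, store nothing."""
    payload = f"{peer_addr}|{init_tag}".encode()
    mac = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return mac + payload


def verify_cookie(cookie: bytes, peer_addr: str) -> bool:
    """On COOKIE ECHO: accept only an unmodified cookie issued to this peer."""
    mac, payload = cookie[:MAC_LEN], cookie[MAC_LEN:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return False
    return payload.decode().split("|")[0] == peer_addr
```

A flood of INIT messages costs the responder only the computation of cookies; memory for the association is allocated after verify_cookie succeeds.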
Message format
• Multiplexing of several user messages: Chunk Bundling
  • Chunk: part of an SCTP packet belonging to a single stream
  • User can request unbundled chunks
    ⇒ Chunks are sent separately if the send queue is empty
• Common Header
  • Source port / destination port (2 bytes each): As in TCP or UDP
  • Verification tag: for validation of the sender of the SCTP message
    • Protection against blind attacks (unauthorized shutdown of association)
  • Checksum
    • Adler-32: currently proposed in the RFC
    • CRC-32: proposed recently, better error detection properties for small packets
[Figure: SCTP packet layout: Common Header (Source port, Destination port, Verification Tag, Checksum) followed by Chunk 1, Chunk 2, …, Chunk n]
Chunk Format
• General format of a chunk
  • Chunk Type
    • Type of information in the Chunk Value field
    • E.g., type=0: payload data; type=1: INIT message,...
  • Chunk Flags
    • Depend on the chunk type
  • Chunk Length
    • Length of the chunk in bytes
    • Including type, flags, length
Chunks
• DATA
• SACK
• INIT
• ABORT
• SHUTDOWN
• HEARTBEAT
• ERROR
• ECNE: Explicit Congestion Notification Echo
• CWR: Congestion Window Reduced
• & responses
[Figure: chunk layout: Chunk Type, Chunk Flags, Chunk Length, followed by Chunk Value]
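The general chunk layout can be packed and parsed with a few lines. The field widths used here (8-bit type, 8-bit flags, 16-bit length, network byte order) are taken from RFC 2960 rather than from the slide; the function names are hypothetical, and the padding of chunks to a multiple of 4 bytes is left out.

```python
import struct

# Chunk Type (8 bits), Chunk Flags (8 bits), Chunk Length (16 bits), big-endian.
CHUNK_HEADER = struct.Struct("!BBH")


def pack_chunk(chunk_type: int, flags: int, value: bytes) -> bytes:
    """Build a chunk; the length field covers type, flags, length and value."""
    length = CHUNK_HEADER.size + len(value)
    return CHUNK_HEADER.pack(chunk_type, flags, length) + value


def unpack_chunk(data: bytes):
    """Return (type, flags, value) of the chunk at the start of data."""
    chunk_type, flags, length = CHUNK_HEADER.unpack_from(data)
    return chunk_type, flags, data[CHUNK_HEADER.size:length]
```

Because the length field includes the 4-byte header, a receiver can step from one bundled chunk to the next without knowing the chunk type.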
INIT Chunk Format
• Initiate Tag: random number used in all subsequent messages
  • Protect against blind attacks
• Advertised Receiver Credit Window
  • Dedicated buffer space reserved for the association
• Number of outbound streams
  • Number of streams the sender of this INIT chunk wants to open
• Number of inbound streams
  • Maximum the sender of this INIT message can support
• Variable-Length parameter
  • Among others: list of IP addresses (multi-homing!) being part of the association
[Figure: INIT chunk layout: Type=1, Flags: None, Chunk Length; Initiate Tag; Advertised receiver credit window; Number of outbound streams, Number of inbound streams; Initial transmission sequence number; Optional/variable length parameters]
DATA Chunk Format
• Example: Format of a DATA chunk (i.e., non SCTP control message)
• Flags (one bit each)
  • Unordered bit: ’1’ indicates unordered DATA chunk, ’0’ an ordered DATA chunk
  • Beginning bit: ’1’ first fragment of a user message
  • End bit: ’1’ last fragment of a user message
• Transmission Sequence Number (32 bits) => about 4,300 million numbers
  • Unique per association
  • Used for acknowledgments and duplicate recognition
  • Used for retransmission
  • Not used for re-ordering
  • One per data chunk (also in case of fragmentation)
• Stream Identifier (16 bits)
• Stream Sequence Number (16 bits)
  • Unique per stream
  • Used to assure sequenced delivery within a stream
  • Separate acknowledgement mechanism from sequenced delivery
  • Same SSN if packet is fragmented into several chunks
• Payload Protocol Identifier (e.g., RTP)
• User Data: the actual user data to send via SCTP
[Figure: DATA chunk layout: Type=0, Flags: U|B|E, Chunk Length; Transmission Sequence Number (TSN); Stream identifier S, Stream sequence number n; Payload Protocol Identifier; User Data (seq n of stream S)]
Transport Protocols: DCCP
Datagram Congestion Control Protocol (DCCP)
• Under development
  • http://www.ietf.org/html.charters/dccp-charter.html
Transport Protocol
• Offers unreliable delivery
• Low overhead like UDP
• Applications using UDP can easily change to this new protocol
Accommodates different congestion control mechanisms
• Congestion Control IDs (CCIDs)
  Add congestion control schemes on the fly
  Choose a congestion control scheme
  TCP-friendly Rate Control (TFRC) is included
• Half-Connection
  Data Packets sent in one direction
Congestion Avoidance
Avoidance
Principle
• Appropriate communication system behavior and design
Policies at various layers can affect congestion
• Data link layer
  Flow control
  Acknowledgements
  Error treatment / retransmission / FEC
• Network layer
  Datagram (more complex) vs. virtual circuit (more procedures available)
  Packet queueing and scheduling in router
  Packet dropping in router (including packet lifetime)
  Selected route
• Transport layer
  Basically the same as for the data link layer
  But some issues are harder (determining timeout interval)
Avoidance by Traffic Shaping
Motivation
• Congestion is often caused by bursts
• Bursts are relieved by smoothing the traffic (at the price of a delay)
Procedure
• Negotiate the traffic contract beforehand (e.g., flow specification)
• The traffic is shaped by sender
  Average rate and
  Burstiness
Applied
• In ATM
• In the Internet (“DiffServ” - Differentiated Services)
[Figure: original packet arrival and smoothed stream over time, with the peak rate marked]
Traffic Shaping with Leaky Bucket
Principle
• Continuous outflow
• Congestion corresponds to data loss
Described by
• Packet rate
• Queue length
Implementation
• Easy if packet length stays constant (like ATM cells)
[Figure: symbolic view of a bucket with constant outflow per time; implementation with limited buffers between input lines and output lines]
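A leaky bucket is fully described by the two parameters named above, packet rate and queue length. The sketch below assumes constant-size packets and a discrete clock; LeakyBucket and its method names are made up for the example.

```python
from collections import deque


class LeakyBucket:
    """Queue with limited length that releases packets at a constant rate."""

    def __init__(self, queue_length, rate):
        self.queue = deque()
        self.queue_length = queue_length  # max. packets held in the bucket
        self.rate = rate                  # packets released per tick

    def arrive(self, packet):
        """Queue the packet; return False if the bucket overflows (data loss)."""
        if len(self.queue) >= self.queue_length:
            return False
        self.queue.append(packet)
        return True

    def tick(self):
        """Release at most `rate` packets for this time unit."""
        count = min(self.rate, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]
```

A burst of five packets into a bucket of length 3 loses two of them; the other three leave one per tick, however quickly they arrived.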
Traffic Shaping with Token Bucket
Principle
• Permit a certain amount of data to flow off for a certain amount of time
• Controlled by "tokens"
• Number of tokens limited
• Number of queued packets limited
Implementation
• Add tokens periodically
  (Until maximum has been reached)
• Remove token
  Depending on the length of the packet (byte counter)
Comparison
• Leaky Bucket
  Max. constant rate (at any point in time)
• Token Bucket
  Permits a limited burst
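The byte-counter variant of the token bucket can be sketched in a few lines. TokenBucket and its method names are hypothetical; the sketch assumes one token per byte, a bucket that starts full, and it leaves out the limited packet queue, so a packet that cannot be sent is simply refused.

```python
class TokenBucket:
    """Byte-counting token bucket: one token permits one byte."""

    def __init__(self, max_tokens, tokens_per_tick):
        self.max_tokens = max_tokens
        self.tokens_per_tick = tokens_per_tick
        self.tokens = max_tokens   # assumption: the bucket starts full

    def tick(self):
        """Add tokens periodically, until the maximum has been reached."""
        self.tokens = min(self.tokens + self.tokens_per_tick, self.max_tokens)

    def try_send(self, packet_len):
        """Remove tokens according to the packet length, if enough are left."""
        if packet_len > self.tokens:
            return False
        self.tokens -= packet_len
        return True
```

A full bucket of 3000 tokens lets three 1000-byte packets leave back to back, which is the limited burst; after that the sender is held to the refill rate of the bucket.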
Avoidance by Reservation: Admission Control
Principle
• Prerequisite: virtual circuits
• Reserving the necessary resources (incl. buffers) during connect
• If buffer or other resources not available
  Alternative path
  Desired connection refused
Example
• Network layer may adjust routing based on congestion
• When the actual connect occurs
[Figure: network with hosts A and B, shown twice]
Avoidance by Reservation: Admission Control
Sender oriented
• Sender (initiates reservation)
  Must know target addresses (participants)
  Not scalable
  Good security
[Figure: data flow from sender to receiver, with reservation steps 1. reserve, 2. reserve, 3. reserve]
Receiver oriented
• Receiver (initiates reservation)
  Needs advertisement before reservation
  Must know “flow” addresses
• Sender
  Need not know receivers
  More scalable
  Insecure
[Figure: data flow from sender to receiver, with reservation steps 1. reserve, 2. reserve, 3. reserve]
Combination?
• Start sender oriented reservation
[Figure: data flow from sender to receiver, with reservation steps 1. reserve, 2. reserve, 3. reserve and the label "reserve from nearest router"]
Avoidance by Buffer Reservation
Principle
• Buffer reservation
Implementation variant: Stop-and-Wait protocol
• One buffer per router and connection (simplex, VC=virtual circuit)
Implementation variant: Sliding Window protocol
• m buffers per router and (simplex-) connection
Properties
• Congestion not possible
• Buffers remain reserved, even if there is no data transmission for some periods
• Usually only with applications that require low delay & high bandwidth
[Figure: routers 1, 2 and 3 with buffers reserved for a connection ("rsvd for conn 1") next to unreserved buffers]
Avoidance: combined approaches
Controlled load
• Traffic in the controlled load class experiences the network as empty
• Traffic class for the IntServ/RSVP reservation mechanism
Approach
• Allocate few buffers for this class on each router
• Use admission control for these few buffers
  Reservation is in packets/second (or Token Bucket specification)
  Router knows its transmission speed
  Router knows the number of packets it can store
• Strictly prioritize traffic in a controlled load class
Effect
• Controlled load traffic is hardly ever dropped
• Overtakes other traffic
Avoidance: combined approaches
Expedited forwarding
• Very similar to controlled load
• A differentiated services PHB (per-hop-behavior) for the DiffServ mechanisms
Approach
• Set aside few buffers for this class on each router
• Police the traffic
  Shape or mark the traffic
  Only at senders, or at some routers
• Strictly prioritize traffic in a controlled load class
Effect
• Shapers drop excessive traffic
• EF traffic is hardly ever dropped
• Overtakes other traffic
[Figure: IP header (Version, IHL, DS, Total length, Identification, D, M, Fragment offset, Time to live, Protocol, Header checksum, Source address, Destination Address) with the bit values of the DS field shown]