Communication Networks Sanjay K. Bose Lecture Set VII Transport Layer

Upload: gopi-saiteja

Post on 12-Jan-2017


Page 1: Lecture set 7

Communication Networks

Sanjay K. Bose

Lecture Set VII

Transport Layer

Page 2: Lecture set 7

TCP/IP Protocol Suite

• TCP/IP protocol stack is a layered architecture. TCP and IP are the two most important protocols of this stack.

• It was originally developed by DARPA (Defense Advanced Research Projects Agency) for an experimental packet-switched network.

• It was later included in the Berkeley Software Distribution of UNIX.

• It maps closely to the OSI layers and it supports all standard physical and data link protocols.

• It also includes specifications for such common applications as e-mail, remote login, terminal emulation, and file transfer.

Page 3: Lecture set 7

TCP/IP Protocol Suite (Transport Layer)

[Figure: TCP/IP protocol stack. Distributed applications (HTTP, SMTP, RTP, DNS) sit on top of TCP (reliable stream service) and UDP (user datagram service). Both run over IP (with ICMP, ARP), which provides best-effort connectionless packet transfer across a variety of network technologies (Network Interfaces 1, 2 and 3).]

Page 4: Lecture set 7

Transport services and protocols

• Provide logical communication between application processes running on end hosts

• Transport protocols run only in end systems (not in routers/switches)

– Sender breaks application messages into segments, passes to network layer

– Receiver reassembles segments into messages, passes to application layer

• Transport protocols available are TCP and UDP

[Figure: two end hosts, each with the layer stack application / transport / network / data link / physical.]

Network Layer: Logical communication between hosts

Transport Layer: Logical communication between processes (relies on, and enhances, network layer services)

Page 5: Lecture set 7

TCP/IP Encapsulation

TCP Header contains source & destination port numbers for identifying the application

IP Header contains source and destination IP addresses; transport protocol type (TCP or UDP)

Ethernet Header contains source & destination MAC addresses

Example application: HTTP

[Figure: encapsulation of an HTTP request at each layer.
HTTP Request
TCP header | HTTP Request
IP header | TCP header | HTTP Request
Ethernet header | IP header | TCP header | HTTP Request | FCS]

Page 6: Lecture set 7

Transport Layer Multiplexing/Demultiplexing

[Figure: Host 1, Host 2 and Host 3, each with the layer stack application / transport / network / link / physical. Processes P1 to P4 attach to the transport layer through sockets.]

Multiplexing at Sender: Gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)

Demultiplexing at Receiver: Delivering received segments to correct socket

Page 7: Lecture set 7

Demultiplexing at the Transport Layer

• Host receives IP datagrams

– Each datagram has source IP address, destination IP address

– Each datagram carries 1 transport-layer segment

– Each segment has source, destination port number

• Host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP Segment Format (32 bits wide):

Source Port # | Dest Port #
other header fields
Application Data (Message)

Port number ranges:

0-255 Well-known ports

256-1023 Less well-known ports

1024-65535 Ephemeral client ports

Page 8: Lecture set 7

Connectionless Demultiplexing (UDP)

• Create sockets with port numbers:

DatagramSocket mySocket1 = new DatagramSocket(12534);

DatagramSocket mySocket2 = new DatagramSocket(12535);

• UDP socket identified by two-tuple (dest IP address, dest port number)

• When host receives UDP segment:

– checks destination port number in segment

– directs UDP segment to socket with that port number

• IP datagrams with different source IP addresses and/or source port numbers directed to same socket if the destination port number is the same in the destination host

Note that TCP does this differently, using a 4-tuple (S-IP, SP, D-IP, DP) to identify a socket!

Page 9: Lecture set 7

Connectionless Demultiplexing (UDP)

Example: Server creates socket at 6428 to provide UDP service to some application

DatagramSocket serverSocket = new DatagramSocket(6428);

[Figure: clients at IP A and IP B exchange UDP segments with the server at IP C. Client-to-server segments carry (SP: 9157, DP: 6428) and (SP: 5775, DP: 6428); the server's replies carry (SP: 6428, DP: 9157) and (SP: 6428, DP: 5775).]

• Same socket (6428) at server for both clients in this example

• DP specifies the process to which data should be delivered at the Receiver

• SP specifies the process from which data is coming, for the specified source IP address; acts like a return address for replies/responses if required to be sent back

Page 10: Lecture set 7

Connection-oriented Demultiplexing (TCP)

• TCP socket identified by 4-tuple:

– source IP address

– source port number

– destination IP address

– destination port number

• Receiver host uses all four values to direct segment to appropriate socket; socket is uniquely identified by 4-tuple (S-IP, SP, D-IP, DP)

• Server host may support many simultaneous TCP sockets:

– each socket identified by its own 4-tuple

• Web servers have different sockets for each connecting client

– non-persistent HTTP will have different socket for each request
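The difference between the UDP and TCP lookups can be sketched in code. This is a minimal Java illustration only, not how any operating system stores its sockets: the class `DemuxSketch`, its method names and the string socket labels are hypothetical, and IP addresses are reduced to the letters A, B, C used in the slides.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative demultiplexing tables (hypothetical names throughout). */
final class DemuxSketch {
    // UDP: lookup by destination port only (destination IP is assumed to be this host).
    private final Map<Integer, String> udpSockets = new HashMap<>();
    // TCP: lookup by the full 4-tuple (S-IP, SP, D-IP, DP).
    private final Map<String, String> tcpSockets = new HashMap<>();

    private static String fourTuple(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + "->" + dstIp + ":" + dstPort;
    }

    void bindUdp(int dstPort, String socketName) {
        udpSockets.put(dstPort, socketName);
    }

    void openTcp(String srcIp, int srcPort, String dstIp, int dstPort, String socketName) {
        tcpSockets.put(fourTuple(srcIp, srcPort, dstIp, dstPort), socketName);
    }

    /** Source fields are ignored for the UDP lookup; returns null if no socket is bound. */
    String demuxUdp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return udpSockets.get(dstPort);
    }

    /** All four values take part in the TCP lookup; returns null if no such connection. */
    String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(fourTuple(srcIp, srcPort, dstIp, dstPort));
    }
}
```

With one UDP socket bound to port 6428, segments from A and from B both reach that socket, whereas two TCP segments with the same source port 9157 but source IPs A and B map to two different sockets.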

Page 11: Lecture set 7

Connection-oriented Demultiplexing (TCP)

[Figure: clients at IP A and IP B send TCP segments to the server at IP C, all with DP: 80. The client at A sends with SP: 9157; the client at B sends with SP: 9157 and with SP: 5775. At the server, each of the three connections has its own socket, served by its own process.]

• This is a Web Server example as the segments are being sent to Port 80 of the server which corresponds to the HTTP Service

• Note that in this case, the server is creating a separate process for each of the sockets. This would be inefficient (see next slide for a more efficient example with “threading”)

Page 12: Lecture set 7

Connection-oriented Demultiplexing (TCP) Threaded Web Server

[Figure: the same three connections as in the previous slide (from A with SP: 9157, from B with SP: 9157 and SP: 5775, all to DP: 80 at server IP C), but now handled by a single server process with one thread per socket.]

• This is also a Web Server example as the segments are being sent to Port 80 of the server which corresponds to the HTTP Service

• Note that in this case, the server is creating one process for all the sockets. A new thread (kind of like a sub-process) is created for each socket

Page 13: Lecture set 7

UDP: User Datagram Protocol

• “no frills,” “bare bones” Internet transport protocol

• “best effort” service, UDP segments may be:

– lost

– delivered out of order to application

• Connectionless:

– no handshaking between UDP sender, receiver

– each UDP segment handled independently of others

Why have UDP?

• No connection establishment (which can add delay)

• Simple: no connection state information kept at sender or receiver

• Small Segment Header

• No Congestion Control: UDP can transmit as fast as it can

Page 14: Lecture set 7

UDP: User Datagram Protocol

• Commonly used for streaming multimedia applications which tend to be loss tolerant but rate sensitive

• UDP also used for DNS and SNMP

• For reliable transfer over UDP one must add reliability at the level of the application layer, e.g. application-specific error recovery!

UDP Segment Format (32 bits wide):

Source Port # | Dest Port #
Length | Checksum
Application Data (Message)

Length: Length, in bytes, of the UDP segment, including header

UDP Checksum: Standard Internet Checksum added by the sender. Used by the receiver to check for bit errors. (See next slide)

Page 15: Lecture set 7

UDP Checksum Calculation

• UDP checksum covers pseudoheader followed by UDP datagram

• IP addresses included to detect misdelivery

• Receiver recalculates the checksum and silently discards the datagram if errors detected (i.e. no error message generated)

• Using UDP checksums is optional but hosts are required to have checksums enabled

UDP Pseudoheader (used in checksum calculation but never actually transmitted, nor is it included in the “Length”); bit positions 0, 8, 16, 31:

Source IP Address (bits 0-31)
Destination IP Address (bits 0-31)
0 0 0 0 0 0 0 0 (bits 0-7) | Protocol = 17 (bits 8-15) | UDP Length (bits 16-31)

Note that IP Address information will come from another layer (Network Layer). Strictly speaking, this goes against the philosophy of keeping the layers separate from each other.
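The checksum calculation over pseudoheader plus UDP datagram can be sketched as follows. This is a minimal Java illustration under stated assumptions: the class `UdpChecksumSketch` and its method names are hypothetical, addresses are 4-byte IPv4 arrays, and the caller passes the UDP segment with its checksum field (bytes 6 and 7) set to zero when computing.

```java
import java.nio.ByteBuffer;

/** Internet checksum over the UDP pseudoheader + UDP segment (illustrative names). */
final class UdpChecksumSketch {
    /** One's-complement sum of 16-bit big-endian words, folded back to 16 bits. */
    static int onesComplementSum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0; // pad odd length with zero
            sum += (hi << 8) | lo;
        }
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16); // end-around carry
        }
        return (int) sum;
    }

    /** Pseudoheader (never transmitted) followed by the segment itself. */
    private static byte[] withPseudoheader(byte[] srcIp, byte[] dstIp, byte[] udpSegment) {
        ByteBuffer buf = ByteBuffer.allocate(12 + udpSegment.length);
        buf.put(srcIp).put(dstIp);               // source and destination IP addresses
        buf.put((byte) 0).put((byte) 17);        // zero byte, Protocol = 17
        buf.putShort((short) udpSegment.length); // UDP length (header + data)
        buf.put(udpSegment);
        return buf.array();
    }

    /** Sender side: segment must have its checksum field zeroed. */
    static int checksum(byte[] srcIp, byte[] dstIp, byte[] udpSegment) {
        int result = (~onesComplementSum(withPseudoheader(srcIp, dstIp, udpSegment))) & 0xFFFF;
        return result == 0 ? 0xFFFF : result;    // a computed zero is sent as all ones
    }

    /** Receiver side: with the checksum in place, the sum must be all ones. */
    static boolean verify(byte[] srcIp, byte[] dstIp, byte[] udpSegment) {
        return onesComplementSum(withPseudoheader(srcIp, dstIp, udpSegment)) == 0xFFFF;
    }
}
```

A receiver following the slide would silently discard any datagram for which `verify` returns false.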

Page 16: Lecture set 7

UDP Destination Port Usage

[Figure: a UDP datagram arrives from the IP layer and is demultiplexed, based on its destination port #, to its appropriate port (Port 1, Port 2 or Port 3).]

Error message sent back if the Dest. Port # indicated in the datagram does not exist!

Page 17: Lecture set 7

UDP Port Numbers

Well Known Port Numbers

• Universally assigned and accepted port #s providing some designated service

• Typically, lower port numbers used for this

• Examples: 37 Time, 53 Domain Name Server, 67 DHCP Server, 68 DHCP Client

Dynamically Assigned Port Numbers

• Ports are not globally known

• When a program needs a port, it asks for and gets one from the network software

• Destination machine needs to be queried to find the port number at which it may be offering the service to be accessed

• Typically higher port numbers used for this

Page 18: Lecture set 7

TCP: Transmission Control Protocol

• full duplex data:

– bi-directional data flow in same connection

– MSS: maximum segment size

• connection-oriented:

– handshaking (exchange of control msgs) initializes sender, receiver state before data exchange

• flow controlled:

– sender will not overwhelm receiver

• point-to-point:

– one sender, one receiver

• reliable, in-order byte stream:

– no “message boundaries” inside!

• pipelined:

– TCP congestion and flow control set window size

• send & receive buffers

[Figure: the sending application writes data through its socket door into the TCP send buffer; data travels as segments to the TCP receive buffer, from which the receiving application reads data through its socket door.]

Important to remember, though, that a TCP stream is unstructured, i.e. there are no boundary marks in the stream itself, so the application has to create such boundary marks if needed (e.g. to separate different fields)

Page 19: Lecture set 7

TCP Segment Format

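Each TCP segment has a header of 20 or more bytes + 0 or more bytes of data

TCP segment layout (32 bits wide; bit positions 0, 4, 10, 16, 24, 31):

Source Port (bits 0-15) | Destination Port (bits 16-31)
Sequence Number (bits 0-31)
Acknowledgment Number (bits 0-31)
Header Length (bits 0-3) | Reserved (bits 4-9) | URG ACK PSH RST SYN FIN (bits 10-15) | Window Size (bits 16-31)
Checksum (bits 0-15) | Urgent Pointer (bits 16-31)
Options | Padding
Data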

Page 20: Lecture set 7

TCP Header

Port Numbers

• A socket identifies a connection endpoint

– IP address + port

• A connection specified by a socket pair

• Well-known ports

– FTP 20

– Telnet 23

– DNS 53

– HTTP 80

Sequence Number

• Byte count

• First byte in segment

• 32 bits long

• $0 \le \mathrm{SN} \le 2^{32}-1$

• Initial sequence number selected during connection setup

Page 21: Lecture set 7

TCP Header

Acknowledgement Number

• SN of next byte expected by receiver

• Acknowledges that all prior bytes in stream have been received correctly

• Valid if ACK flag is set

Header length

• 4 bits

• Length of header in multiples of 32-bit words

• Minimum header length is 20 bytes

• Maximum header length is 60 bytes

Page 22: Lecture set 7

TCP Header

Reserved

• 6 bits

Control

• 6 bits

• URG: urgent pointer flag

– Urgent message end = SN + urgent pointer

• ACK: ACK packet flag

• PSH: override TCP buffering

• RST: reset connection

– Upon receipt of RST, connection is terminated and application layer notified

• SYN: establish connection

• FIN: close connection
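The fixed header fields described in these slides can be read out of a raw segment as sketched below. This is a minimal Java illustration: the class `TcpHeaderSketch` and its accessor names are hypothetical, it assumes the 6-bit reserved field and 6 control bits shown in these slides, and it ignores options and the checksum.

```java
import java.nio.ByteBuffer;

/** Reads the fixed 20-byte part of a TCP header (illustrative names). */
final class TcpHeaderSketch {
    final int srcPort;
    final int dstPort;
    final long seq;        // 32-bit sequence number, held in a long to stay unsigned
    final long ack;        // 32-bit acknowledgment number
    final int headerBytes; // header length field (in 32-bit words) converted to bytes
    final int flags;       // low 6 bits: URG ACK PSH RST SYN FIN
    final int window;      // advertised window size

    TcpHeaderSketch(byte[] segment) {
        ByteBuffer b = ByteBuffer.wrap(segment); // big-endian (network byte order) by default
        srcPort = b.getShort(0) & 0xFFFF;
        dstPort = b.getShort(2) & 0xFFFF;
        seq = b.getInt(4) & 0xFFFFFFFFL;
        ack = b.getInt(8) & 0xFFFFFFFFL;
        int word = b.getShort(12) & 0xFFFF;      // header length (4) + reserved (6) + flags (6)
        headerBytes = (word >> 12) * 4;
        flags = word & 0x3F;
        window = b.getShort(14) & 0xFFFF;
    }

    boolean urg() { return (flags & 0x20) != 0; }
    boolean ackFlag() { return (flags & 0x10) != 0; }
    boolean psh() { return (flags & 0x08) != 0; }
    boolean rst() { return (flags & 0x04) != 0; }
    boolean syn() { return (flags & 0x02) != 0; }
    boolean fin() { return (flags & 0x01) != 0; }
}
```

Because the header length field is 4 bits wide and counts 32-bit words, `headerBytes` can range only from 20 (value 5) to 60 (value 15).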

Page 23: Lecture set 7

TCP Header

Window Size

• 16 bits to advertise window size

• Used for flow control

• The host advertising the window will accept bytes with SN from ACK to ACK + window − 1

• Maximum window size is 65535 bytes

TCP Checksum

• Internet checksum method

• Computed over

TCP pseudo header + TCP segment (header+ data)

(See next slide for TCP pseudo header)

Page 24: Lecture set 7

TCP Pseudo Header (for checksum calculation)

TCP pseudo header layout (bit positions 0, 8, 16, 31):

Source IP address (bits 0-31)
Destination IP address (bits 0-31)
0 0 0 0 0 0 0 0 (bits 0-7) | Protocol = 6 (bits 8-15) | TCP Segment Length (bits 16-31)

Used in checksum calculation but never actually transmitted, nor is it included in the “Length”

Usage similar to that of the UDP Pseudoheader

Page 25: Lecture set 7

TCP Header

Options

• Variable length

• NOP (No Operation) option is used to pad TCP header to multiple of 32 bits

• Time stamp option is used for round trip measurements

Options

• Maximum Segment Size (MSS) option specifies largest segment a receiver wants to receive

• Window Scale option increases TCP window from 16 to 32 bits

Page 26: Lecture set 7

TCP Services

• Provides a full duplex connection-oriented and reliable byte-stream service using a sliding-window flow control.

• User data are broken into segments not exceeding 64 kbytes (usually about 1500 bytes) and sent to the destination by encapsulating them in IP datagrams

– IP provides unreliable packet delivery

– packets can get lost, duplicated or delivered out of sequence

• Receiver sends an acknowledgment back after receiving a segment.

• Retransmission of segment if necessary

Page 27: Lecture set 7

TCP Services

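[Figure: the sending application writes into a transport-layer buffer, which is sent as segments to the receiver's transport-layer buffer and read by the receiving application. With buffer available = B at the receiver, the advertised window size is < B. ACKs flow back to the sender, where they are used for RTT estimation.]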

Page 28: Lecture set 7

TCP Round Trip Time and Timeout

How to set TCP timeout value?

• Must be set longer than the RTT, but the RTT also varies

• If it is set too short, a premature timeout may occur, leading to unnecessary retransmissions

• If it is set too long then the response to segment loss will be too slow.

Page 29: Lecture set 7

TCP Round Trip Time and Timeout

How is the RTT estimated?

• SampleRTT = measured time from segment transmission until the ACK for that segment is received, ignoring retransmissions

• SampleRTT will fluctuate but we want the estimated RTT to be “smoother”. This is done by taking a moving average EstimatedRTT over recent measurements – should not just use the current SampleRTT

$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$

• Exponential weighted moving average

• Influence of past sample decreases exponentially fast

• Typical value: $\alpha = 0.125$

Page 30: Lecture set 7

[Figure: Example measurements (SampleRTT and EstimatedRTT) for RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. Horizontal axis: time (seconds), 1 to 106; vertical axis: RTT (milliseconds), 100 to 350.]

Page 31: Lecture set 7

TCP Round Trip Time and Timeout

Setting the timeout

• EstimatedRTT plus “safety margin”

– large variation in EstimatedRTT -> larger safety margin

• First estimate how much SampleRTT deviates from EstimatedRTT:

$\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$

(typically, $\beta = 0.25$)

• Then set the timeout interval as:

$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
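The two moving averages and the resulting timeout can be sketched as below. This is a minimal Java illustration with gains of 0.125 for EstimatedRTT and 0.25 for DevRTT, the typical values quoted in these slides. The class `RttEstimatorSketch` is hypothetical, and two details are assumptions not stated in the slides: the first sample seeds EstimatedRTT with DevRTT = 0, and the deviation is computed against the previous EstimatedRTT before that estimate is updated.

```java
/** Exponential weighted moving averages for RTT and its deviation (illustrative names). */
final class RttEstimatorSketch {
    static final double ALPHA = 0.125; // gain for EstimatedRTT
    static final double BETA = 0.25;   // gain for DevRTT

    private double estimatedRtt;
    private double devRtt;

    /** Assumption: seed the estimate with the first measured sample. */
    RttEstimatorSketch(double firstSampleRtt) {
        this.estimatedRtt = firstSampleRtt;
        this.devRtt = 0.0;
    }

    /** Feed one SampleRTT (samples from retransmitted segments should be skipped). */
    void addSample(double sampleRtt) {
        // Assumption: deviation is measured against the estimate before it is updated.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    double estimatedRtt() { return estimatedRtt; }

    double devRtt() { return devRtt; }

    /** EstimatedRTT plus a safety margin of four deviations. */
    double timeoutInterval() { return estimatedRtt + 4 * devRtt; }
}
```

For example, starting from 100 ms and then observing a sample of 140 ms gives EstimatedRTT = 105 ms, DevRTT = 10 ms and TimeoutInterval = 145 ms.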

Page 32: Lecture set 7

TCP Connection Establishment

• Three-Way Handshake

– A sends a SYN segment specifying the port number of the other party B, the initial sequence number (ISN) that A will use, and other info (e.g. max. segment size)

– B responds with its own SYN segment containing its ISN. B also acknowledges A’s SYN by ACKing A’s ISN plus one

– A acknowledges B’s SYN by ACKing B’s ISN plus one

• Initial Sequence Number (ISN) may be randomly chosen but with some important considerations
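The sequence and acknowledgment numbers exchanged in the three steps can be traced with a small sketch. This is a Java illustration only: the class `HandshakeSketch` and the text format of the returned lines are hypothetical, and no real segments are sent.

```java
/** Traces the SYN / SYN+ACK / ACK numbers of a three-way handshake (illustrative). */
final class HandshakeSketch {
    private static final long MODULUS = 1L << 32; // sequence numbers are 32 bits and wrap

    /** Returns the three segments as text, given the ISNs chosen by A and B. */
    static String[] handshake(long isnA, long isnB) {
        long ackForA = (isnA + 1) % MODULUS; // B ACKs A's ISN plus one
        long ackForB = (isnB + 1) % MODULUS; // A ACKs B's ISN plus one
        return new String[] {
            "A->B SYN seq=" + isnA,
            "B->A SYN,ACK seq=" + isnB + " ack=" + ackForA,
            "A->B ACK seq=" + ackForA + " ack=" + ackForB
        };
    }
}
```

The modulus matters when an ISN happens to sit at the top of the 32-bit range: an ISN of 4294967295 is acknowledged with 0.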

Page 33: Lecture set 7

Initial Sequence Number (ISN)

• Select initial sequence numbers (ISN) to protect against segments from prior connections (that may circulate in the network and arrive at a much later time)

• Select ISN to avoid overlap with sequence numbers of prior connections

• Use local clock to select ISN

• Time for clock to go through a full cycle should be greater than the maximum lifetime of a segment (MSL); Typically MSL=120 seconds

• High bandwidth connections pose a problem

Page 34: Lecture set 7

Three Way Handshake (TCP Connection Setup)

[Figure: three-way handshake timeline between Host A and Host B]

Protects the ISN against responding falsely to old segments from prior connections

Page 35: Lecture set 7

Maximum Segment Size

• Maximum Segment Size (MSS) - largest block of data that TCP sends to other end

• Each end can announce its MSS during connection establishment

• Default MSS is 536 bytes, i.e. the default IP datagram size of 576 bytes less 20 bytes for IP header and 20 bytes for TCP header

• Slight difference between the MSS of Ethernet and IEEE 802.3.

Ethernet MSS = 1460 bytes

IEEE 802.3 MSS = 1452 bytes
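The MSS figures above follow from subtracting the two 20-byte headers from the largest IP datagram each link can carry. The worked arithmetic below assumes a 1500-byte Ethernet payload and a 1492-byte IEEE 802.3 payload (1500 bytes less 8 bytes of LLC/SNAP header); these payload sizes are not stated in the slides.

```latex
\begin{align*}
\text{Default datagram:}\quad & 576 - 20 - 20 = 536 \text{ bytes of TCP data} \\
\text{Ethernet MSS:}\quad & 1500 - 20 - 20 = 1460 \text{ bytes} \\
\text{IEEE 802.3 MSS:}\quad & 1492 - 20 - 20 = 1452 \text{ bytes}
\end{align*}
```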

Page 36: Lecture set 7

TCP Window Flow Control

[Figure: timeline (t0 to t4) of segments exchanged between Host A and Host B under window flow control, where Win = Advertised Window size. Annotations in the figure: "1024 bytes to transmit" (four times), "128 bytes to transmit", and "Only 512 bytes sent as that is the advertised value of Win".]

Page 37: Lecture set 7

Nagle Algorithm

• Situation: User types one character at a time

– Transmitter sends TCP segment per character (41 B)

– Receiver sends ACK (40 B)

– Receiver echoes received character (41 B)

– Transmitter ACKs echo (40 B)

– 162 bytes transmitted to transfer one character! Problem!

• Solution:

– TCP sends data & waits for ACK

– New characters buffered

– Send new characters when ACK arrives

– Algorithm adjusts to RTT as follows:

• Short RTT: send frequently at low efficiency

• Long RTT: send less frequently at greater efficiency
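The buffering rule in the solution above can be sketched as a tiny state machine. This is a minimal Java illustration, not a full implementation of the Nagle algorithm: the class `NagleSketch` and its methods are hypothetical, and the sketch ignores segment size limits and windows, modelling only "send if nothing is outstanding, otherwise buffer until the ACK arrives".

```java
import java.util.ArrayList;
import java.util.List;

/** Models only the send-or-buffer decision for typed characters (illustrative names). */
final class NagleSketch {
    private final StringBuilder pending = new StringBuilder();
    private boolean awaitingAck = false;
    /** Payload of every segment sent so far, in order. */
    final List<String> sent = new ArrayList<>();

    /** Application writes one character. */
    void write(char c) {
        if (awaitingAck) {
            pending.append(c);          // data outstanding: buffer the new character
        } else {
            send(String.valueOf(c));    // nothing outstanding: send immediately
        }
    }

    /** ACK for the outstanding segment arrives. */
    void onAck() {
        awaitingAck = false;
        if (pending.length() > 0) {
            String data = pending.toString();
            pending.setLength(0);
            send(data);                 // everything buffered goes out as one segment
        }
    }

    private void send(String data) {
        sent.add(data);
        awaitingAck = true;
    }
}
```

Typing "hello" before the first ACK arrives produces two segments, "h" and then "ello", instead of five one-character segments; a longer RTT lets more characters accumulate per segment.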

Page 38: Lecture set 7

Silly Window Syndrome

• Situation:

– Transmitter sends large amount of data

– Receiver’s buffer is depleted slowly, so buffer fills up

– Every time a few bytes read from buffer, a new advertisement to transmitter is generated

– Sender immediately sends data & fills buffer

– This leads to many small, inefficient segments being transmitted

• Solution:

– Receiver does not advertise window until window is at least ½ of receiver buffer or is equal to the maximum segment size (MSS)

– Transmitter refrains from sending small segments

Page 39: Lecture set 7

Sequence Number Wraparound (Potential problem at high data rates)

• $2^{32} = 4.29\times10^{9}$ bytes $= 34.3\times10^{9}$ bits (TCP has a 32-bit sequence number). Therefore, at 1 Gbps, sequence numbers will wrap around in just 34.3 seconds, i.e. the transmitter can only transmit for very brief periods before sequence numbers repeat

• Solution: Use the Timestamp Option in the TCP option field. Transmitter inserts a 32-bit timestamp in each transmitted segment. Receiver echoes this in the ACK. This option must be requested in the SYN segment and is negotiated during the Connection Setup.

– Timestamp + sequence no → 64-bit seq. no (effectively a much larger sequence number than the original 32-bit)

– Timestamp clock must:

• Tick forward at least once every $2^{31}$ bits

• Not complete cycle in less than one MSL

• Example: clock tick every 1 ms @ 8 Tbps wraps around in 25 days
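The wraparound figures can be checked with a short calculation. The first two lines use only the numbers given above. The third line is an interpretation, not something the slides state: it assumes the 25-day figure corresponds to $2^{31}$ ticks of a 1 ms timestamp clock, i.e. half of the 32-bit timestamp space.

```latex
\begin{align*}
2^{32}\ \text{bytes} \times 8\ \text{bits/byte} &\approx 3.436 \times 10^{10}\ \text{bits} \\
\frac{3.436 \times 10^{10}\ \text{bits}}{10^{9}\ \text{bits/s}} &\approx 34.4\ \text{s} \\
2^{31}\ \text{ticks} \times 1\ \text{ms/tick} &\approx 2.147 \times 10^{6}\ \text{s} \approx 24.9\ \text{days}
\end{align*}
```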

Page 40: Lecture set 7

Delay-BW Product & Advertised Window Size

• Suppose RTT = 100 ms, R = 2.4 Gbps, then:

No. of bits in pipe = RTT x R = 240 Mbits = 30 Mbytes

• If a single TCP process occupies the pipe, then required advertised window size is RTT x Bit rate = 30 Mbytes

• But, normal maximum window size is only 65535 bytes, which clearly is far too small

• Solution: Use the “Window Scale Option” which will allow the window to be scaled upward by a factor of up to $2^{14}$. Then a Window Size up to $65535 \times 2^{14} \approx 1$ Gbyte will be allowed. This window scaling option must be requested in the SYN segment and is negotiated during the Connection Setup.
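The delay-bandwidth product and the window scaling it calls for can be computed directly. The sketch below is a minimal Java illustration; the class `WindowScaleSketch` and its method names are hypothetical, and the scale factor is modelled as a left shift of the 16-bit window capped at 14 bits. For RTT = 100 ms and R = 2.4 Gbps the product works out to 240 Mbits, i.e. 30 Mbytes.

```java
/** Delay-bandwidth product and the smallest window-scale shift that covers it (illustrative). */
final class WindowScaleSketch {
    static final long MAX_UNSCALED_WINDOW = 65535L; // largest value of the 16-bit window field
    static final int MAX_SHIFT = 14;                // scaling allowed up to a factor of 2^14

    /** Bytes that fit "in the pipe": RTT x bit rate, converted from bits to bytes. */
    static long bdpBytes(double rttSeconds, double bitsPerSecond) {
        return Math.round(rttSeconds * bitsPerSecond / 8.0);
    }

    /** Smallest shift s (0..14) with 65535 * 2^s >= neededBytes, or 14 if even that is too small. */
    static int minShift(long neededBytes) {
        int shift = 0;
        while (shift < MAX_SHIFT && (MAX_UNSCALED_WINDOW << shift) < neededBytes) {
            shift++;
        }
        return shift;
    }
}
```

For the example above, `bdpBytes(0.1, 2.4e9)` is 30,000,000 bytes and `minShift` returns 9, since 65535 x 2^9 is about 33.5 Mbytes while 65535 x 2^8 is only about 16.8 Mbytes.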

Page 41: Lecture set 7

Closing a TCP Connection (Graceful Close)

[Figure: timeline of segments exchanged between Host A and Host B, annotated with the steps below.]

1. Host A initiates the TCP connection termination, sends its FIN

2. Host B sends ACK but does not yet send its own FIN

– After sending FIN, Host A cannot send any more data but cannot close the connection as B may still be sending something

3. Host B still delivers 150 bytes

4. Host B now sends its own FIN

5. Host A ACKs B’s FIN closing its side of the connection

6. Host B gets A’s ACK and closes its side of the connection

Page 42: Lecture set 7

TIME_WAIT state

TIME_WAIT state is entered by the host that initiated the close (e.g. Host A in previous slide) once it has received the other side's FIN and sent the final ACK

This protects future incarnations of the connection from delayed segments

TIME_WAIT = 2 x MSL, where Maximum Segment Lifetime (MSL) is the maximum time that an IP packet can live in the network

Only valid segment that can arrive while in TIME_WAIT state is a FIN retransmission. If such a segment arrives, resend ACK & restart TIME_WAIT timer

When timer expires, close TCP connection

Page 43: Lecture set 7

TCP State Transition Diagram

[Figure: TCP state transition diagram with states CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT and LAST_ACK. Transition labels legible in the figure: "passive open, create TCB"; "receive SYN, send ACK"; "application close, send FIN"; "application close or timeout, delete TCB"; "2MSL timeout, delete TCB".]

Page 44: Lecture set 7

Congestion Control in TCP

• Advertised window size is used to ensure that receiver’s buffer will not overflow

• However, buffers at intermediate routers between source and destination may still overflow (i.e. because of network congestion)

[Figure: packet flows, which may come in from many sources, converge on a router whose outgoing link has rate R bps.]

Congestion occurs when total arrival rate from all packet flows exceeds R over a sustained period of time

When congestion occurs, buffers at routers will fill and packets will be lost

Page 45: Lecture set 7

Different Phases of Congestion Behavior

1. Light traffic

– Arrival Rate << R

– Low delay

– Can accommodate more

2. Knee (congestion onset)

– Arrival rate approaches R

– Delay increases rapidly

– Throughput begins to saturate

3. Congestion collapse

– Arrival rate > R

– Large delays, packet loss

– Useful application throughput drops

[Figure: two plots against Arrival Rate, with R marked on the horizontal axis: Throughput (bps) and Delay (sec).]

Page 46: Lecture set 7

Window Congestion Control

• (From previous slide) Desired operating point will be just before knee as shown there. Sources must control their sending rates so that aggregate arrival rate is just before knee

• TCP sender maintains a congestion window “cwnd” to control congestion at intermediate routers

• Effective window is minimum of congestion window and advertised window

• Problem: The source does not know what its “fair” share of available bandwidth should be and so does not know what value to set for cwnd

• Solution: Adjust cwnd dynamically to available BW as follows

– Sources probe the network by gradually increasing cwnd (Initially set cwnd to a low value)

– When congestion detected, sources reduce rate

– Ideally, sources’ sending rate will stabilize near ideal point

Page 47: Lecture set 7

Congestion Window

How does the TCP congestion algorithm change congestion window dynamically according to the most up-to-date state of the network?

• At light traffic: each segment is ACKed quickly

– Increase cwnd aggressively

• At knee: segment ACKs arrive, but more slowly

– Slow down increase in cwnd

• At congestion: segments encounter large delays (so retransmission timeouts occur); segments get dropped in router buffers

– Reduce transmission rate, then probe again

Page 48: Lecture set 7

TCP Congestion Control: Slow Start

Slow Start: Increase congestion window size by one segment upon receiving an ACK from receiver

– initialized at 2 segments

– used at (re)start of data transfer

– congestion window increases exponentially

[Figure: segments (Seg) and ACKs exchanged over successive RTTs, with cwnd marked as 1, 2, 4, 8.]

Page 49: Lecture set 7

TCP Congestion Control: Congestion Avoidance

• Algorithm progressively sets a congestion threshold

– When cwnd > threshold, slow down rate at which cwnd is increased

• Increase congestion window size by one segment per round-trip-time (RTT)

– Each time an ACK arrives, cwnd is increased by 1/cwnd

– In one RTT, cwnd segments are sent, so total increase in cwnd is cwnd x 1/cwnd = 1

– cwnd grows linearly with time

[Figure: cwnd versus RTTs; cwnd is marked at 1, 2, 4, 8 and at the threshold.]

Page 50: Lecture set 7

TCP Congestion Control: Congestion

• Congestion is detected upon timeout or receipt of duplicate ACKs

• Assume current cwnd corresponds to available bandwidth

• Adjust congestion threshold = ½ x current cwnd

• Reset cwnd to 1

• Go back to slow-start

• Over several cycles expect to converge to congestion threshold equal to about ½ the available bandwidth
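The interplay of slow start, congestion avoidance and the timeout reaction can be traced with a per-RTT sketch. This is a minimal Java illustration in whole segments: the class `CwndSketch` is hypothetical, it updates cwnd once per round-trip time rather than per ACK, and capping slow-start growth at the threshold is a simplifying assumption not stated in the slides.

```java
/** Per-RTT model of cwnd under slow start, congestion avoidance and timeout (illustrative). */
final class CwndSketch {
    int cwnd = 1;   // congestion window, in segments
    int threshold;  // congestion threshold, in segments

    CwndSketch(int initialThreshold) {
        this.threshold = initialThreshold;
    }

    /** One round-trip time in which every segment was ACKed. */
    void onRttWithoutLoss() {
        if (cwnd < threshold) {
            cwnd = Math.min(cwnd * 2, threshold); // slow start: doubles each RTT (capped: assumption)
        } else {
            cwnd += 1;                            // congestion avoidance: one segment per RTT
        }
    }

    /** Congestion detected by a timeout. */
    void onTimeout() {
        threshold = Math.max(cwnd / 2, 1);        // threshold = 1/2 x current cwnd
        cwnd = 1;                                 // go back to slow start
    }
}
```

Starting with a threshold of 16, cwnd runs 1, 2, 4, 8, 16, 17, 18, 19, 20 over eight RTTs; a timeout at 20 sets the threshold to 10 and restarts cwnd at 1, after which it runs 2, 4, 8, 10, 11.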

[Figure: congestion window (0 to 20) versus round-trip times, showing slow start up to the threshold, congestion avoidance above it, and a time-out.]

Page 51: Lecture set 7

Fast Retransmit & Fast Recovery

• Congestion causes many segments to be dropped

• If only a single segment is dropped, then subsequent segments trigger duplicate ACKs before timeout (as shown)

• Can avoid large decrease in cwnd as follows:

– When three duplicate ACKs arrive, retransmit lost segment immediately

– Reset congestion threshold to ½ cwnd

– Reset cwnd to congestion threshold + 3 to account for the three segments that triggered duplicate ACKs

– Remain in congestion avoidance phase

– However if timeout expires, reset cwnd to 1

– In absence of timeouts, cwnd will oscillate around optimal value

[Figure: segments SN=1 to SN=5 are sent; SN=1 is acknowledged with ACK=2, and the later segments SN=3, SN=4 and SN=5 each trigger a duplicate ACK=2.]
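The duplicate-ACK rule can be sketched as a small counter. This is a minimal Java illustration: the class `FastRecoverySketch` and its methods are hypothetical, cwnd is counted in whole segments, and the sketch covers only the steps listed in this slide (it does not model what happens to cwnd once the retransmitted segment is finally acknowledged).

```java
/** Counts duplicate ACKs and applies the fast retransmit / fast recovery rule (illustrative). */
final class FastRecoverySketch {
    int cwnd;       // congestion window, in segments
    int threshold;  // congestion threshold, in segments
    private long lastAck = -1;
    private int dupAcks = 0;

    FastRecoverySketch(int cwnd, int threshold) {
        this.cwnd = cwnd;
        this.threshold = threshold;
    }

    /** Returns true if this ACK is the third duplicate, i.e. segment `ack` should be resent now. */
    boolean onAck(long ack) {
        if (ack != lastAck) {          // new ACK: remember it and restart the duplicate count
            lastAck = ack;
            dupAcks = 0;
            return false;
        }
        dupAcks++;
        if (dupAcks == 3) {
            threshold = Math.max(cwnd / 2, 1); // congestion threshold = 1/2 cwnd
            cwnd = threshold + 3;              // three segments triggered the duplicate ACKs
            return true;
        }
        return false;
    }

    /** If the retransmission timer expires instead, cwnd is reset to 1. */
    void onTimeout() {
        threshold = Math.max(cwnd / 2, 1);
        cwnd = 1;
        dupAcks = 0;
    }
}
```

With cwnd = 20, the sequence ACK=2, ACK=2, ACK=2, ACK=2 from the figure triggers a retransmission of segment 2 on the fourth ACK (the third duplicate) and leaves threshold = 10 and cwnd = 13, rather than dropping cwnd to 1.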

Page 52: Lecture set 7

TCP Congestion Control: Fast Retransmit & Fast Recovery

[Figure: congestion window (0 to 20) versus time (in units of RTT), showing slow start, congestion avoidance, the threshold, and a time-out.]