
Connection Management in Transport Protocols

Carl A. Sunshine
Information Sciences Department, The Rand Corporation, 1700 Main Street, Santa Monica, California 90406, USA

Yogen K. Dalal
Xerox Systems Development Department, 3408 Hillview Avenue, Palo Alto, California 94304, USA

Transport protocols are designed to provide fully reliable communication between processes which must communicate over a less reliable medium such as a packet switching network (which may damage, lose, or duplicate packets, or deliver them out of order). This is typically accomplished by assigning a sequence number and checksum to each packet transmitted, and retransmitting any packets not positively acknowledged by the other side. The use of such mechanisms requires the maintenance of state information describing the progress of data exchange. The initialization and maintenance of this state information constitutes a connection between the two processes, provided by the transport protocol programs on each side of the connection. Since a connection requires significant resources, it is desirable to maintain a connection only while processes are communicating. This requires mechanisms for opening a connection when needed, and for closing a connection after ensuring that all user data have been properly exchanged. These connection management procedures form the main subject of this paper. Mechanisms for establishing connections, terminating connections, recovering from crashes or failures of either side, and for resynchronizing a connection are presented. Connection management functions are intimately involved in protocol reliability, and if not designed properly may result in deadlocks or old data being erroneously delivered in place of current data. Some protocol modeling techniques useful in analyzing connection management are discussed, using verification of connection establishment as an example. The paper is based on experience with the Transmission Control Protocol (TCP), and examples throughout the paper are taken from TCP.

Keywords: Computer network, host-to-host protocol, transport protocols, interprocess communication, connections, connection management, reliability, synchronization, 3 way handshake, resynchronization, correctness, verification.

North-Holland Publishing Company, Computer Networks 2 (1978) 454-473

1. Introduction

Distributed computing systems, consisting of computers connected to one another by a communication network, require processes to interact by means of explicit interprocess communication protocols. These protocols are often composed of several layers, with higher layers making use of the functions performed by lower layers. A typical protocol architecture consists of a partially reliable "best effort" communication service at the lowest level, followed by a general purpose fully reliable transport protocol, and finally various higher level protocols that provide specialized services such as file transfer, remote job entry, interactive terminal support, graphics, and resource coordination. Each protocol layer provides an augmented service by using appropriate mechanisms on top of the lower level protocol.

This paper focuses on transport protocols (TP) for use on packet switching networks which may damage, duplicate, lose, and delay packets, or deliver them in a different sequence than submitted. The basic technique used by transport protocols to achieve reliable

Carl Sunshine received a PhD in computer science from Stanford University in 1975, where he worked on analysis, design, and implementation of communication protocols for computer networks. Since 1975 he has been with the Rand Corporation, where he is involved in research on computer network protocols, network interconnection, network planning, distributed systems, and operating systems. Dr. Sunshine is active in IFIP TC6.1, the Internetwork Working Group.

Yogen K. Dalal received his BTech degree in Electrical Engineering from the Indian Institute of Technology, Bombay, in 1972, and his MS and PhD degrees in Electrical Engineering from Stanford University in 1973 and 1977 respectively. He is currently with the Xerox Systems Development Department in Palo Alto working on the design, analysis and implementation of computer communication protocols, and the interconnection of computer networks. His research interests also include local networks, distributed systems architecture, packet switching, broadcast protocols and operating systems. Dr. Dalal is a member of the ACM and IEEE.


C.A. Sunshine, Y.K. Dalal / Connection management in transport protocols 455

communication over such an unreliable transmission medium is to assign a sequence number and checksum to each packet transmitted, to verify the checksum and positively acknowledge successfully received packets, and to retransmit packets that remain unacknowledged beyond a timeout period. Further discussion on the motivation for and use of these techniques in transport protocols may be found in refs. [4,17,22]. Transport protocols can also perform a multiplexing function to allow many processes in a host to share the network access path.
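The combination of sequence numbers, checksums, positive acknowledgement and timeout-driven retransmission described above can be illustrated with a short Python sketch. It is not TCP's mechanism: the names `Segment`, `Sender`, `make_segment` and `is_intact` are hypothetical, CRC-32 from the standard `zlib` module stands in for whatever checksum a real TP uses, sequence numbers count octets without wrap-around, and acknowledgements are assumed to be cumulative.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass

# Illustrative sketch only; all names are hypothetical, not TCP's implementation.


@dataclass(frozen=True)
class Segment:
    seq: int        # sequence number of the first data octet
    data: bytes
    checksum: int   # CRC-32 of the data (an assumed checksum choice)


def make_segment(seq: int, data: bytes) -> Segment:
    return Segment(seq, data, zlib.crc32(data))


def is_intact(seg: Segment) -> bool:
    """Receiver-side check: a damaged segment is discarded, not acknowledged."""
    return zlib.crc32(seg.data) == seg.checksum


class Sender:
    """Numbers outgoing data and keeps each segment until it is acknowledged."""

    def __init__(self, isn: int, timeout: float) -> None:
        self.next_seq = isn
        self.timeout = timeout
        # seq -> (segment, time it was last sent)
        self.unacked: dict[int, tuple[Segment, float]] = {}

    def send(self, data: bytes, now: float) -> Segment:
        seg = make_segment(self.next_seq, data)
        self.next_seq += len(data)
        self.unacked[seg.seq] = (seg, now)
        return seg

    def on_ack(self, ack: int) -> None:
        """Cumulative acknowledgement: every octet below `ack` has arrived."""
        for seq, (seg, _) in list(self.unacked.items()):
            if seq + len(seg.data) <= ack:
                del self.unacked[seq]

    def due_for_retransmission(self, now: float) -> list[Segment]:
        """Segments unacknowledged for at least the timeout; their timers restart."""
        due = []
        for seq, (seg, sent_at) in list(self.unacked.items()):
            if now - sent_at >= self.timeout:
                due.append(seg)
                self.unacked[seq] = (seg, now)
        return due
```

A damaged segment fails `is_intact` at the receiver, is never acknowledged, and is therefore eventually returned by `due_for_retransmission`; a lost segment is handled the same way.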

The salient point for this paper is that the use of these mechanisms requires the maintenance of state information describing the progress of data exchange. The initialization and maintenance of this state information constitutes a connection between the two processes, provided by the protocol machines on each side of the connection. Connection oriented protocols typically include an initialization phase during which necessary parameters (e.g. sequence numbers) are synchronized at each end, a data transfer phase, and a termination phase. Such connection oriented or virtual circuit protocols are in contrast to message oriented interprocess communication protocols [25] which are inherently unreliable.

Although a connection potentially exists between every pair of processes, only some processes will need to converse at any given time. Since a connection requires significant resources, it is desirable to maintain a connection only while processes are communicating, and then to terminate the connection and free the resources when the processes are done. This leads to mechanisms for opening a connection when needed, and for closing a connection after ensuring that all user data have been properly exchanged.

In opening and then closing a connection, it is convenient to describe the protocol machines as going through a number of states. Each machine starts in the NotActive state in which no connection exists. On command from the user process, a machine may actively initiate connection establishment (Opening state), or passively wait for connection establishment to be initiated from the remote end (Listening state). Once the connection is established (Established state), either user may terminate it, placing the protocol machine in a Closing state. Once the connection has been closed, the protocol machine is in the NotActive state again. Figure 1 illustrates the life-cycle of a connection as it passes from one state to another. The Opening and Closing states may in reality consist of several substates depending on the protocol used for accomplishing the desired effects.
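The life-cycle just described is small enough to write down as a transition table. The Python sketch below is only an illustration: it collapses the Opening and Closing substates into single states, and the lower-case event names (`established`, `remote_close`, `closed`) are hypothetical stand-ins for "the packet exchange has completed" and "the remote user has closed".

```python
from enum import Enum, auto

# Illustrative sketch; state and event names beyond those in the text are hypothetical.


class State(Enum):
    NOT_ACTIVE = auto()
    OPENING = auto()
    LISTENING = auto()
    ESTABLISHED = auto()
    CLOSING = auto()


# (current state, event) -> next state. User commands are upper case;
# lower-case events stand for a completed packet exchange or a remote close.
TRANSITIONS = {
    (State.NOT_ACTIVE, "OPEN"): State.OPENING,
    (State.NOT_ACTIVE, "LISTEN"): State.LISTENING,
    (State.OPENING, "established"): State.ESTABLISHED,
    (State.LISTENING, "established"): State.ESTABLISHED,
    (State.ESTABLISHED, "CLOSE"): State.CLOSING,          # local user closes
    (State.ESTABLISHED, "remote_close"): State.CLOSING,   # remote user closes
    (State.CLOSING, "closed"): State.NOT_ACTIVE,
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an event not allowed here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None
```

Keeping the transitions in one table makes it easy to see which events each state must handle, which is where the deadlock and old-duplicate problems discussed later tend to appear.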

These connection management procedures form the main subject of this paper. They include procedures for dealing with crashes or failures of either side of a connection as well as opening and closing connections under normal circumstances. This paper does not discuss the multiplexing function of transport protocols, which is independent of connection management functions for individual connections. Connection management functions are intimately involved in protocol reliability, and if not designed properly may result in deadlocks or old data being erroneously accepted in place of current data.

[Fig. 1. Connection life-cycle. From NotActive, an OPEN command leads to Opening and a LISTEN command leads to Listening; a packet exchange then leads from either state to Established; a CLOSE leads to Closing, and a further packet exchange returns the machine to NotActive.]

This paper is based on our experience with the Transmission Control Program (TCP) [5,7,8]. Our efforts in designing TCP have resulted in continuing changes to the original Internetwork Protocol described by Cerf and Kahn [4]. We will use TCP to designate the class of transport protocols based on the Internetwork Protocol. Lessons are drawn from various stages of TCP development throughout the paper, so the reader is cautioned that some examples do not reflect the current TCP. In particular, the formal analysis in Section 6 is based on an early version of TCP. We will use TP and TCP to stand for both the protocol and the program that implements it.

TCP was designed with strong "worst case" assumptions about the underlying transmission medium. In particular it was assumed that packets could be damaged, lost, duplicated, and delivered out of order, with a widely varying delay. All of these events are known to occur through various "natural" causes in packet switching networks. However, TCP was not designed to solve the additional problems imposed by "malicious intruders" [15] which require procedures involving encryption. In considering connection management mechanisms within a TP, it will be helpful to use examples of the dialogue between two TPs. All of our examples will be drawn from TCP, but similar scenarios exist in other transport protocols. These dialogues consist of the exchange of packets between TCPs A and B and will be illustrated as follows:

Each packet contains a sequence number, some optional control information, an optional acknowledgement, and some optional data, represented in the following notation:

(Seq #)(Control)(Ack #)(data).
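To make the notation concrete, the following Python sketch renders a packet in exactly this form. The class name `Packet` and its field names are hypothetical; the only rule taken from the text is that the sequence number is always present while control, acknowledgement and data are optional.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch; `Packet` and its fields are hypothetical names.


@dataclass(frozen=True)
class Packet:
    seq: int                       # every packet carries a sequence number
    control: Optional[str] = None  # optional control, e.g. "SYN", "FIN" or "RST"
    ack: Optional[int] = None      # optional acknowledgement
    data: str = ""                 # optional data

    def __str__(self) -> str:
        parts = [f"(Seq {self.seq})"]
        if self.control:
            parts.append(f"({self.control})")
        if self.ack is not None:
            parts.append(f"(Ack {self.ack})")
        if self.data:
            parts.append(f"(data {self.data})")
        return "".join(parts)
```

For example, `str(Packet(29, "FIN", 78))` gives `(Seq 29)(FIN)(Ack 78)`, and `str(Packet(0, data="ABC"))` gives `(Seq 0)(data ABC)`.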

Each line of the dialogue consists of a packet label in parentheses followed by the activity at A, where "-->" signifies the packet being transmitted at A, "....." signifies that A is unaware of the packet at that time, and "|" indicates no activity. Next appears a description of the packet in the notation described earlier. Lastly, the activity at process B is described, where

"-->" signifies that the packet is received at B, "


[Figure: example dialogue. A issues OPEN (Opening) and B issues LISTEN (Listening); the packets (Seq 0)(data ABC) and (Seq 3)(data DE) are transmitted by A and accepted by B.]


incarnation number [13,22] which is derived from some global counter. If the counter has cycle time greater than L, no confusion is possible. However, another field on every packet sent is required, increasing overhead. The address field may also serve as a unique identifier if a new port address is used by a process each time it opens a connection [4,19]. Some port addresses may be reused to facilitate addressing "well-known" services, but two reusable ports can never be connected, guaranteeing that the pair of port addresses will be unique for every connection.

If the ISN is used to distinguish packets from different connections, it must be carefully selected based on memory of previous connection sequence numbers, or based on a clock. In the memory approach, the ISN is set to the last sequence number used in the previous connection, plus one. This requires maintaining state information for inactive connections because the last sequence number used must be remembered for time L on every connection. Once time L has passed, any value for ISN may be used. In the clock approach, the ISN is set from a single clock for all connections at a host. The clock value is the only state information that must be preserved through inactive connections and host crashes. However, use of a cyclic clock requires resetting the sequence number if the clock is about to "catch up" with the sequence number as described in Section 3. The process of resetting the sequence number is called resynchronization. An additional cost of this mechanism is testing to see if it is time to resynchronize the connection.
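The two ISN policies can be contrasted in a few lines of Python. This is a sketch under stated assumptions: the class names are hypothetical, a connection is identified by any hashable key (for instance a pair of port addresses), sequence numbers are 32 bits, the clock is injectable so it can be tested, and the case of a TP that has lost its memory in a crash (which must then wait time L) is not modelled.

```python
import time
from typing import Callable, Dict, Hashable, Tuple

# Illustrative sketch; class names and parameters are hypothetical.
SEQ_MODULUS = 1 << 32  # assumes 32-bit sequence numbers


class MemoryISN:
    """Memory approach: ISN = last sequence number of the previous connection + 1,
    remembered for L seconds after that connection closes."""

    def __init__(self, lifetime_l: float, clock: Callable[[], float] = time.monotonic):
        self.L = lifetime_l
        self.clock = clock
        self.last_used: Dict[Hashable, Tuple[int, float]] = {}  # conn -> (last seq, close time)

    def on_close(self, conn: Hashable, last_seq: int) -> None:
        self.last_used[conn] = (last_seq, self.clock())

    def pick_isn(self, conn: Hashable, default: int = 0) -> int:
        entry = self.last_used.get(conn)
        if entry is None or self.clock() - entry[1] >= self.L:
            self.last_used.pop(conn, None)   # after time L the memory can be dropped
            return default                   # ... and any ISN is acceptable
        return (entry[0] + 1) % SEQ_MODULUS


class ClockISN:
    """Clock approach: a single host-wide clock, ticking every D seconds, supplies the ISN."""

    def __init__(self, tick_d: float, clock: Callable[[], float] = time.monotonic):
        self.D = tick_d
        self.clock = clock

    def pick_isn(self) -> int:
        return int(self.clock() // self.D) % SEQ_MODULUS
```

The trade-off described above is visible in the state each class keeps: `MemoryISN` needs an entry per recently closed connection, while `ClockISN` needs only the clock, at the price of the resynchronization machinery of Section 3.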

In general, all solutions of the second type may fail if the state information which distinguishes connections is lost. In this case the TP must resort to a solution of the first type and wait time L before initiating new connections. To reduce the likelihood of failure, the state information can be reduced to a minimum and maintained by some specially reliable mechanism like an independent clock or counter.

A combination of mechanisms is also possible. For example, Garlick et al. [13] propose remembering sequence numbers for a time L after closing connections, plus a clock-based incarnation number that is changed only after crashes and thus avoids the need to wait before restarting.

2.2. Validation of connection requests

The techniques for ISN selection described above define procedures for the sending portion of TP that will allow the receiving modules to correctly reject old packets. Once a connection is established, the ESN, in combination with the incarnation number or unique port address pair if they are used, allows the receiver to tell old packets from new ones. The major function of connection establishment is to set ESN and any additional connection identifiers in the receiving modules to match ISN values used by the sending modules on the other side of the connection.

To accomplish this, each TP may try to maintain enough state information about closed connections to set its own ESN and recognize old packets. This would require remembering sequence numbers and/or incarnation numbers for every connection for a time L [12], and may again impose an unacceptable burden. In the event of failure with loss of memory, no connections may be accepted for time L.

[Figure: A issues OPEN (Opening), picks ISN = x, and sends (1) (Seq x)(SYN); on receiving the reply (2) (Seq y)(SYN), A is Established.]

Alternatively, the sending modules responsible for selecting ISN may also inform the remote TP of the ISN they will use for each new connection. The sending modules transmit a Synchronization control packet (SYN) containing the ISN value as the first packet on the connection (Figure 3). The SYN packet is assigned a sequence number and must be acknowledged just as a data packet to ensure its reliable transmission. The receiving module can set ESN to this value without maintaining any state information about the connection. The receiving TP returns a SYN giving its own ISN, or can reject any SYN that arrives when the protocol machine is in an inappropriate state. Inappropriately timed arrivals are either old retransmissions, protocol errors, or attempts to establish a conversation with an unwilling partner.

Unfortunately, this simple system of a credulous TP is inadequate when packets may arrive out of order. Once the connection is established, sequence numbers serve to validate incoming packets. But there is no mechanism for validating an arriving SYN control packet. Suppose a SYN packet is delayed in the network and retransmitted by A. It may arrive at B just at the moment when B is ready to establish a new connection (Figure 4). B will accept the SYN as a new connection request, reply with its own SYN, and consider the connection established. A will receive the replying SYN and interpret it as a new connection request. A's attempt to reply will be discarded by B, which thinks the connection is already established, and a deadlock will occur that prevents successful data transfer.

2.3. Three way handshake

To avoid this problem, a more reliable means of transmitting the current ISN to a remote TP must be used. Tomlinson [24] has presented such a scheme called the 3 way handshake which is used in TCP. Instead of simply accepting an arriving SYN, the receiving TP must ask the sending TP to verify the SYN as current. The receiving TP returns a SYN-ACK control packet to the sending TP which refers to the ISN from the SYN (Figure 5). If the SYN was a current packet, the sender returns a positive acknowledgement, and only then does the receiver accept the SYN and set ESN. This synchronization must occur in both directions, with the SYN-ACK also carrying ISN of the receiver in the other direction. Any packet of the 3 way handshake may also carry data which will be processed after the connection is established, but this is omitted from the figures for simplicity.
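The normal path of the 3 way handshake can be traced with a small Python model. It is a sketch rather than TCP: `Pkt` and `TP` are hypothetical names, each side's ISN is passed in instead of being read from a clock, sequence arithmetic ignores wrap-around, and only the successful exchange is modelled (rejection of old SYNs, data on handshake packets, and collisions are left out).

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of the normal 3 way handshake; all names are hypothetical.


@dataclass(frozen=True)
class Pkt:
    seq: int
    syn: bool = False
    ack: Optional[int] = None


class TP:
    def __init__(self, isn: int) -> None:
        self.isn = isn                          # chosen by the caller in this sketch
        self.state = "NotActive"
        self.esn: Optional[int] = None          # expected sequence number from the peer
        self.unverified: Optional[int] = None   # peer ISN awaiting verification

    def open(self) -> Pkt:                      # active side sends its SYN
        self.state = "Opening"
        return Pkt(self.isn, syn=True)

    def listen(self) -> None:                   # passive side waits for a SYN
        self.state = "Listening"

    def receive(self, p: Pkt) -> Optional[Pkt]:
        if self.state == "Listening" and p.syn and p.ack is None:
            # Do not trust the SYN yet: remember x and ask the sender to verify it.
            self.unverified = p.seq
            return Pkt(self.isn, syn=True, ack=p.seq + 1)      # SYN-ACK
        if self.state == "Opening" and p.syn and p.ack == self.isn + 1:
            # Our own SYN was acknowledged, so the peer's SYN is current.
            self.esn, self.state = p.seq + 1, "Established"
            return Pkt(self.isn + 1, ack=p.seq + 1)
        if (self.state == "Listening" and not p.syn and self.unverified is not None
                and p.seq == self.unverified + 1 and p.ack == self.isn + 1):
            # The sender has confirmed that its SYN was current.
            self.esn, self.state = self.unverified + 1, "Established"
            return None
        return None                             # anything else is ignored here
```

With A choosing x = 100 and B choosing y = 300, the three packets are (Seq 100)(SYN), (Seq 300)(SYN)(Ack 101) and (Seq 101)(Ack 301), after which both sides are Established with ESN 301 at A and ESN 101 at B.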

If B receives an old SYN packet, it returns a SYN-ACK referencing the ISN in the old packet (Figure 6). A returns a negative acknowledgement (Reset packet) since it is not trying to initiate a connection. Upon receiving the Reset (RST), B knows that the original SYN was an old packet, so B returns to the Opening or Listening state. A similar scenario takes care of old SYN-ACK packets. The use of RST packets is described in greater detail in Section 5 on recovering from crashes.

[Figure fragment: packets (1) (Seq z)(SYN) and (2) (Seq x)(SYN), with one side in the Listening state.]

A: OPEN, Opening, pick ISN = x. B: LISTEN, Listening.
(1) A --> (Seq x)(SYN) --> B: remember x, pick ISN = y, wait for verification.
(2) A <-- (Seq y)(SYN)(Ack x + 1) <-- B. A: set ESN = y + 1, Established.
(3) A --> (Seq x + 1)(Ack y + 1) --> B: set ESN = x + 1, Established.

Fig. 5. Three way handshake connection establishment.
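The rejection of an old SYN can be sketched in the same spirit. The names `Pkt`, `Listener` and `not_active_reply` are hypothetical, and only the two rules described above are modelled: a TP that is not trying to connect answers any SYN-ACK with a RST whose sequence number is the acknowledgement it received, and a listener that receives a RST matching the SYN it is verifying forgets that SYN and keeps listening.

```python
from typing import NamedTuple, Optional

# Illustrative sketch of old-SYN rejection; all names are hypothetical.


class Pkt(NamedTuple):
    seq: int
    ctl: str                   # "SYN", "RST", or "" for a plain acknowledgement
    ack: Optional[int] = None


def not_active_reply(p: Pkt) -> Optional[Pkt]:
    """A TP that is not opening a connection refuses any SYN-ACK with a RST."""
    if p.ctl == "SYN" and p.ack is not None:
        return Pkt(p.ack, "RST", p.seq + 1)
    return None


class Listener:
    def __init__(self, isn: int) -> None:
        self.isn = isn
        self.state = "Listening"
        self.remembered: Optional[int] = None   # ISN of a SYN awaiting verification

    def receive(self, p: Pkt) -> Optional[Pkt]:
        if p.ctl == "SYN" and p.ack is None:
            self.remembered = p.seq                      # remember z ...
            return Pkt(self.isn, "SYN", p.seq + 1)       # ... and ask for verification
        verifies = (self.remembered is not None
                    and p.seq == self.remembered + 1 and p.ack == self.isn + 1)
        if verifies and p.ctl == "RST":
            self.remembered = None                       # the SYN was old: keep listening
        elif verifies:
            self.state = "Established"                   # the SYN was current
        return None
```

With z = 40 and y = 300 the exchange is (Seq 40)(SYN), (Seq 300)(SYN)(Ack 41), (Seq 41)(RST)(Ack 301), and B ends where it started, in the Listening state.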

The basic 3 way handshake mechanism for establishing connections is inherently asymmetric, with one side initiating the attempt by sending a SYN, and the other side waiting to respond to a SYN from the active side. Occasionally a pair of processes may simultaneously attempt to initiate a connection to each other, in which case a "collision" is said to occur. Early versions of TCP simply gave up and retried after a random delay to resolve such collisions. More recent versions of TCP will accept a SYN packet after having sent their own SYN by replying with an ACK to successfully establish the connection. However, if two different SYN packets (one an old duplicate) arrive while a connection is being established, the approach of retrying after a delay may still have to be used.

The 3 way handshake mechanism is adequate to reliably establish connections under the worst case assumptions of packet switching network transmission characteristics without requiring any memory of previous connections to set ESN. It also accommodates failure recovery as discussed in Section 5. Techniques for verifying these mechanisms more rigorously than the informal discussion in this section has allowed are presented in Section 6.

Incarnation numbers are essentially equivalent to sequence numbers for purposes of connection establishment, and the 3 way handshake may also be used to reliably exchange incarnation numbers [13]. Other mechanisms requiring memory of previous activity by the receiving modules [12], or use of unique port addresses [19] may provide the required reliability without a 3 way handshake. If end-to-end encryption is incorporated into the transport protocol, then the initialization of a new encryption key for each connection serves a similar role to the 3 way handshake in guaranteeing that old packets will be rejected [15].

A: Not active. B: Listening.
(1) A ..... (Seq z)(SYN) --> B: old connection request; remember z, pick ISN = y, wait for verification.
(2) A <-- (Seq y)(SYN)(Ack z + 1) <-- B. A: Not active.
(3) A --> (Seq z + 1)(RST)(Ack y + 1) --> B: reject SYN, Listening.

Fig. 6. Rejection of old SYN packet with 3 way handshake.

3. Clock driven ISN and resynchronization

A basic goal of the ISN selection mechanism is to prevent packets from being emitted with sequence numbers which duplicate those that are still in the multinetwork. This should be assured even when TPs crash and lose all knowledge of the sequence numbers they have been using. In TCP, when new connections are created, an ISN generator is employed which selects a new 32 bit ISN. The generator is bound to a (possibly fictitious) clock that is assumed to keep running even if the TCP or its supporting host crashes. We now describe the implementation of such a clock driven scheme, and the necessary relationships between the clock tick, the maximum packet lifetime L, the size of the sequence number, the transmission rate, and the notion of resynchronization [6,9].

Refer to Fig. 7 for a pictorial representation of the analysis to follow. Let

D = duration of the clock tick in seconds,
C = period of the clock in seconds,
L = maximum lifetime of packets in the multinetwork in seconds,
R = time in seconds until resynchronization is necessary under average bandwidth b,
b = average bandwidth of transmission in octets/sec,
B = maximum average bandwidth in octets/sec,
S = length of sequence numbers in bits,
p = length of clock time in bits (p < S),
q = S - p.

The figure illustrates a curve, the ISN curve, representing a clock ticking every D units of time, and wrapping around in a time C. The ISN + 1 curve is a curve similar to the ISN curve, but displaced by D, and the forbidden zone boundary is another curve similar to the ISN curve, but displaced from it by a time L + D. Sequence numbers are S bits long. (This theory and analysis hold for any radix system; we have chosen binary only for convenience.) When a connection is to be established, a unique ISN is picked, and from then on sequence numbers are incremented as data is transmitted. In the proposed scheme [24], when a connection is to be established, the clock is allowed to tick at least once, and then the ISN is a number which has the current value of the clock in the high order p bits and zeros in the low order q bits of S (i.e. the ISN is picked from the ISN curve). The actual rate b at which data is transmitted and sequence numbers are assigned is illustrated by the curve marked actual transmission rate. The connection must be resynchronized when the actual transmission rate curve touches the forbidden zone boundary. Otherwise, should a connection be closed and reopened within the forbidden zone, old duplicate packets might be accepted by the receiver, as follows (Figure 8).
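The ISN rule just stated, with the clock value in the high-order p bits and zeros in the low-order q bits, can be sketched in Python. The class name and the decision to return the earliest time at which the ISN may be used (rather than sleeping until the next tick) are assumptions of this sketch, as is our reading of "the clock is allowed to tick at least once" as "two connections never take their ISN from the same tick". A p-bit clock ticking every D seconds wraps with period C = D * 2^p.

```python
from __future__ import annotations

import math
from typing import Optional

# Illustrative sketch of a clock-driven ISN generator; names are hypothetical.


class ClockISN:
    def __init__(self, tick_d: float, p: int, s: int = 32) -> None:
        if not 0 < p < s:
            raise ValueError("need 0 < p < S")
        self.D, self.p, self.q = tick_d, p, s - p
        self.last_tick: Optional[int] = None    # tick used for the previous ISN

    def pick(self, now: float) -> tuple[int, float]:
        """Return (ISN, earliest time at which this ISN may be used)."""
        tick = math.floor(now / self.D)
        if self.last_tick is not None and tick <= self.last_tick:
            tick = self.last_tick + 1           # let the clock tick at least once
        self.last_tick = tick
        clock_value = tick % (1 << self.p)      # the clock wraps every C = D * 2**p seconds
        isn = clock_value << self.q             # clock in the high p bits, zeros in the low q bits
        return isn, max(now, tick * self.D)
```

With D = 0.25 s and p = 16, a connection opened at t = 10.0 s gets ISN 40 << 16; a second one requested at t = 10.1 s, still within the same tick, gets 41 << 16 and must wait until t = 10.25 s.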

Suppose that at time $t_b$, sequence numbers $s_a$ through $s_b$ have been used, at which point the connection is closed. A new connection is started at time $t_0$ using ISN $s_0$ derived from the clock. But packets carrying sequence numbers $s_a$ through $s_b$ can persist in the net for L seconds, and the new connection can use sequence numbers $s_0$ through $s_1$ during time $t_0$ to $t_1$. Since this includes $s_a$ to $s_b$, the old duplicates can be confused with new packets. The same arguments hold for the case where the transmitting TCP crashed while generating packets with sequence numbers in the forbidden zone, and then immediately restarted.

[Fig. 7. The forbidden zone and ISN curve. Sequence number s (up to 2^S) plotted against time t, showing the ISN curve, the ISN + 1 curve, the forbidden zone boundary, the forbidden zone, the actual transmission rate curve, and the point at which resynchronization is needed.]

[Fig. 8. The need to resynchronize. Sequence number s plotted against time t, showing the ISN, ISN + 1 and forbidden zone boundary curves, the actual transmission curve, the times $t_b$ and $t_1$, the lifetime L, and the sequence numbers $s_a$, $s_b$, $s_0$ and $s_1$.]

This example shows that the TCP must not be allowed to generate packets with sequence numbers in the forbidden zone. This can happen (1) if the clock catches up with the TCP, thus causing entry into the forbidden zone from the left, or (2) if the TCP transmits too fast and enters the forbidden zone from the bottom. Therefore two tests are required:

1) Is the sequence number of the packet about to be transmitted (NSN) in the forbidden zone to the right?

2) Would any of the sequence numbers assigned to the data in the packet fall in the forbidden zone? (i.e. Will any of the sequence numbers used exceed the maximum given by the ISN+1 curve at this instant?)

If the first condition is true, then the NSN at the transmitting TCP must be resynchronized in order to permit continued transmission without assigning sequence numbers in the forbidden zone. A protocol by which this can be achieved can be found in [7].

Since the protocol involves the exchange of control packets, the forbidden zone boundary is made wider to allow the exchange to be completed before the real forbidden zone is entered. If the second condition is true, then the TCP must either delay transmitting the packet, or must transmit less data in the packet. If both conditions are false then the packet may be transmitted.
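The two tests can be sketched under a simplified reading of Fig. 7, which is our assumption and not the exact formulation given in the paper's Appendix: at time t the clock has value c, a sequence number n lies in clock slot n >> q, and n is in the forbidden zone if it lies above the ISN + 1 curve but below the boundary, i.e. if the ISN curve will reach n's slot within the next (L + D)/D ticks. The `margin` parameter stands for the widening that leaves time for the resynchronization exchange. Function names are hypothetical, and test 2 checks only the last octet of the packet, which suffices when a packet is much shorter than the zone.

```python
import math

# Illustrative sketch under a simplified model of the forbidden zone; names are hypothetical.


def in_forbidden_zone(n: int, t: float, tick_d: float, p: int, q: int,
                      lifetime_l: float, margin: float = 0.0) -> bool:
    """True if sequence number n lies above the ISN + 1 curve and below the
    forbidden zone boundary (the ISN curve L + D seconds, plus `margin`, ahead)."""
    clock = math.floor(t / tick_d) % (1 << p)        # current value of the p-bit clock
    ticks_ahead = ((n >> q) - clock) % (1 << p)      # ticks until the ISN curve reaches n
    zone_ticks = math.ceil((lifetime_l + tick_d + margin) / tick_d)
    return 1 <= ticks_ahead <= zone_ticks


def check_packet(nsn: int, length: int, t: float, tick_d: float, p: int, q: int,
                 lifetime_l: float, margin: float = 0.0) -> str:
    """Apply test 1 to the NSN and test 2 to the last octet of the packet."""
    last = (nsn + length - 1) % (1 << (p + q))       # sequence numbers are S = p + q bits
    if in_forbidden_zone(nsn, t, tick_d, p, q, lifetime_l, margin):
        return "resynchronize"                       # clock has caught up: entry from the left
    if in_forbidden_zone(last, t, tick_d, p, q, lifetime_l):
        return "delay or send less"                  # sending too fast: entry from the bottom
    return "send"
```

With D = 0.25 s, L = 30 s and p = q = 16, at t = 10 s the clock is in slot 40: a packet whose data stays in slot 40 may be sent, one that spills into slot 41 must be delayed or shortened, and a connection whose NSN sits in slot 100 must be resynchronized.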

The test for resynchronization need only be made when a packet is to be transmitted. Hence, if a process does not transmit anything while the forbidden zone moves past its NSN then no resynchronization is necessary. However, if a sender is inactive for a period of time and then tries to transmit while in the forbidden zone, the sender must be prohibited from sending until the forbidden zone has been crossed. Hence, some implementations may decide to periodically check the need to resynchronize connections, or to perform resynchronization at fixed intervals such as C/2 to avoid such difficulties.

An exact formulation of rules (1) and (2) in terms of the clock parameters is presented in the Appendix. It is also shown that the time until resynchronization is needed on a connection ranges from C - L for small b, to infinity as b approaches B (if data transmission proceeds at the clock rate, the clock never catches up with packet sequence numbers). A set of reasonable parameters for S = 32 bits and L = 30 seconds is presented, allowing a maximum bandwidth of MBits/sec with resynchronization required at most every 4.5 hours.


4. Connection termination

Once a connection between two communicating processes has become established, reliable communication can take place over it. Eventually one or both of the processes will decide that the connection should be closed because there is no more to say. There are a number of different ways by which an established connection may be closed.

4.1. Using a higher level protocol

The simplest way to close a connection is for both processes to have decided, using a higher level protocol, that they are going to stop communicating and then to inform their local TP that the present connection should no longer exist by giving the Abort command. The TP would then simply remove knowledge of that connection from its local tables.

4.2. Using TP supplied mechanisms

Alternatively, the connection management aspect of the TP could provide a mechanism by which one of the processes can inform the other that communication over the connection is to cease. A special control packet indicating that the conversation is finished (FIN) is used to achieve this. A FIN is assigned a sequence number for reliability, and travels in a packet with no accompanying data, or control. There are three cases of connection closing as seen by a TP:

1) A user process initiates connection termination by telling its TP to close the connection.
2) The remote TP initiates termination (on request from the remote process) by sending a FIN packet.
3) Both users initiate termination simultaneously by telling their respective TPs to close the connection.

There are a number of different protocols by which FIN packets may be exchanged. These protocols must accommodate all three cases described above. We now describe some possible protocols and indicate their suitability for various purposes.

4.3. Simple FIN exchange

In the simplest case, the user would like to terminate communication immediately without caring about the state of previously transmitted data which may as yet be unacknowledged, or data as yet not received. Instead of just aborting the connection, the TPs could conclude that the connection has closed when each has transmitted and received a FIN command. The reason for requiring receipt of a FIN in addition to transmitting one is that the receipt of the FIN indicates that the other end has noticed that the connection should be terminated and will proceed to do so. The initiator of the termination has assurance that the remote end will not continue computing and/or charging the user as in a timesharing system. If a FIN is not received in response to one sent within a timeout, then the connection is closed nonetheless.

When one of the user processes decides that the connection should be closed it tells its local TP to close the connection. The TP sends a FIN to the other TP with the usual next sequence number and current acknowledgement. Upon receipt of an unsolicited FIN packet with an acceptable sequence number, the TP informs its local user that the connection is closing and replies with a FIN packet of its own (Figure 9). A TP that has both transmitted and received a FIN packet may destroy all knowledge of the connection. If both TPs transmit a FIN simultaneously, then they will interpret the other's as a response to their own.
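The simple FIN exchange can be sketched in a few lines of Python. The class and method names are hypothetical, and the sketch is deliberately as weak as the scheme it models: it omits the check that an incoming FIN carries an acceptable sequence number, it never retransmits, and a timeout simply closes the connection. The FIN consumes one sequence number, since a FIN is assigned a sequence number.

```python
from typing import Optional, Tuple

# Illustrative sketch of the simple FIN exchange; names are hypothetical.
Fin = Tuple[str, int, int]   # ("FIN", seq, ack), mirroring (Seq)(FIN)(Ack)


class SimpleFinTP:
    def __init__(self, next_seq: int, ack: int) -> None:
        self.next_seq, self.ack = next_seq, ack
        self.fin_sent = self.fin_received = False
        self.open = True

    def close(self) -> Fin:
        """Local user asks to close: send a FIN with the next sequence number."""
        self.fin_sent = True
        fin = ("FIN", self.next_seq, self.ack)
        self.next_seq += 1                  # the FIN is assigned a sequence number
        self._forget_if_done()
        return fin

    def on_fin(self, seq: int) -> Optional[Fin]:
        self.fin_received = True
        self.ack = seq + 1
        reply = None
        if not self.fin_sent:               # unsolicited FIN: tell the user, answer with our FIN
            reply = self.close()
        self._forget_if_done()
        return reply

    def on_timeout(self) -> None:
        self.open = False                   # no FIN came back in time: close anyway

    def _forget_if_done(self) -> None:
        if self.fin_sent and self.fin_received:
            self.open = False               # both FINs seen: drop all connection state
```

Nothing in this sketch waits for outstanding data, which is exactly the weakness discussed next: a FIN can overtake data still in transit.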

This simple scheme is not very reliable. The data transmitted prior to the transmission of FINs is not guaranteed to be delivered (data in packet 1 of Figure 9). If full delivery is important, the high level protocol must wait to close the connection until all data has been successfully transmitted. More seriously, if the initiating TP times out it cannot be certain whether the other side has received the FIN and the reply was lost, or whether the other side never received the FIN and is still open.

[Fig. 9. Connection termination using simple FIN exchange. Packets: (1) (Ack 29)(data HI), data in transit; (2) (Seq 29)(FIN)(Ack 78), A requests termination; (3) (Seq 80)(FIN)(Ack 30), B complies. A goes from Established to Closing on CLOSE; B goes from Established through Closing to Not active.]

4.4. Acknowledged FIN exchange

To guarantee that the connection will be closed on both sides (without necessarily guaranteeing delivery of data in transit), a protocol requiring acknowledgement of FINs may be used. Each TP transmits a FIN packet and waits for it to be acknowledged by the other end, in addition to waiting for a FIN from the other end. This scheme is very much like the 3 way handshake used to exchange SYNs reliably.

In the first of the three cases mentioned earlier, when the user process closes the connection, the TP transmits a FIN packet and will not accept any more data from the user process. All packets preceding and including the FIN will be retransmitted until acknowledged. When the remote TP has both acknowledged the FIN and sent a FIN of its own, the first TP can acknowledge the remote FIN and delete knowledge of this connection from its state tables.

In the second case, a TP receives an unsolicited FIN from the remote TP. The receiving TP then acknowledges the FIN and sends back one of its own even if intervening data have not been received. (Data packet 1 in Figure 10 is never delivered.) It also informs its user of the remote close request, and does not accept any more data to send. The TP waits for an acknowledgement of its FIN packet before it can delete knowledge of the connection (Figure 10). If an acknowledgement is not forthcoming, after a timeout the connection is deleted nonetheless.

In the third case, a simultaneous close by both processes will cause FIN packets to be exchanged. Each TP acknowledges the FIN it received. Both TPs, upon receiving these acknowledgements, can delete the connection (Figure 11). In the event that the acknowledgements are lost, one or both TPs will continue retransmitting the FIN until they time out.

By requiring FINs to be acknowledged, both sides can be sure that the connection will close once the initiating FIN has been received. In particular, if the replying TP times out, it may be that the replying FIN never reached the initiating TP, or that the final acknowledgement was lost, as in Figure 12. In the first case both TPs will time out; in the second, the initiating TP closes normally and the replying TP times out. In either case both sides will close the connection.
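The three closing cases and the two timeout outcomes above can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the TCP implementation: packets are only FINs and ACKs, they are delivered in order, retransmission is not modeled (a "lost" packet stands for one whose every retransmission was lost), and the timeout is applied once no packets remain in transit. The names `TP` and `run` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TP:
    """One side of a connection closed by an acknowledged FIN exchange (sketch)."""
    name: str
    state: str = "Established"
    fin_acked: bool = False      # our own FIN has been acknowledged
    fin_received: bool = False   # the peer's FIN has arrived
    timed_out: bool = False

    def close(self, wire):
        # User CLOSE: send a FIN and accept no more data from the user.
        self.state = "Closing"
        wire.append((self.name, "FIN"))

    def receive(self, kind, wire):
        if self.state == "NotActive":
            return  # connection already deleted; late packets ignored in this sketch
        if kind == "FIN":
            if self.state == "Established":
                # Unsolicited FIN: reply with our own FIN even if data is still in transit.
                self.state = "Closing"
                wire.append((self.name, "FIN"))
            self.fin_received = True
            wire.append((self.name, "ACK"))  # acknowledge the peer's FIN
        elif kind == "ACK" and self.state == "Closing":
            self.fin_acked = True
        if self.state == "Closing" and self.fin_acked and self.fin_received:
            self.state = "NotActive"  # both FINs exchanged and acknowledged

    def timeout(self):
        # Stop waiting for a missing acknowledgement and delete the connection anyway.
        if self.state == "Closing":
            self.state, self.timed_out = "NotActive", True


def run(closers, lost=()):
    """Deliver packets in FIFO order, losing the k-th packet sent for each k in `lost`."""
    tps = {"A": TP("A"), "B": TP("B")}
    wire = deque()
    for name in closers:
        tps[name].close(wire)
    count = 0
    while wire:
        src, kind = wire.popleft()
        count += 1
        if count in lost:
            continue
        tps["B" if src == "A" else "A"].receive(kind, wire)
    for tp in tps.values():
        tp.timeout()
    return {n: (tp.state, tp.timed_out) for n, tp in tps.items()}
```

For example, `run(["A"], lost={4})` loses A's final acknowledgement, as in Figure 12: A closes normally while B closes only by timeout. `run(["A"], lost={2})` loses B's replying FIN, and both sides close by timeout.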

4.5. Graceful acknowledged FIN exchange

Users may in addition desire that the data in the pipe during connection termination be reliably delivered. Close is an operation meaning "I have no more data to send." The notion of closing a full duplex connection is subject to ambiguous interpretation, since it may not be obvious how to treat the receiving side of the connection. The TP interprets close in a half duplex fashion: the user who closes may continue to receive data until it is told that the other end has closed too. The TP will reliably deliver all data sent prior to closing the connection, so a user that expects no data in return need only wait to hear that the connection was closed successfully to know that all its data was received at the destination end. This has often been termed a graceful termination. The algorithms by which this is achieved are very similar to those described for the acknowledged FIN exchange. The receiver of a FIN does not acknow-

[Fig. 10. Connection termination using acknowledged FIN exchange. (1) A sends (Seq 29)(Ack 78)(data HIJKL), which is still in transit when (2) A requests termination with (Seq 34)(FIN)(Ack 78). (3) B complies with (Seq 78)(FIN)(Ack 35), and (4) A acknowledges with (Seq 35)(Ack 79). A and B each pass from Established through Closing to Not active.]


[Fig. 11 (simultaneous close). A and B both issue CLOSE while Established: A sends (Seq 29)(FIN)(Ack 78) and B sends (Seq 78)(FIN)(Ack 29). Each acknowledges the FIN it receives, A with (Seq 30)(Ack 79) and B with (Seq 79)(Ack 30), and both pass through Closing to Not active.]

[Fig. 12. Acknowledged FIN exchange terminated by timeout. A sends (Seq 29)(Ack 78)(data ABCDE) and then (Seq 34)(FIN)(Ack 78). B, moving from Established to Closing, replies (Seq 78)(FIN)(Ack 35). A's final (Seq 35)(Ack 79) is lost, so B retransmits (Seq 78)(FIN)(Ack 35) until it times out and becomes Not active.]

466 CA. Sunshine, Y.K. Dalal / Connection management in transport protocols

[Fig. 13. Half-open connection discovery and reset. After crashing, A issues OPEN, enters Opening, and sends (Seq 6)(SYN). B, still Established with NSN = 300 and ESN = 100, replies (Seq 300)(Ack 100). A rejects this packet with (Seq 100)(RST)(Ack 300); B finds the RST believable and becomes Not active, while A retransmits (Seq 6)(SYN).]

general use of this command is now described. Assume that two TPs A and B are communicating with one another when a crash occurs causing loss of memory to A. When A is up again, it is likely to restart from the beginning or continue from some recovery point. As a result, the user process at A will probably try to open the connection again or try to send on the connection it believes is established. In the latter case it receives the error message "connection not open" from its TP. In an attempt to establish the connection, A will send a packet containing SYN. When the SYN arrives, B, being in the Established state, ignores the SYN but responds with an acknowledgement indicating what sequence number it next expects to hear. A sees that this packet does not acknowledge anything it sent and, being unsynchronized, sends a RST because it has detected a half-open connection. B can believe the RST since it refers to its ESN and NSN, so it aborts the connection and notifies the user. This scenario is illustrated in Figure 13. If the acknowledgement packet was instead an old duplicate, the RST referring to it would not be acceptable at B. A will continue to retransmit its SYN, and if the user at B reopens the connection, it will eventually be established.

The case when A crashes and B tries to send data on what it thinks is an established connection is illustrated in Figure 14. The data arriving at A from B is unacceptable because no connection exists. So A sends back a RST. The RST is acceptable and B processes it and aborts the connection.

A variety of other cases are possible, all of which are accounted for by the following rules for RST generation and processing:

1) If the connection is not yet Established (or does not exist), a RST should be formed and sent for any packet that does not acknowledge something the receiver sent earlier. The RST should take its sequence number field from the acknowledgement field of the offending packet (if it has one) and its acknowledgement field should acknowledge all data and control in the offending packet.

2) If the connection has been Established, any unacceptable packet should elicit only an empty acknowledgement packet containing the current NSN and an acknowledgement indicating the ESN.

[Fig. 14. Half-open connection discovery and reset. A has crashed and is Not active. B, still Established with NSN = 300 and ESN = 100, sends (Seq 300)(Ack 100)(data AB). A rejects it with (Seq 100)(RST)(Ack 302); B finds the RST believable and becomes Not active.]


3) All RST packets are validated by first checking their sequence field, as for other packets; then, if the RST acknowledges something the receiver sent (but for which it has not yet received an acknowledgement), the RST must be valid. After validating the RST, the TCP changes state. If the connection was not Established, it returns to the Opening or Listening state. If the connection was Established, it is aborted and placed in the NotActive state, and the local user process is notified.
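Rules 1 to 3 can be sketched as three small functions. This is an illustrative reading of the rules, not TCP code, and several details are assumptions: sequence numbers are plain integers with no 2^32 wraparound, the "sequence check as for other packets" is reduced to an exact match against ESN in the Established state, `una` (a hypothetical field) marks the oldest unacknowledged sequence number, and a packet without an acknowledgement field produces a RST with sequence number 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    seq: int
    ack: Optional[int] = None     # None when the packet carries no acknowledgement
    syn: bool = False
    fin: bool = False
    rst: bool = False
    data: bytes = b""

    def span(self) -> int:
        # Sequence space used: one per data octet plus one each for SYN and FIN.
        return len(self.data) + int(self.syn) + int(self.fin)


@dataclass
class Conn:
    state: str             # "Listening", "Opening", "Established" or "NotActive"
    nsn: int               # next sequence number to send (NSN)
    esn: int               # next sequence number expected (ESN)
    una: int               # oldest sequence number sent but not yet acknowledged
    passive: bool = False  # True if the connection was opened by Listening


def acks_outstanding(conn: Conn, pkt: Packet) -> bool:
    """Does pkt acknowledge something we sent that is still unacknowledged?"""
    return pkt.ack is not None and conn.una < pkt.ack <= conn.nsn


def reply_to_unacceptable(conn: Conn, pkt: Packet) -> Packet:
    """Rules 1 and 2: what to send back for an unacceptable packet."""
    if conn.state != "Established":
        # Rule 1: seq from the offending ack field; ack covers all its data and control.
        return Packet(seq=pkt.ack if pkt.ack is not None else 0,
                      ack=pkt.seq + pkt.span(), rst=True)
    # Rule 2: only an empty acknowledgement carrying the current NSN and ESN.
    return Packet(seq=conn.nsn, ack=conn.esn)


def process_rst(conn: Conn, rst: Packet) -> bool:
    """Rule 3: validate a RST, then change state. Returns True if it was accepted."""
    seq_ok = conn.state == "Established" and rst.seq == conn.esn
    if not (seq_ok or acks_outstanding(conn, rst)):
        return False                      # e.g. a RST provoked by an old duplicate
    if conn.state == "Established":
        conn.state = "NotActive"          # abort; the local user would be notified here
    else:
        conn.state = "Listening" if conn.passive else "Opening"
    return True
```

Replaying Figure 13: A, Opening with its SYN at sequence 6, receives (Seq 300)(Ack 100), which acknowledges nothing it sent, so rule 1 yields (Seq 100)(RST)(Ack 300). B, Established with ESN = 100, accepts that RST under rule 3 and becomes NotActive. The same call for the crashed A of Figure 14 yields (Seq 100)(RST)(Ack 302).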

6. Modeling and analysis techniques

Since overall protocol reliability depends heavily on the operation of connection management mechanisms, it is particularly important to verify that they function correctly. Most often this has been done by the sort of informal case studies performed in Sections 3 to 5. In this form of analysis, the designer tries to identify all situations of interest and to verify that the protocol "does the right thing" in each case. These narrative analyses are very valuable for providing motivation for, and intuitive understanding of, protocol mechanisms, and they have successfully uncovered a number of design flaws. In the case of TCP, the problems associated with the credulous connection opening mechanism described in Section 2 were found this way, leading to the development of the 3 way handshake.

More rigorous analysis techniques must be used to verify the reliability of transport protocols with greater certainty. This requires a precise model of the protocol whose detailed operation can be analyzed. This model typically consists of a pair of protocol machines connected by a transmission medium. The machines receive commands from their respective users, and messages from each other via the transmission medium. The transmission medium is itself a simple kind of machine that may introduce errors, delays, and other perturbations between its input and its output.

This model may be approached from two viewpoints, local and global [14]. The local viewpoint focuses on an individual protocol machine and is most useful for specification and implementation purposes. The global viewpoint considers the entire system as a black box with inputs and outputs to each user. The global view is most useful for verification, which must show that the outputs to each user are as desired in all cases.

Protocol verification techniques fall into two main classes, program proofs and state models [23]. In the former, each machine is specified algorithmically, and assertions which reflect the desired reliability goals must be formulated and proved. This approach has been effectively applied to verifying the data transfer features of transport protocols, where large or infinite numbers of interaction sequences are possible due to large sequence number spaces and retransmissions [2,21].

State models have been developed using such formalisms as Petri nets, state diagrams, state transition matrices, flow charts, and programs to define the protocol machines [14,16,18,20]. Some form of reachability analysis is typically performed to generate all possible interactions of the system, followed by a check for undesirable states. This approach has been successful in verifying data transfer features where simplifying assumptions are made to keep the number of states tractable, but it is most applicable to connection management, where the number of states is inherently small. Several mixed systems have also been developed in which state models are used for the basic states of the protocol machine, augmented by context variables which are not part of the basic state [3,11].

A comprehensive treatment of these alternative techniques is beyond the scope of this paper. As an illustration, we present a brief description of the technique used to formally verify portions of the TCP connection management procedures in [22]. This technique falls into the state model class and takes a global view.

Each protocol machine is modeled as a classic state machine with an input set, an output set, a set of internal states, and functions giving the next state and output for each combination of input and current state. Inputs consist of user commands, messages from the other protocol machine (or the network), and internally generated events such as timeouts. The two machines operate independently, with synchronization achieved by one machine waiting for a particular type of packet from the other machine.

We define the composite state of the system as the state of the protocol machine on each side of the connection, plus any relevant packets in the transmission medium between them. Transitions from one composite state to another are derived from the state transitions of the individual protocol machines by including all possible transitions of either protocol machine, given the state of the transmission medium.


The number of composite states is then the number of protocol machine states squared times the number of different states of the transmission medium. In a straightforward approach, every packet (including retransmissions) since the time the composite system was created would have to be included in the state, making the state space unworkably large. This potential state explosion may be limited in two ways: by reducing the number of protocol machine states, and by treating different sets of packets in transit as equivalent, as described below.

In order to verify connection management functions, the protocol machine is defined with basic states for connection establishment and termination functions only. These include a NotActive state, an Established state, and several intermediate states for going from one to the other as connections are opened and closed. All data transfer takes place in the Established state and is therefore not modeled by the basic protocol machine. Information about sequence numbers is part of the context information maintained outside the basic states of the protocol. This context information is used along with the basic state to determine the processing of inputs.

To limit the number of transmission medium states, all packets in the transmission medium can be classified as either current or old by considering the protocol's use of sequence numbers. Packets are current if either

1) The packet is pending (waiting for retransmission) at the sender. Normally this condition holds until the sender receives some form of acknowledgement. For packets which are not retransmitted (ACK, RST), it does not hold.

2) The packet refers to a current packet traveling in the opposite direction (e.g., an ACK or RST of another packet). When the opposite packet is no longer current, both packets are removed from the composite state.

Since we are primarily interested in worst case analysis, we assume that duplicate packets from previous connections may be held in the transmission medium and emerge during or after establishment of the current connection (subject to packet lifetime constraints). We group all old packets into a single class for each packet type (since their processing will be equivalent), and implicitly attach this set of old packet types to the transmission medium state. This means that we allow all old packet arrival events to occur in every state, which is a more rigorous (but simpler to represent) test for the protocol than any real transmission medium. Hence only current packets must be explicitly represented as part of the composite state which determines which packets can arrive, since all old packets are implicitly part of the state.

[Fig. 15. Three way handshake protocol machine. States: NotActive, SYN-Sent, SYN-Received, Established. Transition labels: OPEN or Retry (set timeout); SYN arrives; SYN-ACK arrives / send ACK; ACK arrives.]

Once a packet is generated by a protocol transition, it remains in the composite state until it is no longer current, despite the assumption that the transmission medium can lose or damage packets. Since every current packet is either being retransmitted or is a response to a packet being retransmitted, we can assume that current packets are always available to cause transitions. In reality, packets may temporarily disappear from the composite state if lost or damaged, but will always reappear due to retransmissions.

By taking advantage of these reductions, a relatively compact model of connection establishment in TCP may be developed. Figure 15 shows the basic protocol machine states and state transitions in opening a connection. Each transition is marked by the input causing the transition and the output (if any) accompanying it. Four states are needed to represent a simple 3 way handshake protocol which retries on collision. For clarity, the figure shows only normal events. The sequence of operations {detect event, take appropriate action, move to new state} is assumed to be atomic (uninterruptible), so that no confusion can result from nearly simultaneous events.
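The four-state machine of Figure 15 can be written as a transition table keyed by (state, input). This is a sketch of one reading of the figure, not the model of [22]: sequence numbers and timers are left out, the transition from SYN-Sent back to NotActive on an arriving SYN is our interpretation of "retries on collision", and any input without an entry is simply ignored. The names `TRANSITIONS`, `step` and `normal_open` are hypothetical.

```python
# One reading of Fig. 15: (state, input) -> (next state, output packet or None).
TRANSITIONS = {
    ("NotActive", "OPEN"): ("SYN-Sent", "SYN"),        # also sets a retransmission timeout
    ("NotActive", "Retry"): ("SYN-Sent", "SYN"),
    ("NotActive", "SYN"): ("SYN-Received", "SYN-ACK"),
    ("SYN-Sent", "SYN-ACK"): ("Established", "ACK"),
    ("SYN-Sent", "SYN"): ("NotActive", None),          # collision: set a random Retry timeout
    ("SYN-Received", "ACK"): ("Established", None),
}


def step(state: str, event: str):
    """One atomic {detect event, take action, move to new state} step.
    Inputs with no entry are ignored: same state, no output (a simplification)."""
    return TRANSITIONS.get((state, event), (state, None))


def normal_open():
    """A opens actively and B answers from NotActive; no losses or old duplicates."""
    a, b = "NotActive", "NotActive"
    a, pkt = step(a, "OPEN")   # A sends SYN
    b, pkt = step(b, pkt)      # B replies SYN-ACK
    a, pkt = step(a, pkt)      # A replies ACK
    b, _ = step(b, pkt)
    return a, b
```

`normal_open()` drives both machines through the three packets of the handshake and returns `("Established", "Established")`; feeding a SYN to a machine in SYN-Sent shows the collision path back to NotActive.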

For a complete model, the processing of all possible input events in each state must be specified. As an example, the table for the SYN-Received state is shown in Figure 16. The set of input events consists of packets (SYN, SYN-ACK, Data, ACK, RST), user commands (OPEN), and internal timeouts (Retransmit, Quit, Retry-after-collision). Note that the same input event results in different outcomes depending on the context information. Figure 17 shows the composite state diagram resulting from the connection establishment protocol of Figure 15. Each composite state is represented by a pair of process states and a list of current packets. Some context is represented along with the basic state of each process. This consists of the sequence number for outgoing packets in the SYN-Sent, SYN-Received, and Established states, and also the sequence number for incoming packets in the Established state. This allows us to determine whether the protocol has correctly initialized sequence numbers when the Established state is reached.

Current packets are represented by their packet types, with a subscript giving their own sequence number if relevant, followed by the sequence number of another packet they may refer to (in parentheses). An arrow above the packet shows its direction of travel. Thus SYN-ACK_y(x), with a rightward arrow, represents a SYN-ACK packet with sequence number y, referring to another packet with sequence number x, and traveling from left to right.

Event      Next state   Action and context
SYN        self         Send ACK if SYN is duplicate for current connection.
SYN        self         Send RST if SYN is not from current connection.
SYN-ACK    self         Send RST referencing SYN-ACK.
Data       self         Discard as out of order (or hold).
ACK        ES           If ACK refers to transmitted SYN-ACK (third part of 3 way handshake).
RST        NA           If RST refers to transmitted SYN-ACK (SYN previously received was an old duplicate).
ACK, RST   self         Ignore if does not refer to transmitted SYN-ACK.
OPEN       self         Ignore since already in progress.
Retrans    self         Retransmit SYN-ACK.
Quit       self         Notify local process.
Retry      self         Ignore since other side has already started.

ES = Established, NA = Not Active

Fig. 16. State transition table for SYN-Received state.
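The table for the SYN-Received state can be sketched as a dispatch function in which the context (the sequence number x of the SYN that was accepted and the sequence number y of the SYN-ACK sent) decides between rows with the same input. This is an illustrative reading of Figure 16, not code from [22]; the `Packet` tuple with `seq` and `ref` fields is a hypothetical representation of "its own sequence number" and "the packet it refers to", and the action strings simply restate the table.

```python
from typing import NamedTuple, Optional, Tuple, Union


class Packet(NamedTuple):
    kind: str                  # "SYN", "SYN-ACK", "Data", "ACK" or "RST"
    seq: Optional[int] = None  # the packet's own sequence number
    ref: Optional[int] = None  # sequence number of the packet it refers to


def syn_received(event: Union[Packet, str], x: int, y: int) -> Tuple[str, str]:
    """Fig. 16 for SYN-Received. x = seq of the SYN accepted, y = seq of our SYN-ACK.
    `event` is a Packet, a user command ("OPEN") or a timeout ("Retrans", "Quit", "Retry").
    Returns (next state, action)."""
    if isinstance(event, str):
        return {
            "OPEN": ("SYN-Received", "ignore: already in progress"),
            "Retrans": ("SYN-Received", "retransmit SYN-ACK"),
            "Quit": ("SYN-Received", "notify local process"),
            "Retry": ("SYN-Received", "ignore: other side has already started"),
        }[event]
    if event.kind == "SYN":
        if event.seq == x:
            return ("SYN-Received", "send ACK: duplicate SYN for current connection")
        return ("SYN-Received", "send RST: SYN not from current connection")
    if event.kind == "SYN-ACK":
        return ("SYN-Received", "send RST referencing SYN-ACK")
    if event.kind == "Data":
        return ("SYN-Received", "discard as out of order (or hold)")
    if event.kind == "ACK" and event.ref == y:
        return ("Established", "third part of 3 way handshake")
    if event.kind == "RST" and event.ref == y:
        return ("NotActive", "SYN previously received was an old duplicate")
    return ("SYN-Received", "ignore: does not refer to transmitted SYN-ACK")
```

An ACK or RST is acted on only when it refers to the SYN-ACK actually transmitted, which is how an old duplicate SYN is eventually rejected without disturbing a current connection.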

Symmetric states (identical except for switching process identities and packet directions) have been eliminated to simplify the figure. Transitions to the same state, such as retransmissions, are not shown. Composite transitions resulting from simultaneous transitions of both protocol machines are perfectly legal, but are shown as sequential individual transitions to reduce the number of arrows.


[Fig. 17. Composite state diagram for "three way handshake". Each node gives the state of process A, the state of process B, and the list of current packets, e.g. (NA)(NA)(), (SSx)(NA)(SYNx), (SSx)(SRy)(SYNx, SYN-ACKy(x)), (ES)(SRy)(SYN-ACKy(x), ACK(y)), and (ES)(ES)(). Other nodes, such as (SSx)(SSy)(SYNx, SYNy) and (SRx)(SRy)(SYN-ACKy(z), SYN-ACKx(w)), arise from collisions and old duplicates. Legend: (NA) Not Active; (SSx) SYN sent with seq # x; (SRy) SYN received and SYN sent with seq # y; (ES) Established; x: incoming (expected) seq #; y: outgoing seq #.]

This composite state model demonstrates several aspects of protocol correctness for the normal case where both protocol machines start in the NotActive state and function according to their definition (no failures). This includes safety considerations (absence of deadlocks, correctness of outcome) and liveness considerations (progress and eventual termination), as follows.

There are no terminal states with one machine Established and the other not Established. The only terminal states have both processes NotActive (if a connection was rejected) or both processes Established. Furthermore, when both processes are Established, sequence numbers for both directions are properly initialized. Hence there is no deadlock in the procedure for connection establishment.

All paths leading back to the NotActive state for either process are caused by collisions (simultaneous open requests), which will cause a later retry to establish the connection. Assuming that perpetual collisions are avoided by the random retry interval, and that the transmission medium provides a nonzero probability of delivering any packet, the protocol will eventually succeed in establishing a connection (unless the attempt is rejected).
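The reachability check behind these results can be illustrated on a much smaller model. The sketch below explores every composite state (process states, pending user OPENs, packets in transit) breadth first and reports the states with no enabled transition. It is not the model of [22] and makes strong simplifying assumptions: no sequence numbers, no losses, no old duplicates, and each user issues at most one OPEN with no Retry timer, so a collision simply ends with both sides NotActive instead of retrying. The names `MACHINE`, `successors` and `terminal_states` are hypothetical.

```python
from collections import deque

# Per-machine transitions (one reading of Fig. 15, Retry omitted):
# (state, input) -> (next state, packet sent or None)
MACHINE = {
    ("NA", "OPEN"): ("SS", "SYN"),
    ("NA", "SYN"): ("SR", "SYN-ACK"),
    ("SS", "SYN-ACK"): ("ES", "ACK"),
    ("SS", "SYN"): ("NA", None),   # collision; without Retry the attempt just ends
    ("SR", "ACK"): ("ES", None),
}


def successors(composite):
    """Composite states reachable by one atomic transition of either machine."""
    states, wants_open, medium = composite   # medium: sorted tuple of (packet, receiver)
    for i in (0, 1):
        events = [("OPEN", None)] if wants_open[i] else []
        events += [(pkt, k) for k, (pkt, dst) in enumerate(medium) if dst == i]
        for event, k in events:
            nxt, out = MACHINE.get((states[i], event), (states[i], None))  # unlisted: ignore
            new_states = list(states)
            new_states[i] = nxt
            new_wants = list(wants_open)
            new_medium = list(medium)
            if k is None:
                new_wants[i] = False        # the user's single OPEN has been issued
            else:
                del new_medium[k]           # the delivered packet leaves the medium
            if out is not None:
                new_medium.append((out, 1 - i))
            yield tuple(new_states), tuple(new_wants), tuple(sorted(new_medium))


def terminal_states(wants_open):
    """Breadth-first reachability from (NA, NA) with no packets; return dead-end states."""
    start = (("NA", "NA"), tuple(wants_open), ())
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {s[0] for s in seen if next(successors(s), None) is None}
```

With only A opening, the only terminal state is both sides Established; with both users opening, the terminal states are both Established or both NotActive (the collision case), and no terminal state has exactly one side Established.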

These results show the sufficiency of the connection establishment mechanisms embodied in the 3 way handshake. The inadequacy of simpler techniques (given worst case transmission medium behavior) was demonstrated informally in the discussion accompanying Figure 2, and is shown formally using the above techniques in [22]. The mechanisms for recovering from protocol failures described in Section 5 can also be incorporated into the model by adding corresponding transitions to the protocol machine. These introduce transitions out of the half closed composite state which lead to reestablishing the connection [22].

7. Conclusions

Connection management functions are intimately involved in transport protocol reliability and require careful design if errors are to be avoided under unusual but possible circumstances. Special care must be taken to avoid the possibility of confusing packets from different connections between the same pair of processes, since old packets may persist in the network and be delivered during later connections. This requires a unique identifier for each new connection, or careful selection of the initial sequence number (ISN) to be used. A clock may be used to reliably determine the ISN even after protocol failures, but clock-based schemes require resynchronization of the connection at certain points. The 3 way handshake provides a means for reliably synchronizing the two sides of the protocol to establish a connection while minimizing the amount of state information that must be maintained for inactive connections. This involves the exchange of special synchronization control packets (SYN) at the start of a connection.

Control packets (FIN) may also be used to terminate a connection. In a graceful closing, the protocol guarantees that all data sent before the close command is issued will be delivered. In an immediate closing, data in transit may or may not be delivered, but both sides know that the connection has terminated. A unilateral closing or abort does not guarantee that resources on the other side of the connection have been released. Half-open connections resulting from failure of one side of the protocol can also be terminated by an exchange of control packets.

While informal analysis of scenarios is a very useful tool for designing connection management mechanisms, it is not adequate for a fully reliable analysis. More precise models defining the detailed operation of the protocol machines on each side of the connection must be developed for this purpose. One such model, specifying the machines as finite automata augmented by context information, was presented and successfully used to verify connection establishment in TCP.

Acknowledgements

The development of the Transmission Control Protocol which provided the basis for this work was carried out while the authors were at the Stanford University Digital Systems Laboratory, supported in part by the Defense Advanced Research Projects Agency (under contract number MDA903-76C-0093) and by the National Science Foundation Graduate Fellowship Program. Vint Cerf and Robert Kahn provided many of the original ideas behind TCP, and Vint Cerf has been a constant participant in discussions on improvements. The Cyclades research group under Louis Pouzin also contributed many early ideas through discussions in IFIP TC 6.1. Ray Tomlinson suggested several important modifications, including the 3 way handshake. Bill Plummer, Dick Karp, Jim Mathis, and Bob Metcalfe also contributed significantly to the work. The continuing support of TCP development by ARPA has made this work possible.

Appendix

The following relationships between the clock parameters presented in Section 5 are useful in analyzing resynchronization.

$$S = 32 \text{ bits} \qquad (1)$$

D: This is a design parameter which is chosen primarily on the basis of the time the TCP is willing to wait before the same processes can communicate again. Since it affects some of the other parameters too, it should not be chosen completely arbitrarily.

$$B = 2^q / D \ \text{octets/sec} \qquad (2)$$

$$C = 2^P D \ \text{sec} \qquad (3)$$

$$2^S > 2BL$$

The maximum rate at which sequence numbers are used is related to L and S [4]. This prevents packet sequence numbers from cycling, and hence being reused, while old packets with identical sequence numbers may still be in the network.
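Relations (2) and (3) and the constraint on the sequence space can be evaluated directly. The sketch below assumes the parameters q, P, D and L keep their meanings from Section 5 (not reproduced here), and the values in the usage note are hypothetical, chosen only to show the arithmetic rather than taken from TCP.

```python
def clock_relations(q: int, P: int, D: float, L: float, S: int = 32):
    """Evaluate (2) B = 2^q / D, (3) C = 2^P * D, and the check 2^S > 2BL."""
    B = 2**q / D              # (2): octets per second
    C = 2**P * D              # (3): seconds
    return B, C, 2**S > 2 * B * L
```

For instance, with hypothetical values q = 10, P = 22, D = 1 s and L = 60 s, this gives B = 1024 octets/sec, C = 4194304 s (about 48.5 days), and 2BL = 122880, well below 2^32. Raising q to 24 with L = 300 s makes 2BL exceed 2^32, so the constraint fails.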

Time until clock resynchronization

The connection must be resynchronized when packet sequence numbers about to be assigned lie in the forbidden zone to the right. We now show how the resynchronization time R, measured from when the connection was synchronized or last resynchronized, depends on the actual average bandwidth b being achieved. Assume that the step curves in Figure 7 have been linearized and have the same average slope. Let s be the sequence number at any time t. The equation of the line giving the (linearized) forbidden zone boundary is

s = B(t - (C - L)),

because the line is displaced in time from the origin by (C - L) and has a slope B.

The equation of the line giving the actual use of sequence numbers is

s = bt.

Hence the point of intersection gives t = R.

$$bR = B(R - C + L)$$

$$R = \frac{C - L}{1 - b/B}, \qquad R = \infty \text{ when } b = B, \qquad R = C - L \text{ when } b = 0 \qquad (4)$$

The end conditions are intuitively correct: if b = B, the clock never catches up with the actual rate of transmission, so resynchronization never has to be performed; if b = 0, resynchronization has to be performed at a time corresponding to the potential assignment of a sequence number that lies within the forbidden zone.
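Equation (4) can be wrapped in a small function that also reproduces both end conditions. This is a sketch; the function name and the restriction 0 ≤ b ≤ B are assumptions consistent with the derivation, and the numbers in the usage note are hypothetical.

```python
import math


def resync_time(C: float, L: float, b: float, B: float) -> float:
    """Time R until resynchronization, from bR = B(R - C + L): R = (C - L) / (1 - b/B).
    Assumes 0 <= b <= B; returns math.inf when b == B (the lines never meet)."""
    if not 0 <= b <= B:
        raise ValueError("expected 0 <= b <= B")
    if b == B:
        return math.inf
    return (C - L) / (1 - b / B)
```

With hypothetical values C = 1000 s, L = 100 s and B = 1000 octets/sec, an idle connection (b = 0) must resynchronize after C - L = 900 s, while running at half the clock rate (b = 500) doubles that to 1800 s.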

Forbidden zone tests

We now describe the two tests for determining whether sequence numbers assigned to data will lie in the forbidden zone, in terms of the clock parameters. Let x be the current sequence number and n the current value of the clock. To determine whether x is in the forbidden zone to the right, the TCP tests whether x lies within the range [forbidden zone boundary, ISN + 1 curve], i.e. $[2^q(n + 1 + \lceil L/D \rceil) - 1,\ 2^q(n + 1)]$ [9].

Therefore, if

$$\left(\{2^q(n + 1 + \lceil L/D \rceil) - 1 - x\} \bmod 2^S\right) < 2^q \lceil L/D \rceil \qquad (5)$$

then the connection must be resynchronized.

In order to determine whether sequence number assignment will enter the forbidden zone from the bottom, the TCP tests whether any of the sequence numbers assigned to data in the packet lie in the range [ISN + 1 curve, forbidden zone boundary].

Let d be the length of the data in the packet in octets and define

$$y = (x + d - 1) \bmod 2^S$$

The test can be reformulated as whether the ISN + 1 curve lies within the range [x, y].
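Both forbidden zone tests reduce to modular comparisons. The sketch below implements test (5) as written, reading its bracketed L/D term as a ceiling (an assumption), and implements the second test as "the ISN + 1 curve 2^q(n + 1) lies within [x, y]". Because the printed form of that second inequality is incomplete, the comparison `< d` is our own completion, which is true exactly when the ISN + 1 curve falls inside the packet's range. Function names are hypothetical.

```python
import math


def in_forbidden_zone_right(x: int, n: int, q: int, L: float, D: float, S: int = 32) -> bool:
    """Test (5): does sequence number x lie in the forbidden zone to the right?"""
    k = math.ceil(L / D)                  # bracket in (5) read as a ceiling (assumption)
    boundary = 2**q * (n + 1 + k) - 1     # forbidden zone boundary
    return (boundary - x) % 2**S < 2**q * k


def enters_forbidden_zone_below(x: int, d: int, n: int, q: int, S: int = 32) -> bool:
    """Second test: does the ISN + 1 curve, 2^q (n + 1), fall inside [x, y]?"""
    if not 1 <= d <= 2**S:
        raise ValueError("packet must carry between 1 and 2^S octets")
    y = (x + d - 1) % 2**S                # last sequence number used by the packet
    isn_plus_1 = 2**q * (n + 1) % 2**S
    return (y - isn_plus_1) % 2**S < d    # "< d" completes the inequality (assumption)
```

With hypothetical values q = 10, n = 5, L = 60 s and D = 1 s, test (5) flags exactly the sequence numbers 6144 through 67583, and a 10-octet packet starting at 6140 trips the second test because it spans the ISN + 1 value 6144. Python's `%` always returns a non-negative result, which keeps both comparisons correct across wraparound of the 2^S sequence space.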

Therefore, if

$$\left(\{y - 2^q(n + 1)\} \bmod 2^S\right)$$


[14] M.G. Gouda and E.G. Manning, On the Modelling, Analysis, and Design of Protocols - A Special Class of Software Structures, Proc. Int. Conf. on Software Engineering, October 1976, pp. 256-262.

[15] S.T. Kent, Encryption Based Protection for Interactive User/Computer Communication, Proc. Fifth Data Communications Symp., Snowbird, Utah, September 1977, pp. 5.7-5.13.

[16] P.M. Merlin and D.J. Farber, Recoverability of Communication Protocols - Implications of a Theoretical Study, IEEE Trans. Communications, September 1976, pp. 1036-1043.

[17] R.M. Metcalfe, Packet Communication, MIT Project MAC Report TR-114, December 1973. (PhD Thesis, Harvard Univ.)

[18] J.B. Postel and D.J. Farber, Graph Modeling of Computer Communications Protocols, Proc. Fifth Texas Conf. on Computing Systems, Austin, Texas, October 1976, pp. 66-77.

[19] D.P. Reed, The Initial Connection Mechanism in DSP, MIT Lab for Computer Science LNN 10, August 1977.

[20] H. Rudin, C.H. West, and P. Zafiropulo, Automated Protocol Validation: One Chain of Development, Proc. Symp. on Computer Network Protocols, Liege, Belgium, February 1978.

[21] N.V. Stenning, A Data Transfer Protocol, Computer Networks, 1, 2, September 1976, pp. 99-110.

[22] C.A. Sunshine, Interprocess Communication Protocols for Computer Networks, Digital Systems Lab Technical Report No. 105, Stanford Univ., December 1975. (PhD Thesis)

[23] C.A. Sunshine, Survey of Communication Protocol Verification Techniques, Proc. Symp. on Computer Networks: Trends and Applications, Gaithersburg, Maryland, November 1976, pp. 24-26. (IEEE 76CH1143-7C)

[24] R.S. Tomlinson, Selecting Sequence Numbers, Proc. ACM SIGCOMM/SIGOPS Interprocess Communications Workshop, Santa Monica, California, March 1975, pp. 11-23. (ACM Operating Systems Review, Vol. 9, No. 3, July 1975.) Also INWG Protocol Note No. 2, August 1974.

[25] D.C. Walden, A System for Interprocess Communication in a Resource Sharing Computer Network, Comm. ACM 15, 4, April 1972, pp. 221-230.
