
Parallel TCP Sockets: Simple Model, Throughput and Validation

Eitan Altman (INRIA), Dhiman Barman (INRIA), Bruno Tuffin (IRISA/INRIA), Milan Vojnović (MSRC)

October 2005

Technical Report
MSR-TR-2005-130

Microsoft Research
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052

http://www.research.microsoft.com


Abstract— We consider a simple model of parallel TCP connections defined as follows. There are N connections competing for a bottleneck of fixed capacity. Each connection is assumed to increase its send rate linearly in time in the absence of a congestion indication and otherwise to decrease its rate to a fraction β of the current send rate. Whenever the aggregate send rate of the connections hits the link capacity, a single connection is signalled a congestion indication. Under the prevailing assumptions, plus only a mild stability condition, we obtain that the throughput is the fraction 1 − 1/(1 + const·N) of the link capacity, with const = (1+β)/(1−β). This result appears to be previously unknown; despite the simplicity of its final form, it is not immediate. The result is of practical importance as it elucidates the throughput of parallel TCP sockets, an approach widely used to improve the throughput of bulk data transfers (e.g. GridFTP), in regimes when individual connections are not synchronized or only weakly so. We argue that it is important to distinguish two factors that contribute to TCP throughput deficiency: (F1) TCP window synchronization and (F2) TCP window adaptation in congestion avoidance. Our result is good news, as it suggests that in regimes where (F1) does not hold, a few sockets are already enough to almost entirely eliminate the deficiency due to (F2). Specifically, the result suggests that 3 TCP connections already yield 90% link utilization and 6 connections almost achieve 95%. This analytically proven result should give throughput-greedy users an incentive to limit the number of their parallel TCP sockets, as a few connections already ensure effectively 100% utilization, and any additional connection would provide only a marginal throughput gain. Opening too many sockets is not desirable, as such transfers may beat down other connections sharing a link on the path of the transfer. However, there still remains a throughput deficiency due to (F1), which may give users an incentive to open more sockets. The result found in this paper suggests that the throughput deficiency of parallel TCP sockets is largely attributable to the synchronization factor (F1) and not to the window control (F2). This motivates intelligent queueing disciplines that help mitigate the synchronization. As a by-product, the result shows that emulation of parallel TCP connections by the MultTCP protocol is a good approximation. A further implication of the result is that the aggregate throughput achieved by the connections is insensitive to the choice of loss policy, i.e. which connection is signalled a congestion indication at congestion events. This perhaps somewhat counter-intuitive throughput-insensitivity property is shown not to hold for the steady-state distribution of the aggregate send rate. We validate our results by simulations and internet measurements.

I. INTRODUCTION

Parallel TCP sockets is a generic “hack” to improve the throughput attained by TCP for bulk data transfers: several TCP connections are opened and the data file is striped over them. There are two factors that cause TCP throughput deficiency: (F1) TCP window synchronization and (F2) window adaptation in TCP congestion avoidance. The former is well known and fixes were proposed long ago (e.g. RED [12]). The throughput deficiency due to (F2) is intrinsic to the congestion window control in TCP congestion avoidance and is also well known. It is perhaps best illustrated by a simple model of a single TCP connection over a bottleneck whose steady-state window dynamics are deterministic and periodic, described as follows. In a period, the window W increases linearly in time at a rate of 1 packet per round-trip time and is reduced to one half of its current level when it encounters congestion. It is readily computed that such an idealized TCP connection attains 75% link utilization. This simple model underlies a particular TCP square-root loss-throughput formula, as found in [11], [18]. The concept of parallel TCP sockets is widely used by applications such as, for example, GridFTP∗. In practice, it is often unclear how many sockets one needs to open in order to achieve satisfactory throughput. Opening too many connections may be undesirable for various reasons, and it is thus important to understand how many connections suffice to yield good throughput performance. The relation of the TCP throughput of a single connection to the loss event rate and the mean round-trip time is by now fairly well understood (e.g. [4], [18], [19]), but the results do not readily extend to parallel TCP connections. These formulae leave us with two unknown parameters, and the problem is that it is not well understood how these two unknown parameters depend on the number of TCP connections competing in a bottleneck. The approach leveraging known TCP loss-throughput formulae was pursued by Hacker et al. [14], which provides very informative measurement results, but no throughput formula for parallel TCP connections.

It is important to distinguish the factors (F1) and (F2), as they are different causes of throughput deficiency. Our analysis is concerned with factor (F2). The good news is that it suggests a few connections already fix the problem, in operational regimes when there is no synchronization (i.e. (F1) does not hold) or synchronization is only weak.

In this paper, the send rate of each connection is assumed to increase linearly in time in the absence of a congestion indication to this connection, and otherwise to decrease to a fraction β of the send rate just before the congestion indication (multiplicative decrease). We adopt a model of the bottleneck originally introduced by Baccelli and Hong [9], where link congestion events occur whenever the aggregate rate of the connections hits the link capacity. Strictly speaking, the model assumes a zero buffer; see [9] for a discussion of how this can be relaxed to accommodate a finite buffer. The zero-buffer assumption is made for tractability and is justified in scenarios where propagation delay dominates queueing delay. We provide validation of this in Section V. We assume:

(ONE) At each congestion event, a single connection is signalled to reduce its send rate by one multiplicative decrease.

∗www.globus.org/datagrid/gridftp.html

Fig. 1. Sample paths of 3 AIMD connections competing for a link.

Our main result is a formula for the aggregate throughput x(N) of N connections over a link of fixed capacity c,

$$x(N) = c\left(1 - \frac{1}{1 + \frac{1+\beta}{1-\beta}\,N}\right), \qquad (1)$$

where 0 < β < 1; for TCP connections β = 1/2. The result is exact under a mild “stability” condition (see Section II, Theorem 1). Note that for the special case of one connection, N = 1, the result reads x(1) = c(1+β)/2. Hence, for the TCP-like setting, x(1) = (3/4)c, which recovers the 75% utilization pointed out earlier. Note that nothing is said about which connection is chosen at link congestion events. Figure 1 illustrates several system assumptions made so far: linear increase and multiplicative decrease, link congestion times, and assumption (ONE). The assumption (ONE) was made in some other works, e.g. [10]. It would be more general to assume that at each link congestion event a subset of the connections is chosen to undergo a multiplicative decrease. At the other extreme, we may assume that at each link congestion event ALL connections undergo a multiplicative decrease; this corresponds to TCP window synchronization. Baccelli and Hong [9], in turn, assume that each connection is chosen with some fixed probability, so that at each congestion event a fraction of the connections is signalled a congestion indication. The reader should bear in mind that (ONE) and similar assumptions are mathematical abstractions. In practice, the system dynamics is blurred by various factors such as packetization and feedback delays, and there will always be some positive time between any two connections undergoing a multiplicative decrease, so it may not be justified to aim at validating (ONE) in some strict sense. The definition of link congestion times would in practice accommodate TCP connections over a network interface with a finite buffer and FIFO queueing discipline (DropTail), with buffer length sufficiently smaller than the bandwidth-delay product. In practice, most network interfaces are DropTail. The linear increase of the send rate approximates TCP [15] well in the regime where the round-trip time is dominated by propagation delay and queueing delay can be neglected [9]. It is well known that with substantial queueing delay, TCP window growth over time is sub-linear for large windows [6]. The assumption that the send rate increase is linear over time may be satisfied by transport control protocols other than standard TCP; e.g. [17] shows this to be true for Scalable TCP [16].

In our analysis we also assume:

(R) TCP connections are not constrained by their receiver windows.

With the TCP receiver window in effect, the throughput would increase roughly linearly with the number of parallel TCP sockets, for a range from 1 up to some number of TCP connections. The TCP receiver window constraint is not considered further in this paper. In the analysis we assume that (R) holds, and in the experimental validations we set the TCP receiver window larger than the bandwidth-delay product whenever possible.

We argue that it is of interest to understand the throughput efficiency of parallel TCP sockets for a small number of parallel sockets. Opening too many sockets may not be of practical interest, as this imposes processing overhead on end-host operating systems and also leaves less bandwidth to other flows that may sporadically share the bottleneck during a parallel-socket data transfer. The result (1) suggests that the throughput of N parallel TCP sockets monotonically increases with N, and this increase is such that 3 sockets already suffice to attain 90% link utilization. Setting N = 6 yields almost 95%. This suggests that a few connections are already enough to compensate for the throughput deficiency of TCP. This is good news, as it gives rational throughput-greedy users an incentive not to use an overly large number of parallel sockets per transfer: this would yield only a marginal throughput gain and, as argued earlier, opening too many sockets may not be desirable.
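As a quick numeric reading of formula (1) for the TCP-like setting β = 1/2 (a sketch of ours, not from the report; the function name and the choice of N values are illustrative):

```python
def utilization(n, beta=0.5):
    """Fraction of link capacity achieved by n AIMD connections, formula (1)."""
    const = (1 + beta) / (1 - beta)   # equals 3 for beta = 1/2
    return 1 - 1 / (1 + const * n)

for n in (1, 2, 3, 6, 10):
    print(n, round(utilization(n), 3))
# prints: 1 0.75, 2 0.857, 3 0.9, 6 0.947, 10 0.968
```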

There has been a recent surge of interest in TCP and small buffer sizes [8]. While the results in this paper have some connection to this line of work, as our analysis is accurate for an appropriately chosen small buffer size, there are notable differences. The results in [8] suggest that buffers in routers can be set to smaller values than the bandwidth-delay product (BDP) for a large number of TCP connections and still achieve full link utilization. More specifically, it is suggested that for a link shared by N connections, with link capacity scaled linearly with N, it is sufficient to have a buffer length of the order BDP/√N, asymptotically for large N. The throughput formula (1) holds for any finite N, and, as argued earlier, it is our particular interest to address the case when the number of connections N is not too large. In our context, the bottleneck link may well be in an access network, not at a backbone router, and its buffer may simply turn out to be small relative to the propagation round-trip time of the network path between the end-points of a parallel transfer. This would hold in practice for many end-to-end paths. Consider a bottleneck link with a buffer of L packets, packets of length MTU, and link capacity c Mb/s. Then we may regard the buffer as small for network paths whose propagation round-trip time RTT satisfies RTT ≫ L·MTU/c. For example, consider the typical values c = 100 Mb/s, MTU = 1500 B, and L = 100 packets. Then the condition is RTT ≫ 12 ms, which is true for many wide-area data transfers.

The result (1) tells us more. It reveals that the throughput is insensitive to the choice of loss policy that decides which connection is signalled at each congestion event, as long as the loss policy satisfies (ONE) and the stability conditions in Theorem 1. Note that the set of loss policies that satisfy (ONE) is large and includes many natural candidates. This insensitivity result may appear counter-intuitive to some, as one’s intuition may be that loss policies that favour picking connections with larger rates at congestion times would yield smaller aggregate throughput. The result says that this is not true. The throughput insensitivity in the present form was suggested by a result in [5], which is under the more restrictive assumptions of only 2 connections and special loss policies that render the send rates a Markov process. One may wonder whether the steady-state probability distribution, and in particular higher-order moments, of the aggregate send rate is insensitive to the choice of loss policy. We show in Section III that this is not true. The result in Section III is not new, as it was already found in [5]; however, Section III provides corrected versions with complete proofs. The results in Sections III and IV are for particular choices of loss policy: (i) rate-proportional (a connection is chosen with probability proportional to its send rate just before the congestion event), (ii) largest-rate (the connection with the largest rate just before the congestion event is chosen), (iii) rate-independent (a connection is chosen with a fixed probability), and (iv) beatdown (the same connection is chosen as long as permitted by the link capacity). The rate-proportional loss policy may be deemed a natural candidate and may be seen as a randomized version of the largest-rate policy. We consider the rate-independent policy in view of past work [9]. The fancier beatdown loss policy is considered only to demonstrate that the throughput insensitivity found in this paper holds for a versatile set of loss policies. Note that in the Internet, either of these loss policies or some other policy may emerge as a result of the dynamics of TCP congestion window control and the particular queue discipline, such as DropTail. A more radical approach would be to enforce assumption (ONE) and a particular loss policy by an intelligent queue discipline. Analysis of the benefits and performance of such intelligent queue disciplines is beyond the scope of this paper.

A. Organization of the paper

Section II contains our main result. In that section, we provide a sufficient set of assumptions and notation and then state the main result, along with a discussion of its implications. Proofs are deferred to the Appendix. Section III shows that the steady-state distribution of the aggregate send rate of the connections is not insensitive to the loss policy of the bottleneck link. Several auxiliary results are provided in Section IV. Section V contains experimental validation, both simulations and internet measurements. The last section concludes the paper.

II. MAIN RESULT

A. Model and result

We consider N connections that compete for a link of capacity c by adapting their send rates as follows. Let X_i(t) be the send rate of connection i at time t. A time t′ is said to be a link congestion time whenever ∑_{i=1}^N X_i(t′) = c, i.e. the aggregate send rate hits the link capacity. Let T_n be the n-th link congestion time, enumerated such that T_0 ≤ 0 < T_1, for some time instant labelled 0. A connection i increases its send rate linearly in time with rate η_i > 0, and at some link congestion times its rate is reduced to a fraction 0 < β_i < 1 of the current send rate. We call such an adaptive connection an additive-increase multiplicative-decrease (AIMD) connection with additive-increase parameter η_i and multiplicative-decrease parameter β_i.

The foregoing verbal definitions can be formally rephrased as, for i = 1, 2, …, N,

$$X_i(t) = X_i(T_n) + \eta_i (t - T_n), \qquad T_n \le t < T_{n+1}, \qquad (2)$$

and

$$X_i(T_{n+1}) = \bigl(1 - (1-\beta_i) z_i(n)\bigr) X_i(T_n) + \eta_i S_n, \qquad (3)$$

with

$$S_n = \frac{1}{\eta}\Bigl(c - \sum_{j=1}^{N} \bigl(1 - (1-\beta_j) z_j(n)\bigr) X_j(T_n)\Bigr), \qquad (4)$$

where S_n := T_{n+1} − T_n is obtained from ∑_{i=1}^N X_i(T_{n+1}) = c, η := ∑_{j=1}^N η_j, and z_j(n) = 1 if connection j is signalled a congestion indication at the n-th link congestion event, and z_j(n) = 0 otherwise. Under (ONE), ∑_{j=1}^N z_j(n) = 1. We are now ready to state our main result.


Fig. 2. Throughput of N identical AIMD connections competing for a link of unit capacity with TCP-like setting (β₁ = 1/2).

Theorem 1: Consider the prevailing system of N AIMD connections with identical AIMD parameters, competing for a link of capacity c. Assume (ONE): at each link congestion event exactly one connection is signalled a congestion indication. Assume in addition that the following stability conditions hold: the system has a unique time-stationary distribution, and the times between successive link congestion events, sampled at congestion events, have a strictly positive mean. Then the throughput X^ONE has the time-stationary expected value

$$\frac{\mathbb{E}(X^{\mathrm{ONE}}(0))}{c} = 1 - \frac{1}{1 + \frac{1+\beta_1}{1-\beta_1}\,N}. \qquad (5)$$

The proof is given in Appendix A. We deliberately deferred the proof to the appendix, so as to discuss the implications of the result first and leave the proof, in isolation, to the interested reader.
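To make the recursion concrete, the following Python sketch (our own illustration, not part of the original report; the function names, parameter values, and policy labels are ours) iterates (3)–(4) for N identical AIMD connections under several loss policies satisfying (ONE) and compares the time-average aggregate send rate with formula (5):

```python
import random

def simulate(N=5, beta=0.5, eta=1.0, c=1.0, policy="rate_proportional",
             n_events=200_000, seed=1):
    """Iterate recursions (3)-(4) under a (ONE) loss policy; return the
    time-average aggregate send rate normalized by the link capacity."""
    rng = random.Random(seed)
    x = [c / N] * N                      # rates just before a congestion event
    eta_tot = eta * N
    area, total_time = 0.0, 0.0
    for _ in range(n_events):
        # pick the single connection signalled at this congestion event
        if policy == "rate_proportional":
            i = rng.choices(range(N), weights=x)[0]
        elif policy == "largest_rate":
            i = max(range(N), key=lambda j: x[j])
        else:                            # rate-independent
            i = rng.randrange(N)
        S = (1 - beta) * x[i] / eta_tot  # inter-congestion time, from (4)
        # aggregate rate grows linearly from c - eta_tot*S back up to c
        area += (c - eta_tot * S / 2) * S
        total_time += S
        x = [(beta * xi if j == i else xi) + eta * S for j, xi in enumerate(x)]
    return area / total_time / c

N, beta = 5, 0.5
theory = 1 - 1 / (1 + (1 + beta) / (1 - beta) * N)
for pol in ("rate_proportional", "largest_rate", "rate_independent"):
    print(pol, round(simulate(N=N, beta=beta, policy=pol), 4),
          "theory", round(theory, 4))
```

Under these assumptions, all three policies should report roughly the same value, close to the theoretical 0.9375 for N = 5 and β = 1/2, illustrating the insensitivity claim.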

B. Discussion of the result and its implications

We discuss several implications of Theorem 1.

1) Throughput of parallel TCP sockets: The throughput formula (5) gives an explicit dependence of the throughput on the number of parallel TCP sockets N. Indeed, the throughput monotonically increases from (1+β₁)c/2 for N = 1 to c as N tends to infinity. The former case of one connection is precisely the known result (3/4)c that holds for the TCP-like setting β₁ = 1/2. The dependence on the number of sockets given by (5) is shown in Figure 2 for β₁ = 1/2. The result suggests that opening as few as 3 connections already yields 90% link utilization. Further, opening 6 sockets yields almost 95% link utilization. Hence, opening a number of sockets beyond some relatively small value yields only a marginal throughput gain. Equation (5) gives the explicit dependence of the throughput on the number of sockets N, and thus provides a guideline for choosing the magic configuration parameter of the number of sockets for a data transfer. Choosing this configuration parameter in practice was a puzzle, as the throughput dependence on the number of sockets was largely not understood. The result also gives throughput-greedy users an incentive not to open too many sockets, as opening a few is enough.

2) Throughput is insensitive: No matter which connection is signalled a congestion indication at each link congestion event, the mean throughput remains the same, as long as the mild stability conditions of the theorem hold. This throughput-insensitivity finding may appear counter-intuitive, as one may expect that loss policies that favour signalling congestion indications to large-rate connections would have a detrimental effect on the mean aggregate throughput. The result reveals that this is not true. We show in Section III that the higher-order statistics of the aggregate send rate are not insensitive to the loss policy.

The throughput insensitivity in the general form found in this paper was motivated by the original finding in [5]. Note that we generalize the finding of [5] in the following directions: (i) to any finite number of connections N ([5] is for N = 2) and (ii) to a larger set of loss policies. We discuss (ii) in some more detail now. The loss policies in [5] are restricted to those that base the decision which connection to signal a congestion indication only on the send rates just before a link congestion event. As an aside, note that this renders the send rates a Markov process. Theorem 1 tells us that the throughput insensitivity holds for a much broader set of loss policies that can depend on any past system state, as long as the mild stability conditions of the theorem hold.

Note that the throughput insensitivity allows biasing the signalling of congestion indications towards some connections at link congestion times. The throughputs achieved by these particular connections indeed deteriorate under such a bias; however, the aggregate throughput over all connections remains the same. This is illustrated by the following example.

Example 1: Consider 4 AIMD connections competing for a link of unit capacity. The connections have identical AIMD parameters, set as η_i = 1 and β_i = 1/2. The loss policy is rate-independent, with the probability that connection i is selected equal to σ/2 for i = 1, 2 and (1−σ)/2 for i = 3, 4, for some 0 < σ < 1. The parameter σ is a bias parameter: σ < 1/2 favours picking connection 1 or 2, σ = 1/2 favours no connection, and σ > 1/2 favours picking connection 3 or 4. Hence, as σ ranges from 0 to 1, connections 1 and 2 (resp. 3 and 4) receive equal throughput that decreases (resp. increases) with σ. In this example, the system does have a unique time-stationary distribution, hence the hypotheses of Theorem 1 are satisfied. Theorem 1 implies that the mean aggregate throughput is insensitive to the preferential treatment of connections. The simulation results in Figures 3 and 4 indeed validate the claim.
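The biased rate-independent policy of Example 1 can be plugged into the simulation sketch given after Theorem 1 by swapping the policy-selection line for something like the following (again our own illustrative code, with σ as a parameter):

```python
def biased_rate_independent(sigma, rng):
    """Pick connection 0 or 1 with probability sigma/2 each, and
    connection 2 or 3 with probability (1-sigma)/2 each (0-indexed)."""
    weights = [sigma / 2, sigma / 2, (1 - sigma) / 2, (1 - sigma) / 2]
    return rng.choices(range(4), weights=weights)[0]
```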


Fig. 3. Throughputs of 4 AIMD connections in Example 1. There are two classes of connections, each class containing 2 connections. A connection selected at a link congestion time is from the first class with probability σ. The flows of the two classes attain throughputs that depend on the parameter σ. However, the aggregate throughput is insensitive (dots aligned with the analytical prediction shown by the solid line). This validates Theorem 1.

3) Comparison with the synchronized case: Consider now the loss policy we call ALL, which signals all connections a congestion indication at each link congestion event. Note that this loss policy satisfies (ONE) only if N = 1. The loss policy ALL models the special regime of synchronized TCP connections. A natural question is to compare the throughputs under the loss policy ALL and under loss policies that satisfy (ONE). The throughput under the loss policy ALL is readily obtained to be†

$$\frac{\mathbb{E}(X^{\mathrm{ALL}}(0))}{c} = \frac{1+\beta_1}{2}. \qquad (6)$$

It is easy to check that E(X^ONE(0)) ≥ E(X^ALL(0)), with equality only if N = 1. Note that, indeed, the throughput achieved under the loss policy ALL for N > 1 is exactly that achieved by a single connection under either ALL or (ONE). Thus, the ratio of the throughputs achieved under a loss policy that satisfies (ONE) and under the loss policy ALL corresponds to the throughput gain achieved under the (ONE) loss policy for N connections with respect to the throughput for N = 1. The loss policies under (ONE) provide no gain with respect to the loss policy ALL for N = 1, but they provide larger throughput by a factor which is exactly (5) divided by (1+β₁)/2. This means that the throughput gain of loss policies under (ONE) with respect to the loss policy ALL tends to 2/(1+β₁) as the number of connections N tends to infinity. For the TCP-like setting, this amounts to 4/3, which precisely compensates for the throughput deficiency of a single TCP connection.

†This follows from the fact that under ALL there is a unique limit point x* of the send rates just before a link congestion event, such that x*_i = c/N, i = 1, 2, …, N.

Fig. 4. Sample paths of the rates of the 4 AIMD connections “orchestrated” by the rate-independent policy (Example 1). A connection is chosen from the probability distribution [σ, σ, 1−σ, 1−σ]/2. (Left) σ = 1/2, so that each connection has an equal chance of being selected. (Right) σ = 1/8, so that two connections are selected more often than the other two.

4) What does this tell us about MultTCP?: MultTCP, proposed by Oechslin and Crowcroft [10], is a window control protocol that aims at emulating a given number N of TCP connections. The design rationale is to scale the linear increase of the control with N as Nη₁ and to set the multiplicative-decrease parameter to 1 − (1−β₁)/N. MultTCP can thus be seen as a single AIMD connection with appropriately set additive-increase and multiplicative-decrease parameters as a function of N. The choice of the multiplicative-decrease parameter is based on an approximation that considers a virtual set of N parallel connections. It is assumed that at each congestion indication to these connections, only one connection undergoes a multiplicative decrease; thus, it admits loss policies satisfying (ONE). The send rate of this selected connection just before a congestion indication is taken to be x/N, where x is the aggregate rate over all connections just before the congestion indication. This is an approximation, as the selected connection may well have a send rate other than the even share x/N. In the sequel, we consider MultTCP for the TCP-like setting β₁ = 1/2. We already observed that MultTCP can be treated as a single AIMD connection with additive-increase parameter Nη₁ and multiplicative-decrease parameter 1 − 1/(2N); hence, the throughput follows immediately from (6):

$$\frac{\mathbb{E}(X^{\mathrm{MultTCP}}(0))}{c} = 1 - \frac{1}{4N}.$$

In turn, from (5), the throughput of N TCP-like connections is

$$\frac{\mathbb{E}(X^{\mathrm{ONE}}(0))}{c} = 1 - \frac{1}{1+3N}.$$

We note that E(X^MultTCP(0)) ≥ E(X^ONE(0)) for all N ≥ 1, with equality only if N = 1. It is readily computed that the mean throughput of a MultTCP connection is less than 1.025 times the mean aggregate throughput of N TCP-like connections under a loss policy in (ONE). It is hence, for any practical purpose, a good approximation.
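A short numeric check of the 1.025 bound quoted above (ours, for the TCP-like setting β₁ = 1/2):

```python
# Ratio of MultTCP throughput, 1 - 1/(4N), to that of N TCP-like
# connections under (ONE), 1 - 1/(1 + 3N).
ratios = [(1 - 1 / (4 * n)) / (1 - 1 / (1 + 3 * n)) for n in range(1, 101)]
print(max(ratios))  # about 1.021, attained at N = 2, consistent with the 1.025 bound
```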

This observation does not provide much motivation to obtain a correction for MultTCP so that its throughput is exactly that of N TCP-like connections under a loss policy that satisfies (ONE). Note, however, that redefining only β₁ as a function of N provides no solution, except in the trivial case N = 1. Indeed, equating (5) and (6) and a simple calculation yield β₁(N−1) = N−1; hence, a 0 < β₁ < 1 exists only if N = 1.

5) Round-trip time insensitivity: The result (1) suggests insensitivity of the mean throughput to the round-trip times of the connections, as (5) does not depend on η₁. For TCP-like connections, the parameter η₁ would be defined as the reciprocal of the square of a fixed round-trip time.

III. STEADY-STATE DISTRIBUTION OF THE AGGREGATE SEND RATE IS NOT INSENSITIVE

The insensitivity property found in the earlier section is for the first-order moment of the time-stationary aggregate send rate. The time-stationary distribution itself is not insensitive to the loss policy. This was already shown in [5] by computation of the time-stationary second moments for particular policies. We provide a complete proof for the second moments, which yields corrected versions of the corresponding expressions in [5]. The result for the beatdown policy is new.

Theorem 2: Consider two identical AIMD connections with β₁ = 1/2. The time-stationary second moments for either flow are given by:

1) Beatdown: E(X⁽¹⁾(0)²)/c² = 1/3 ≈ 0.33333
2) Rate-independent: E(X⁽¹⁾(0)²)/c² = 5/24 ≈ 0.20833
3) Rate-proportional: E(X⁽¹⁾(0)²)/c² = 679/3396 ≈ 0.19994
4) Largest-rate: E(X⁽¹⁾(0)²)/c² = 4/21 ≈ 0.1905.

The proof is a tedious exercise in elementary and Palm calculus and is deferred to Appendix B.

Note that the ordering of the loss policies with respect to the time-stationary second moment of the throughput is rather natural. Intuitively, one would expect the largest-rate policy to attain the smallest second moment, as it balances the rates at congestion times by picking the connection with the largest rate. One would also expect the rate-proportional policy to have a somewhat larger second moment, but not much larger, as it also favours selecting connections with larger rates at link congestion times. The rate-independent policy, with each connection selected equally likely, has a larger second moment than both the largest-rate and rate-proportional policies. The beatdown policy attains the largest second moment of the throughput within the given set of policies.
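As an illustration of the non-insensitivity (our own Monte Carlo sketch, not from the report), the following code estimates the time-stationary E(X⁽¹⁾(0)²)/c² for two identical AIMD flows by integrating X₁(t)² over each inter-congestion interval (Palm inversion). The beatdown policy is omitted because its congestion times accumulate and need the special treatment of Section IV-B.

```python
import random

def second_moment(policy, beta=0.5, eta=1.0, c=1.0, n_events=500_000, seed=2):
    """Estimate the time-stationary E(X1(0)^2)/c^2 for two identical AIMD
    flows: integrate X1^2 over each inter-congestion interval, divide by
    the total elapsed time."""
    rng = random.Random(seed)
    x = [0.4 * c, 0.6 * c]             # rates just before a congestion event
    num, den = 0.0, 0.0
    for _ in range(n_events):
        if policy == "largest_rate":
            i = 0 if x[0] >= x[1] else 1
        elif policy == "rate_proportional":
            i = 0 if rng.random() < x[0] / c else 1
        else:                          # rate-independent
            i = rng.randrange(2)
        S = (1 - beta) * x[i] / (2 * eta)          # inter-congestion time
        x_after = [beta * x[j] if j == i else x[j] for j in range(2)]
        # integral of (x_after[0] + eta*t)^2 over t in [0, S]
        num += ((x_after[0] + eta * S) ** 3 - x_after[0] ** 3) / (3 * eta)
        den += S
        x = [x_after[j] + eta * S for j in range(2)]
    return num / den / c ** 2

for pol in ("rate_independent", "rate_proportional", "largest_rate"):
    print(pol, round(second_moment(pol), 4))
# expected to approach 5/24, 679/3396 and 4/21 respectively (Theorem 2)
```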

IV. ADDITIONAL ANALYSIS FOR LARGEST-RATE AND BEATDOWN POLICIES

This section gives some auxiliary results on the dynamics under particular loss policies, the largest-rate and beatdown policies. The analysis provides some insight into the dynamics of the send rates and serves in the proof of the second-moment results of the previous section. It may, however, be skipped by a reader not interested in technical details, with no sacrifice of continuity.

A. Largest-rate policy

From now on, we use the shorthand notation x_i(n) := X_i(T_n). We show that for a homogeneous population of N AIMD connections there exists a unique stationary point of the rate vector x(n) = (x₁(n), …, x_N(n)) under the largest-rate policy, and we identify this stationary point.

In view of (3) and (4), the dynamics of the rates under the largest-rate policy is fully described by the following deterministic dynamical system:

$$x_i(n+1) = x_i(n) + \Bigl(\frac{\eta_i}{\eta} - 1_{\{M_n = i\}}\Bigr)(1-\beta_{M_n}) \bigvee_{j=1}^{N} x_j(n),$$

where M_n ∈ argmax_j x_j(n). Throughout, we use the operators x ∨ y := max{x, y} and x ∧ y := min{x, y}, for x, y ∈ ℝ.

This can be further rewritten as follows. Let x̄(n) be the vector x(n) with coordinates ordered in decreasing order, for any given n. Thus x̄₁(n) ≥ x̄₂(n) ≥ ⋯ ≥ x̄_N(n), for all n. We can write

$$x_i(n+1) = \begin{cases} a_i\,\bar{x}_1(n), & M_n = i\\ x_i(n) + b_i\,\bar{x}_1(n), & M_n \neq i, \end{cases} \qquad (7)$$

where a_i := β_i + (η_i/η)(1−β_i) and b_i := (η_i/η)(1−β_i), i = 1, 2, …, N.

In the remainder of this section we confine ourselves to homogeneous flows, thus a_i = a₁ and b_i = b₁ for all i = 2, 3, …, N. Without loss of generality, we assume c = 1. Thus, for any n, x(n) takes values in the (N−1)-dimensional simplex S_{N−1} = {x ∈ ℝ₊^N : ∑_{i=1}^N x_i = 1}, and x̄(n) in the subset of S_{N−1}, S⁰_{N−1} = {x ∈ ℝ₊^N : ∑_{i=1}^N x_i = 1, x_i ≥ x_j, i ≤ j}.

Lemma 1 (representation): Consider a homogeneous population of N AIMD users competing for a link under the largest-rate loss policy. The dynamics of the ordered vector of rates x̄(n) is given by the recurrence

$$\bar{x}(n+1) = g(\bar{x}(n)), \qquad (8)$$

where g : S⁰_{N−1} → S⁰_{N−1}, x ↦ g(x), with g(·) = (g₁(·), g₂(·), …, g_N(·)) and

$$g_i(x) = \begin{cases} a_1 x_1 \vee (x_2 + b_1 x_1), & i = 1\\ \bigl(a_1 x_1 \vee (x_{i+1} + b_1 x_1)\bigr) \wedge (x_i + b_1 x_1), & 1 < i < N\\ a_1 x_1 \wedge (x_N + b_1 x_1), & i = N. \end{cases}$$

This lemma is proved in Appendix C. As an aside technical remark, note that the mapping x ↦ g(x) involves min, max, and plus operations, but is not a min-max mapping (see Gunawardena [13] and the references therein). In particular, the mapping is not homogeneous, i.e. g(x+h) ≠ g(x)+h for all h ∈ ℝ, a property enjoyed by min-max mappings.

Fig. 5. Sample paths of the rates of two AIMD flows “orchestrated” by the beatdown policy.

Proposition 1: The map g : S_{N−1} → S_{N−1} of the recurrence (8) has a unique fixed point x⁰ in the interior of S_{N−1} (all coordinates strictly positive), given by

$$x^0_i = \begin{cases} \dfrac{2}{(1+\beta_1)N + (1-\beta_1)}, & i = 1\\[2ex] x^0_1\Bigl(1 - \dfrac{1-\beta_1}{N}(i-1)\Bigr), & i = 2, 3, \ldots, N. \end{cases} \qquad (9)$$

The proof is deferred to Appendix D.

Remark 1: The fixed point uniquely identifies an N-periodic sequence x⁰(n) in the simplex S_{N−1}. We readily construct a continuous-time function x⁰(t) with values in S_{N−1}, associated to the sequence x⁰(n) by the definition of the inter-congestion times (4), with T₀ = 0. As customary, a time-stationary sample path is constructed by shifting the T-periodic function x⁰(t) by a time chosen uniformly at random on the interval [0, T].

Remark 2: The proposition says that the stationary point of the rates just before a link congestion event is such that the connections are signalled a congestion indication in a round-robin fashion.

B. Beatdown

It is particular to the beatdown loss policy that congestion times accumulate, i.e. the number of congestion events in some finite time intervals is infinite (see Figure 5). This implies that E⁰(T₁) = 0, which prevents us from applying Theorem 1. In this section we circumvent this problem by looking at some special link congestion times, and show that the throughput-insensitivity property indeed holds.

Define τ_n as a link congestion time at which a connection becomes tagged according to the beatdown policy. Let y_i(n) be the rate of flow i at time τ_n (a subsequence of x_i(n)). With an abuse of notation, let z_i(n) indicate whether flow i was selected at time τ_n. The rates embedded at the selected link congestion times evolve as follows:

$$y_i(n+1) = (1 - z_i(n))\bigl(y_i(n) + \eta_i(\tau_{n+1} - \tau_n)\bigr). \qquad (10)$$

Indeed, if flow i is tagged at τ_n, i.e. z_i(n) = 1, then y_i(n+1) = 0. Else, z_i(n) = 0, and the flow keeps increasing its rate at rate η_i from the initial rate y_i(n).

Summing (10) over i and using ∑_{j=1}^N y_j(n+1) = c yields

$$\tau_{n+1} - \tau_n = \frac{\sum_{j=1}^N z_j(n)\,y_j(n)}{\sum_{j=1}^N (1 - z_j(n))\,\eta_j}. \qquad (11)$$

Finally, we have

$$y_i(n+1) = (1 - z_i(n))\Bigl(y_i(n) + \frac{\eta_i}{\eta - \eta_{M_n}}\,y_{M_n}(n)\Bigr),$$

where, recall, M_n = i for the i such that z_i(n) = 1. We have the following result, whose proof is in Appendix E.

Theorem 3: For a finite and homogeneous population ofAIMD connections, any time-stationary beatdown loss pol-icy is throughput-insensitive.

In the remainder of this section, we give the stationary point of the flow rates at the selected congestion times, for a homogeneous population of AIMD connections. Let ȳ(n) be the vector of the rates y(n) sorted in decreasing order. The dynamics is as follows:

$$\bar{y}_i(n+1) = \begin{cases} 0, & M_n = i\\ \bar{y}_i(n) + \frac{1}{N-1}\,\bar{y}_1(n), & \text{otherwise.} \end{cases}$$

The stationary point satisfies the identities ȳ_k = ȳ_{k+1} + ȳ_1/(N−1), k = 1, 2, …, N−1. It follows that

$$\bar{y}_k = \frac{N-k}{N-1}\,\bar{y}_1.$$

Now, ∑_{i=1}^N ȳ_i = c, from which it follows that ȳ_1 = 2c/N. In summary,

$$\frac{\bar{y}_i}{c} = \begin{cases} \dfrac{2}{N}, & i = 1\\[1.5ex] \dfrac{2}{N-1}\Bigl(1 - \dfrac{i}{N}\Bigr), & i = 2, 3, \ldots, N-1\\[1.5ex] 0, & i = N. \end{cases}$$

V. EXPERIMENTAL RESULTS

The results in the earlier sections are exact for AIMD connections under the prevailing assumptions therein. In this section, we provide experimental validation of our analysis by simulations and internet measurements. Simulation and measurement scripts and data are available on-line at [1].

We performed ns-2 simulations, as this allows us to easily configure queueing disciplines in order to conduct controlled experiments. Our internet measurements provide validation in real-world scenarios.

A. Setup of ns-2 experiments

We first describe our ns-2 experiments. We consider a system of N identical TCP-SACK connections that traverse a bottleneck, as shown in Figure 6. The bottleneck link implements either the DropTail or the RED queueing discipline. The payload of a TCP packet is set to 1000 bytes. The maximum window size is set to 20000 packets. Each simulation is run for 2000 seconds and the initial 1000 seconds are discarded to reduce the effect of the initial transient. The start times of the connections are randomized by drawing them uniformly at random from the interval [0, 4] seconds.


Fig. 6. ns-2 network configuration: N TCP sources and sinks connected through routers r1 and r2; access links of 10 Mb/s with 5 ms delay; bottleneck link of capacity c with DropTail or RED.

Fig. 7. Time-average send rate normalized by link capacity versus the number of TCP connections. Results are obtained in ns-2 for DropTail with varying buffer length b (b = 1, 2, 3, 4, 5, 10, 15, 20, 25) as indicated in the figure.

In some simulations we use a RED bottleneck. The original design goal of RED was to mitigate TCP window synchronization [12]. The RED parameters are set as follows. The minimum and maximum thresholds are set to 0.6 and 0.8 times the buffer length. The RED packet-drop probability function is chosen with value 0.1 at the maximum buffer threshold. The queue-size averaging factor is set to 0.001. The RED option wait is enabled; this enforces some waiting between successive packet drops.

Theorem 1 suggests the following claim: (H) a system of Nidentical TCP connections that compete for a bottleneck witha small buffer attains link utilization 1−1/(1+3N).

B. DropTail

The results are shown in Figure 7. We observe that, typically, for small buffer lengths the achieved throughput is smaller than predicted by the analysis. Moreover, we note that the larger the buffer, the larger the throughput. If the buffer length is sufficiently large, the throughput is larger than predicted by the analysis. [TBD: discuss bandwidth-delay product!] We verified by inspection that in many of the cases in Figure 7 the TCP windows are synchronized. This is exhibited for N = 5 and 10 in Figure 8. The figure shows the sum of the congestion windows over all connections versus time. The sum of the congestion windows essentially evolves over time as the congestion window of a single TCP connection, which indicates synchronization. This is further reflected in Figure 9, which shows the congestion windows of individual connections.

Fig. 8. Sum of congestion windows over all TCP connections for (top) N = 5 and (bottom) N = 10, competing for a bottleneck link with propagation round-trip time 120 ms and buffer lengths b = 5, 10, 15, 20 as indicated in the figure. The results clearly indicate synchronization of the TCP connections.

Fig. 9. Synchronization in a DropTail bottleneck with N = 10 and b = 5.

C. RED

We show results in Figure 10. The results exhibit reasonableconformance to Theorem 1.

D. Loss policies that conform to (ONE)

We observed TCP window synchronization in the experiments with DropTail. In order to validate Theorem 1, our aim is to design experiments in which assumption (ONE) holds, i.e. at each congestion event exactly one connection is signalled a congestion indication.


Fig. 10. Time-average send rate normalized by link capacity versus the number of TCP connections. Results are obtained in ns-2 for RED with varying buffer length (b = 16, 17, 18) as indicated in the figure.

To that end, we implement a loss policy, the threshold-dropper, described as follows. The queue buffer length is set to a virtually infinite value. Dropping is based on a finite positive threshold Th and the following rule. Consider the system from a time at which there are at most Th queued packets. The first packet arrival that exceeds the threshold is dropped and the corresponding connection is declared selected. The selected connection remains selected for an interval of τ time units, where τ is set to a factor of the mean round-trip time; no further packets are dropped during this period. The process then repeats. It is clear from this description that the threshold-dropper enforces assumption (ONE). It also permits the implementation of various loss policies.
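A minimal sketch of the threshold-dropper logic as described above (our own pseudocode, not the ns-2 module used in the experiments; the class name and enqueue interface are ours, and departures are omitted):

```python
class ThresholdDropper:
    """Drop the first packet arrival that exceeds the threshold, mark its
    connection as selected, and drop nothing else for the next tau seconds."""
    def __init__(self, threshold, tau):
        self.threshold = threshold      # Th, in packets
        self.tau = tau                  # selection interval, e.g. a few RTTs
        self.selected_until = -1.0
        self.queue = []                 # "virtually infinite" buffer

    def on_arrival(self, packet, conn_id, now):
        # Drop only when the queue exceeds Th and no connection is selected.
        if len(self.queue) >= self.threshold and now >= self.selected_until:
            self.selected_until = now + self.tau   # conn_id becomes "selected"
            return "drop", conn_id                 # single congestion signal
        self.queue.append(packet)                  # otherwise enqueue
        return "enqueue", None
```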

Figure 11 (left) shows the throughput achieved with the threshold-dropper implementing the rate-independent loss policy. We note good qualitative conformance with Theorem 1, in particular when the number of connections is not too large. For a sufficiently large number of connections, the throughput is essentially equal to the link capacity. In these cases the queue does not deplete, which explains the larger throughput than predicted by the analysis. Analogous results for the largest-rate loss policy are shown in Figure 11 (right). The experimental results obtained for the rate-independent and largest-rate policies match well, which indicates throughput insensitivity. This is in conformance with the analysis.

We further provide results indicating that the threshold-dropper mitigates TCP window synchronization. Figure 12 (top) shows the congestion window evolution of 10 individual TCP connections. This is substantiated by looking at the sum of the congestion windows in Figure 12 (bottom). The alert reader will compare this with Figure 8.

1) What loss policies emerge with DropTail and RED?: It is often assumed that a connection observes loss events with intensity in time proportional to its send rate.

Fig. 12. (Top) Congestion windows of N = 10 TCP connections; (bottom) sum of all congestion windows. The loss policy is the threshold-dropper with threshold Th = 3. The results suggest absence of synchronization.

This corresponds to the rate-proportional loss policy introduced earlier. We investigate, over a limited set of experiments, whether or not the rate-proportional loss policy emerges with DropTail and RED. To that end, we proceed with the following test. Time is binned into consecutive intervals of length T time units. A connection is assumed to receive a congestion indication in bin n if there is at least one packet drop for this connection in the n-th bin. The window size of connection i in bin n is denoted by w_i(n). Our goal is to estimate P(z_i(n) = 1 | w_i(n) = w), where z_i(n) = 1 if connection i is indicated congestion in bin n and z_i(n) = 0 otherwise. We estimate this conditional probability by the empirical mean

$$z(w) := \frac{\sum_n \sum_{i=1}^{N} 1_{\{z_i(n)=1,\, w_i(n)=w\}}}{\sum_n \sum_{i=1}^{N} 1_{\{w_i(n)=w\}}},$$

which under stationarity and ergodicity assumptions converges with probability 1 to P(z_i(n) = 1 | w_i(n) = w) as the number of time bins tends to infinity. In Figure 13, we show the empirical mean z(w) versus w for systems of N = 10 TCP connections with propagation round-trip time 120 ms and a DropTail link, for buffer sizes of 5, 10, 15, 20 packets. The same is shown in Figure 14, but for RED. Some of the results suggest that the larger the congestion window of a connection just before a link loss event, the more likely the connection is signalled a congestion indication.
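For concreteness, the binned estimator z(w) can be computed as follows (our own sketch; the per-bin window sizes and drop indicators would come from ns-2 or tcpdump traces, and the variable names are ours):

```python
from collections import defaultdict

def empirical_z(windows, drops):
    """windows[n][i]: window of connection i in bin n (packets);
    drops[n][i]: 1 if connection i had at least one drop in bin n, else 0.
    Returns the map w -> empirical P(congestion indication in bin | window == w)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for wins, zs in zip(windows, drops):
        for w, z in zip(wins, zs):
            totals[w] += 1
            hits[w] += z
    return {w: hits[w] / totals[w] for w in totals}
```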


Fig. 11. The ratio of the throughput to the link capacity versus the number of connections N with the threshold-dropper, for Th = 3, b = 100; Th = 2, b = 100; and Th = 3, b = 30: (left) rate-independent, (right) largest-rate. The results conform well with Theorem 1 (solid line).

Fig. 13. Empirical estimate of P(z_i(n) = 1 | w_i(n) = w) versus the window size w for a DropTail link and 10 TCP connections, for buffer lengths b = 5, 10, 15, 20 as indicated in the graphs.

E. Internet measurements

We now proceed with more realistic experiments, by measurements on the internet planet-scale testbed PlanetLab [2]. We chose at random a set of distinct end-to-end paths between hosts on different continents, in Europe and North America. For each path, we conducted 10 repeated experiments. Each set of measurements starts parallel TCP transfers with up to 50 parallel TCP connections. We measure the aggregate throughput using the tool iperf [3] and the round-trip time using ping. Some basic information about the end-to-end paths is shown in Table I.

The measured throughputs are shown in Figures 15 and 16. In Figure 15, we note a very good match of the inria-stanford measurements with Theorem 1. The fortiche-titeuf measurements suggest buffer saturation. All results exhibit qualitative conformance with the throughput increase with the number of connections predicted by the analysis.

Fig. 14. Same as in Figure 13, but for RED.

We next investigate whether the connections are synchronized. On distinct end-to-end paths, we run the iperf tool with 10 parallel TCP connections. We collect the packets transmitted in both directions of a connection with tcpdump and then infer loss events. We clump loss events into congestion events based on a time threshold (a factor of the round-trip time) and count the number of distinct connections, M_n, that suffer at least one loss event in bin n. We plot the empirical complementary distribution function of M_n in Figure 17. We observe that M_n is a small fraction of N = 10 on almost all paths. Furthermore, we plot loss events over time in Figures 18-19 and observe that the losses do not appear to be clustered together, which suggests absence of, or only weak, TCP window synchronization.


TABLE I
SET OF END-TO-END PATHS USED IN MEASUREMENTS

path             Source host                     Destination host                     RTT
umass-berkeley   planetlab1.cs.umass.edu         planetlab2.millenium.berkeley.edu    97 ms
inria-stanford   planetlab3.inria.fr             planetlab-2.stanford.edu             194.7 ms
cornell-greece   planetlab4-dsl.cs.cornell.edu   planet2.ics.forth.gr                 337.9 ms
LAN              einat.inria.fr                  titeuf.inria.fr                      1.6 ms

Fig. 15. Throughput measured on the internet with iperf with N parallel TCP connections. For each N, 10 distinct measurements were conducted. The figures show boxplots indicating that the measurement samples are concentrated around the corresponding average values. The solid line is the result of Theorem 1, interpolated by fitting the measured value for N = 50 to the corresponding analytical value. The measurement results exhibit qualitative growth of the throughput with N as predicted by the analysis. (Left) umass-berkeley, (right) inria-stanford. The latter shows particularly good conformance with Theorem 1.

Fig. 16. Same as in Figure 15, but (left) cornell-greece, (right) fortiche-titeuf.

VI. CONCLUSION

We obtained a simple and new throughput formula for parallel adaptive connections that adapt their send rates according to the widely known additive-increase multiplicative-decrease rule. The connections compete for a link of fixed capacity, and link congestion times are defined as the instants when the aggregate send rate hits the link capacity. The result holds under the assumption that at each link congestion event exactly one connection undergoes a multiplicative decrease. This result, albeit very simple in its final form, is obtained by a non-trivial proof. Its practical implications are versatile. It gives an explicit dependence of the throughput gain on the number of parallel TCP sockets. In particular, the result predicts that 90% utilization is already achieved by as few as 3 sockets. It thus provides an incentive to limit the number of parallel TCP sockets per data transfer, as beyond a relatively small number of sockets any throughput gain becomes marginal. The result also reveals a somewhat counter-intuitive throughput-insensitivity property: no matter which loss policy assigns congestion indications to connections, over a broad set of loss policies that select one connection per link congestion time and under a mild stability condition, the mean aggregate throughput remains the same. This is shown not to hold for the steady-state distribution of the aggregate send rate. The throughput formula is validated by simulations and internet measurements.


Fig. 18. Loss events: connection identifiers suffering a loss are shown versus time, where a connection identifier is the two least significant digits of the port number. (Left) umass-berkeley (RTT = 97 ms), (right) inria-stanford (RTT = 195 ms).

Fig. 19. Same as in Figure 18, but (left) cornell-greece (RTT = 338 ms), (right) einat-titeuf (RTT = 1.6 ms).


ACKNOWLEDGMENTS

E. A. and B. T. were in part supported by the EuroNGI NoE and the INRIA TCP cooperative research action. M. V. is grateful to Laurent Massoulie and Don Towsley for discussions of some technical points. He also thanks Dinan Gunawardena for his valuable suggestions on practical implications of the throughput insensitivity found in this paper. Jon Crowcroft and Karl Jeacle pointed to the Grid application.

REFERENCES

[1] http://research.microsoft.com/~milanv/socks.htm.
[2] http://www.planet-lab.org.
[3] http://dast.nlanr.net/Projects/Iperf/.
[4] E. Altman, K. Avrachenkov, and C. Barakat. A stochastic model of TCP/IP with stationary random losses. In Proc. of ACM Sigcomm'00, pages 231–242, 2000.
[5] E. Altman, R. El Azouzi, D. Ros, and B. Tuffin. Loss strategies for competing TCP/IP connections. In Proc. of Networking 2004, Athens, Greece, May 2004.
[6] E. Altman, F. Boccara, J. Bolot, P. Nain, P. Brown, D. Collange, and C. Fenzy. Analysis of the TCP/IP flow control mechanism in high-speed wide-area networks. In 34th IEEE Conference on Decision and Control, pages 368–373, New Orleans, Louisiana, Dec 1995.
[7] E. Altman, T. Jimenez, and R. Nunez-Queija. Analysis of two competing TCP/IP connections. Performance Evaluation, 49(1):43–56, 2002.
[8] G. Appenzeller, I. Keslassy, and N. McKeown. Sizing router buffers. In Proc. of ACM Sigcomm 2004, Portland, Oregon, USA, Aug/Sept 2004.
[9] F. Baccelli and D. Hong. AIMD, fairness and fractal scaling of TCP traffic. In Proc. of IEEE Infocom 2002, New York, NY, http://www.di.ens.fr/~trec/aimd, 2002.
[10] J. Crowcroft and P. Oechslin. Differentiated end-to-end Internet services using a weighted proportional fair sharing TCP. ACM Computer Communications Review, 47(4):275–303, 2004.
[11] S. Floyd and K. Fall. Promoting the use of end-to-end congestion control in the Internet. IEEE/ACM Trans. on Networking, 7(4):458–472, August 1999.

[12] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Trans. on Networking, 1(4):397–413, August 1993.

Fig. 17. Complementary distribution function of the number of connections assigned a packet drop in an interval of length const × RTT, for 10 parallel TCP connections.

[13] J. Gunawardena. From max-plus algebra to nonexpansive mappings: a nonlinear theory for discrete event systems. Theoretical Computer Science, 293:141–167, 2003.
[14] T. Hacker, B. Athey, and B. Noble. The end-to-end performance effects of parallel TCP sockets on a lossy wide-area network. In 16th IEEE-CS/ACM International Parallel and Distributed Processing Symposium, Ft. Lauderdale, FL, April 2002.
[15] V. Jacobson and M. J. Karels. Congestion avoidance and control. In Proc. of ACM SIGCOMM'88, pages 314–329, Stanford, August 1988.
[16] T. Kelly. Scalable TCP: improving performance in highspeed wide area networks. In PFLDNet'03, Geneva, Switzerland, February 2003.
[17] R. El Khoury and E. Altman. Analysis of Scalable TCP. In Proc. of Het-Nets'04, West Yorkshire, United Kingdom, July 2004.
[18] J. Mahdavi and S. Floyd. TCP-friendly unicast rate-based flow control. Technical note sent to the end-2-end interest mailing list, http://www.psc.edu/networking/papers/tcp_friendly.html, January 1997.
[19] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose. Modeling TCP Reno performance: a simple model and its empirical validation. IEEE/ACM Trans. on Networking, 8(2):133–145, 2000.

APPENDIX

A. Proof of Theorem 1

We first exhibit a few auxiliary results and then provide the proof of the theorem. Let us begin with a characterization lemma.

Lemma 2: Let φ(·) be a positive-valued function, and ψ(·) a primitive of φ(·), that is, ψ′(·) = φ(·). The aggregate send rate X is characterized by

$$\mathbb{E}(\varphi(X(0))) = \frac{\psi(c) - \mathbb{E}^0\bigl(\psi(c - \eta T_1)\bigr)}{\eta\,\mathbb{E}^0(T_1)}. \qquad (12)$$

Proof. By the Palm inversion formula,

$$\mathbb{E}(\varphi(X(0))) = \frac{\mathbb{E}^0\!\left(\int_0^{T_1}\varphi(X(s))\,ds\right)}{\mathbb{E}^0(T_1)}.$$

We have X(t) = c − η(T_{n+1} − t) for T_n ≤ t < T_{n+1}. Combining the last two displays, we have

$$\mathbb{E}(\varphi(X(0))) = \frac{\mathbb{E}^0\!\left(\int_0^{T_1}\varphi(c - \eta(T_1 - s))\,ds\right)}{\mathbb{E}^0(T_1)}.$$

Now use the change of variable c − η(T_1 − s) = v and rewrite the last display as

$$\mathbb{E}(\varphi(X(0))) = \frac{\mathbb{E}^0\!\left(\int_{c-\eta T_1}^{c}\varphi(v)\,dv\right)}{\eta\,\mathbb{E}^0(T_1)}.$$

The proof follows by the definition of ψ(·). □

The next corollary follows immediately from the lemma by taking φ(x) = x^m and invoking the binomial theorem.

Corollary 1: The moments of the aggregate rate are characterized by

$$\mathbb{E}(X(0)^m) = c^m - \frac{\sum_{k=2}^{m+1}\binom{m+1}{k}(-1)^k c^{m+1-k}\,\eta^k\,\mathbb{E}^0(T_1^k)}{(m+1)\,\eta\,\mathbb{E}^0(T_1)}.$$

As a remark, note that the m-th time-stationary moment of the aggregate send rate is entirely determined by the event-stationary moments of the inter-congestion times of orders 1 to m+1. The following easy observation shall be useful in the sequel.
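As a quick check of the corollary (a worked instance we add here, not in the original text), setting m = 1 leaves only the k = 2 term in the sum and recovers the identity used later as (14):

$$\mathbb{E}(X(0)) = c - \frac{\binom{2}{2}(-1)^2 c^{0}\,\eta^{2}\,\mathbb{E}^0(T_1^2)}{2\,\eta\,\mathbb{E}^0(T_1)} = c - \frac{\eta}{2}\,\frac{\mathbb{E}^0(T_1^2)}{\mathbb{E}^0(T_1)}.$$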

Lemma 3: Under a loss policy that satisfies (ONE), the inter-congestion times satisfy, for any m and all n,

$$S_n^m = \frac{1}{\eta^m}\sum_{i=1}^{N}(1-\beta_i)^m z_i(n)\,x_i(n)^m. \qquad (13)$$

Proof. The asserted equality follows directly by summing both sides of equation (3) over i and invoking the definition of the link congestion times. □

The remainder of this section is the proof of Theorem 1. Recall the short-cut $x_i(n) := X_i(T_n)$ and that $\mathbb{E}^0(T_1^m) = \mathbb{E}^0(S_n^m)$. From Corollary 1, we obtain

$$
\mathbb{E}(X(0)) = c - \frac{\eta}{2}\,\frac{\mathbb{E}^0(T_1^2)}{\mathbb{E}^0(T_1)}. \qquad (14)
$$

The remainder of the proof shows that

$$
\frac{\mathbb{E}^0(T_1^2)}{\mathbb{E}^0(T_1)} = K,
$$

where $K$ is a constant that depends on the parameters $(\eta_1, \beta_1, N)$. From (13), it follows that, under the homogeneity assumption, we have

$$
\frac{\mathbb{E}^0(T_1^2)}{\mathbb{E}^0(T_1)} = \frac{1-\beta_1}{\eta_1 N}\,\frac{\sum_{i=1}^N \mathbb{E}^0\big(z_i(0)x_i(0)^2\big)}{\sum_{i=1}^N \mathbb{E}^0\big(z_i(0)x_i(0)\big)}. \qquad (15)
$$
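To spell out the step from (13) to (15) (a routine intermediate computation, added here only for readability): under (ONE) and homogeneous parameters $\beta_i = \beta_1$, $\eta_i = \eta_1$, $\eta = N\eta_1$, equation (13) with $m = 1$ and $m = 2$ gives

$$
S_n = \frac{1-\beta_1}{\eta}\sum_{i=1}^N z_i(n)x_i(n), \qquad
S_n^2 = \frac{(1-\beta_1)^2}{\eta^2}\sum_{i=1}^N z_i(n)x_i(n)^2,
$$

so taking Palm expectations and forming the ratio leaves a single factor $(1-\beta_1)/\eta = (1-\beta_1)/(\eta_1 N)$, which is (15).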


Now, raise both sides of relation (3) to the power 2, using (13). After a few simplifications, we obtain

$$
\begin{aligned}
x_i(n+1)^2 ={}& x_i(n)^2 - \left(1-\beta_i^2 + 2\frac{\eta_i}{\eta}(1-\beta_i)^2\right)\big(z_i(n)x_i(n)\big)^2 \\
&+ 2\frac{\eta_i}{\eta}\,x_i(n)\sum_{j=1}^N (1-\beta_j)z_j(n)x_j(n)
+ \left(\frac{\eta_i}{\eta}\right)^2\sum_{j=1}^N (1-\beta_j)^2\big(z_j(n)x_j(n)\big)^2.
\end{aligned}
$$

To arrive at the last display we used the property $z_i(n)z_j(n) = 0$ whenever $i \neq j$, implied by the (ONE) assumption. Next, we sum both sides of the last display over $i$ and take expectations, to obtain

$$
\sum_{i=1}^N \left(\frac{1+\beta_i}{1-\beta_i} + \frac{2\eta_i}{\eta} - \sum_{j=1}^N\left(\frac{\eta_j}{\eta}\right)^2\right)(1-\beta_i)^2\,\mathbb{E}^0\big((z_i(0)x_i(0))^2\big)
= 2\,\mathbb{E}^0\!\left(\sum_{i=1}^N \frac{\eta_i}{\eta}\,x_i(0)\sum_{j=1}^N (1-\beta_j)z_j(0)x_j(0)\right).
$$

For identical AIMD parameters, the last relation boils down to

$$
\frac{\sum_{i=1}^N \mathbb{E}^0\big((z_i(0)x_i(0))^2\big)}{\sum_{i=1}^N \mathbb{E}^0\big(z_i(0)x_i(0)\big)} = \frac{2c}{(1+\beta_1)N + (1-\beta_1)}.
$$

It remains only to plug the last identity into (15), and then into (14), to retrieve the asserted identity of the theorem.
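As an independent, purely numerical sanity check of the throughput identity just derived (not part of the proof), the short Python sketch below simulates the homogeneous model of this appendix under a uniformly random single-loss policy. The parameter values ($\beta_1 = 1/2$, per-flow increase rate $\eta_i = 1$) and all function names are our own choices for this illustration.

```python
import random

def simulate_throughput(N=6, beta=0.5, c=1.0, iters=200_000):
    """Monte Carlo sketch of the homogeneous AIMD model of the appendix:
    N flows, linear increase at rate 1 each, multiplicative decrease by
    beta for the single flow signalled at each congestion event (chosen
    here uniformly at random, a rate-independent policy).  Returns the
    simulated time-average of the aggregate send rate."""
    eta_i = 1.0          # per-flow increase rate (assumed value)
    eta = N * eta_i      # aggregate increase rate
    x = [c / N] * N      # arbitrary starting point on the congestion surface
    area = 0.0           # time-integral of the aggregate rate
    time = 0.0
    for _ in range(iters):
        i = random.randrange(N)                 # flow signalled the loss (ONE)
        S = (1.0 - beta) * x[i] / eta           # inter-congestion time, cf. (13) with m = 1
        # aggregate rate drops to c - (1-beta)*x[i], then climbs linearly back to c
        area += S * (c - 0.5 * (1.0 - beta) * x[i])
        time += S
        # per-flow update between consecutive congestion events, cf. (3)
        x = [beta * x[j] + eta_i * S if j == i else x[j] + eta_i * S
             for j in range(N)]
    return area / time

if __name__ == "__main__":
    random.seed(1)
    for N in (1, 3, 6, 10):
        sim = simulate_throughput(N=N)
        model = 1.0 - 1.0 / (1.0 + 3.0 * N)     # (1+beta)/(1-beta) = 3 for beta = 1/2
        print(f"N={N:2d}  simulated={sim:.4f}  formula={model:.4f}")
```

For $\beta_1 = 1/2$ the constant $(1+\beta_1)/(1-\beta_1)$ equals $3$, so the printed time averages should approach $1 - 1/(1+3N)$.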

B. Proof of Theorem 2

Lemma 4: The time-stationary moments for two AIMD flows $i = 1$ and $2$ that compete for a link of capacity $c$ are given by

$$
\frac{\mathbb{E}(X_1(0)^m)}{c^m} = \frac{c}{(m+1)\,\eta_1\,\mathbb{E}^0(T_1)}
\left( a_{m+2}\,\frac{\mathbb{E}^0\big(z_1(0)x_1(0)^{m+1}\big)}{c^{m+1}}
- \frac{\mathbb{E}^0\big((1-z_1(0))x_1(0)^{m+1}\big)}{c^{m+1}}
+ \sum_{k=0}^{m+1} a_k\,\frac{\mathbb{E}^0\big((1-z_1(0))x_1(0)^k\big)}{c^k} \right)
$$

where, for $k = 0,1,\dots,m+1$,

$$
a_k = \binom{m+1}{m+1-k}\left(\frac{\eta_1}{\eta}(1-\beta_2)\right)^{m+1-k}\left(1 - \frac{\eta_1}{\eta}(1-\beta_2)\right)^{k},
\qquad
a_{m+2} = \left(\beta_1 + \frac{\eta_1}{\eta}(1-\beta_1)\right)^{m+1} - \beta_1^{m+1},
$$

and

$$
\mathbb{E}^0(T_1) = \frac{1}{\eta}\Big[(2-\beta_1-\beta_2)\,\mathbb{E}^0\big(z_1(0)x_1(0)\big) + c\,(1-\beta_2)\big(1 - \mathbb{E}^0(z_1(0))\big) - (1-\beta_2)\,\mathbb{E}^0\big(x_1(0)\big)\Big].
$$
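To make the coefficients concrete, here is a worked instance of the definitions above (our own substitution, for illustration only): for $m = 1$ and symmetric parameters $\eta_1 = \eta_2$, $\beta_1 = \beta_2 = 1/2$, one has $\frac{\eta_1}{\eta}(1-\beta_2) = \frac14$, so

$$
a_0 = \binom{2}{2}\Big(\tfrac14\Big)^2 = \tfrac{1}{16},\qquad
a_1 = \binom{2}{1}\tfrac14\cdot\tfrac34 = \tfrac{3}{8},\qquad
a_2 = \Big(\tfrac34\Big)^2 = \tfrac{9}{16},\qquad
a_{m+2} = a_3 = \Big(\tfrac34\Big)^2 - \Big(\tfrac12\Big)^2 = \tfrac{5}{16}.
$$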

Proof. The proof follows from the Palm inversion formula, after some tedious computations. By the Palm inversion formula, we have

$$
\mathbb{E}(X_1(0)^m) = \frac{1}{\mathbb{E}^0(T_1)}\,\mathbb{E}^0\!\left(\int_0^{T_1} X_1(s)^m\,ds\right). \qquad (16)
$$

We continue with computing the integral, given $(x_i(0), z_i(0),\ i = 1,2,\dots,N)$:

$$
\begin{aligned}
\int_0^{T_1} X_1(s)^m\,ds
&= \int_0^{T_1}\big[(1-(1-\beta_1)z_1(0))x_1(0) + \eta_1 s\big]^m\,ds \\
&= \frac{1}{\eta_1}\int_{(1-(1-\beta_1)z_1(0))x_1(0)}^{(1-(1-\beta_1)z_1(0))x_1(0)+\eta_1 S_0} u^m\,du \\
&= \frac{1}{(m+1)\eta_1}\Big(\big[(1-(1-\beta_1)z_1(0))x_1(0)+\eta_1 S_0\big]^{m+1} - \big(1-(1-\beta_1^{m+1})z_1(0)\big)x_1(0)^{m+1}\Big) \\
&= \frac{1}{(m+1)\eta_1}\left(\sum_{j=0}^{m+1}\binom{m+1}{j}\big[1-(1-\beta_1^{m+1-j})z_1(0)\big]\,\eta_1^j\,x_1(0)^{m+1-j} S_0^j - \big(1-(1-\beta_1^{m+1})z_1(0)\big)x_1(0)^{m+1}\right).
\end{aligned}
$$

Note that the equality $\big((1-(1-\beta_1)z_1(0))x_1(0)\big)^{m+1} = \big(1-(1-\beta_1^{m+1})z_1(0)\big)x_1(0)^{m+1}$ comes from the fact that $z_1(0)$ is $0$ or $1$.

It follows that

$$
(m+1)\,\eta_1\int_0^{T_1} X_1(s)^m\,ds
= \sum_{j=1}^{m+1}\binom{m+1}{j}\big[1-(1-\beta_1^{m+1-j})z_1(0)\big]\,\eta_1^j\,x_1(0)^{m+1-j} S_0^j. \qquad (17)
$$

By (13) and $z_i(n)z_j(n) = 0$ whenever $i \neq j$, for all $n$, we can write

$$
\begin{aligned}
\big[1-(1-\beta_1^{m+1-j})z_1(0)\big] S_0^j
&= \frac{1}{\eta^j}\sum_{k=1}^N \big[1-(1-\beta_1^{m+1-j})z_1(0)\big](1-\beta_k)^j z_k(0)\,x_k(0)^j \\
&= \frac{1}{\eta^j}\Big[\beta_1^{m+1-j}(1-\beta_1)^j z_1(0)x_1(0)^j + \sum_{k\neq 1}(1-\beta_k)^j z_k(0)x_k(0)^j\Big].
\end{aligned}
$$

Plugging the last identity into (17), we obtain

$$
\begin{aligned}
(m+1)\,\eta_1\int_0^{T_1} X_1(s)^m\,ds
={}& \sum_{j=1}^{m+1}\binom{m+1}{j}\left(\frac{\eta_1}{\eta}\right)^j (1-\beta_1)^j\beta_1^{m+1-j}\,z_1(0)x_1(0)^{m+1} \\
&+ \sum_{k\neq 1}\sum_{j=1}^{m+1}\binom{m+1}{j}\left(\frac{\eta_1}{\eta}\right)^j (1-\beta_k)^j z_k(0)\,x_1(0)^{m+1-j}x_k(0)^j.
\end{aligned}
$$

The first summation element on the right-hand side of the equality simplifies immediately, so that we have

$$
\begin{aligned}
(m+1)\,\eta_1\int_0^{T_1} X_1(s)^m\,ds
={}& \left[\left(\beta_1 + \frac{\eta_1}{\eta}(1-\beta_1)\right)^{m+1} - \beta_1^{m+1}\right] z_1(0)x_1(0)^{m+1} \\
&+ \sum_{k\neq 1}\sum_{j=1}^{m+1}\binom{m+1}{j}\left(\frac{\eta_1}{\eta}\right)^j (1-\beta_k)^j z_k(0)\,x_1(0)^{m+1-j}x_k(0)^j.
\end{aligned}
$$


Denote by $s_N$ the second summation element on the right-hand side of the last equality. We now specialize to $N = 2$ and proceed with computing $s_2$. We thus have

$$
(m+1)\,\eta_1\int_0^{T_1} X_1(s)^m\,ds
= \left[\left(\beta_1 + \frac{\eta_1}{\eta}(1-\beta_1)\right)^{m+1} - \beta_1^{m+1}\right] z_1(0)x_1(0)^{m+1} + s_2 \qquad (18)
$$

with $s_2$ given by

$$
s_2 = \sum_{j=1}^{m+1}\binom{m+1}{j}\left(\frac{\eta_1}{\eta}(1-\beta_2)\right)^j (1-z_1(0))\,x_1(0)^{m+1-j}\big(c - x_1(0)\big)^j.
$$

Note that

$$
\begin{aligned}
(1-z_1(0))\,x_1(0)^{m+1-j}\big(c-x_1(0)\big)^j
&= \sum_{k=0}^{j}\binom{j}{k}(-1)^{j-k}c^k (1-z_1(0))\,x_1(0)^{m+1-k} \\
&= (-1)^j (1-z_1(0))\,x_1(0)^{m+1} + \sum_{k=1}^{j}\binom{j}{k}(-1)^{j-k}c^k (1-z_1(0))\,x_1(0)^{m+1-k}.
\end{aligned}
$$

Substituting the last identity into $s_2$, we have

$$
s_2 = \sum_{j=1}^{m+1}\binom{m+1}{j}\left(\frac{\eta_1}{\eta}(1-\beta_2)\right)^j
\left[(-1)^j(1-z_1(0))\,x_1(0)^{m+1} + \sum_{k=1}^{j}\binom{j}{k}(-1)^{j-k}c^k(1-z_1(0))\,x_1(0)^{m+1-k}\right].
$$

Computing the sum for the first element in the brackets, we have

$$
s_2 = \left[\left(1 - \frac{\eta_1}{\eta}(1-\beta_2)\right)^{m+1} - 1\right](1-z_1(0))\,x_1(0)^{m+1} + s_2' \qquad (19)
$$

with

$$
s_2' = c^{m+1}\sum_{j=1}^{m+1}\binom{m+1}{j}\left(\frac{\eta_1}{\eta}(1-\beta_2)\right)^j
\sum_{k=1}^{j}\binom{j}{k}(-1)^{j-k}(1-z_1(0))\,\frac{x_1(0)^{m+1-k}}{c^{m+1-k}}.
$$

We can write $s_2' = c^{m+1}\sum_{j=1}^{m+1}\sum_{k=m+1-j}^{m} d_{j,k}\,(1-z_1(0))\,\frac{x_1(0)^k}{c^k}$, where

$$
d_{j,k} = \binom{m+1}{j}\binom{j}{m+1-k}\left(\frac{\eta_1}{\eta}(1-\beta_2)\right)^j(-1)^{-j+m+1-k}
= \binom{m+1}{m+1-k}\binom{k}{j-m-1+k}\left(\frac{\eta_1}{\eta}(1-\beta_2)\right)^j(-1)^{-j+m+1-k}.
$$

Then,

$$
s_2' = c^{m+1}\sum_{k=0}^{m} a_k\,(1-z_1(0))\,\frac{x_1(0)^k}{c^k} \qquad (20)
$$

with

$$
a_k = \sum_{j=m+1-k}^{m+1} d_{j,k}, \qquad k = 0,1,\dots,m.
$$

The coefficients $a_k$ admit a simple form:

$$
\begin{aligned}
a_k &= \binom{m+1}{m+1-k}\sum_{j=m+1-k}^{m+1}\binom{k}{j-m-1+k}\left(\frac{\eta_1}{\eta}(1-\beta_2)\right)^j(-1)^{-j+m+1-k} \\
&= \binom{m+1}{m+1-k}\sum_{i=0}^{k}\binom{k}{i}\left(\frac{\eta_1}{\eta}(1-\beta_2)\right)^{i+(m+1-k)}(-1)^i \\
&= \binom{m+1}{m+1-k}\left(\frac{\eta_1}{\eta}(1-\beta_2)\right)^{m+1-k}\left(1-\frac{\eta_1}{\eta}(1-\beta_2)\right)^{k}.
\end{aligned}
$$

This recovers the asserted definition of the coefficients $a_k$. It remains to see that the asserted identity for the moments is true. To that end, plug (20) into (19) and the result into (18). Define $a_{m+2}$ as asserted. Take expectations in the identity obtained for $\int_0^{T_1}X_1(s)^m\,ds$. $\mathbb{E}^0(T_1)$ follows readily by taking expectations in (13). The result follows by (16). □

Proof of Theorem 2. Take $m = 2$ in the identity for $\mathbb{E}(X_1(0)^m)$ given by Lemma 4. The setting is that of symmetric TCP flows, so we can take $\eta_1 = \eta_2 = 1$ and $\beta_1 = \beta_2 = 1/2$. By direct computation,

$$
(a_1, a_2, a_3, a_4) = \left(\frac{1}{64}, \frac{9}{64}, \frac{37}{64}, \frac{56}{64}\right).
$$

We now proceed in four steps, each step showing the respective asserted result for the beatdown, rate-independent, rate-proportional, and largest-rate policies. We use the simplifying notation $x_k = \mathbb{E}^0(x_1(0)^k)/c^k$. Our two flows have identical AIMD parameters, hence $\mathbb{E}^0(x_i(0)) = c/2$ and $\mathbb{E}^0(z_i(0)) = 1/2$, $i = 1,2$.

Step 1 (beatdown). A stationary sample path is a uniform-phase shift of the periodic solution in Figure 4. Considering 1 as the tagged flow, under $\tau_0 = 0$, $x_1(0) = c$, $x_2(0) = 0$, and plugging these values into (3) and (4), we have $x_1(n) = \frac34 x_1(n-1)$, $n = 1,2,\dots$, so that $x_1(n) = \left(\frac34\right)^n c$ and $S_n = \frac{c}{2\eta}\left(\frac34\right)^n$ (see also Step 1 of the proof of Theorem 3 for further details). It follows that the surface under the curve of the squared send rate is

$$
\int_0^{\tau_1} X_1(s)^2\,ds
= \sum_{n=0}^{\infty}\frac13\,\frac{x_1(n+1)^3 - x_1(n)^3}{x_1(n+1) - x_1(n)}\,S_n
= \frac{c^3}{6\eta}\sum_{n=0}^{\infty}\left(\left(\frac34\right)^{2n+2} + \left(\frac34\right)^{2n+1} + \left(\frac34\right)^{2n}\right)\left(\frac34\right)^n
= \frac{2c^3}{3\eta}.
$$

Note that $\tau_1 = \sum_{n=0}^{\infty} S_n = \frac{2c}{\eta}$, which can also be obtained directly as the time for the other flow to increase its rate from $0$ to $c$. If flow 1 is not the one to experience the losses during the period, we similarly compute $\int_0^{\tau_1} X_1(s)^2\,ds = \frac{c^2}{3}\tau_1 = \frac{2c^3}{3\eta}$. Either of the last two cases occurs with Palm probability $1/2$, hence by the Palm inversion formula we get $\frac{\mathbb{E}(X_1(0)^2)}{c^2} = \frac13$, as asserted.
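For the record, the geometric series above sums as follows (elementary arithmetic, spelled out only for convenience):

$$
\sum_{n=0}^{\infty}\left(\Big(\tfrac34\Big)^{2}+\tfrac34+1\right)\left(\tfrac34\right)^{3n}
= \frac{37}{16}\cdot\frac{1}{1-\left(\tfrac34\right)^{3}} = \frac{37}{16}\cdot\frac{64}{37} = 4,
$$

so $\int_0^{\tau_1}X_1(s)^2\,ds = \frac{c^3}{6\eta}\cdot 4 = \frac{2c^3}{3\eta}$, and the Palm inversion gives $\frac{\mathbb{E}(X_1(0)^2)}{c^2} = \frac{\frac12\cdot\frac{2c^3}{3\eta} + \frac12\cdot\frac{2c^3}{3\eta}}{c^2\cdot\frac{2c}{\eta}} = \frac13$.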

Step 2 (rate-independent). A flow is selected independently of the rate process, hence $\mathbb{E}^0\big(z_1(0)x_1(0)^m\big) = \mathbb{E}^0(z_1(0))\,\mathbb{E}^0(x_1(0)^m)$ for any real $m$.


Under the prevailing assumptions, the equation in Lemma 4 reads as

$$
\frac{\mathbb{E}(X_1(0)^2)}{c^2} = \frac{2}{3}\big(a_4 x_3 - 2a_3 x_3 + a_2 x_2 + a_1 x_1 + a_0\big)
$$

with

$$
(x_1, x_2, x_3) = \left(\frac12, \frac27, \frac{5}{28}\right).
$$

Indeed, $x_1 = \mathbb{E}^0(x_1(0))/c = 1/2$. Also, from the square of equation (3), using (13) and the symmetry, we obtain

$$
\begin{aligned}
\mathbb{E}^0\big(x_1(0)^2\big)
&= \mathbb{E}^0\!\left(\left(\Big(1-\tfrac12 z_1(0)\Big)x_1(0) + \frac{z_1(0)x_1(0) + (1-z_1(0))(c - x_1(0))}{4}\right)^{\!2}\right) \\
&= \frac{1}{16}\left(9\,\mathbb{E}^0\big(x_1(0)^2\big) + \frac12 c^2 + 3c\,\mathbb{E}^0\big(x_1(0)\big)\right),
\end{aligned}
$$

yielding the asserted value for $x_2$. Also, using the symmetry,

$$
\mathbb{E}^0\big(x_1(0)^3\big) = \mathbb{E}^0\big((c - x_1(0))^3\big) = c^3 - 3c^2\,\mathbb{E}^0\big(x_1(0)\big) + 3c\,\mathbb{E}^0\big(x_1(0)^2\big) - \mathbb{E}^0\big(x_1(0)^3\big),
$$

which results in the asserted value for $x_3$, given $x_1$ and $x_2$.
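Solving the last two displays explicitly (routine arithmetic, included only as a check):

$$
16\,x_2 = 9\,x_2 + \tfrac12 + \tfrac32 \;\Rightarrow\; x_2 = \tfrac27,
\qquad
2\,x_3 = 1 - \tfrac32 + 3\cdot\tfrac27 \;\Rightarrow\; x_3 = \tfrac{5}{28}.
$$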

that results in the asserted value for x3, given x1 and x2.Step 3 (rate-proportional) It is readily checked that for an

arbitrary positive integer m, for the prevailing rate-proportionalloss policy, the expression in Lemma 4 reads as (a−1 := 0)

IE(X(0)m)cm =

am+2xm+2 − (am+1 +am)xm+1 +∑mk=0(ak −ak−1)xk

(m+1)η1IE(T1)

since IE0(z1(0)x1(0)m) = IE0(x1(0)m+1)c . From [7], we have

(x1,x2,x3,x4) =(

12,

726

,2

13,

6797358

).
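As a quick check of the second of these values (our own arithmetic, reading the rate-proportional identity $\mathbb{E}^0(z_1(0)x_1(0)^m) = \mathbb{E}^0(x_1(0)^{m+1})/c$ as $z_1(0)$ equal to $1$ with conditional probability $x_1(0)/c$): squaring the per-flow update displayed in Step 2 and taking Palm expectations in stationarity gives

$$
\mathbb{E}^0\big(x_1(0)^2\big) = \frac{1}{16}\,\mathbb{E}^0\!\left(\frac{x_1(0)}{c}\,\big(3x_1(0)\big)^2 + \Big(1-\frac{x_1(0)}{c}\Big)\big(3x_1(0)+c\big)^2\right)
= \frac{1}{16}\,\mathbb{E}^0\big(3x_1(0)^2 + 5c\,x_1(0) + c^2\big),
$$

so $16x_2 = 3x_2 + \tfrac52 + 1$ and $x_2 = \tfrac{7}{26}$, in agreement with the value quoted from [7].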

Step 4 (largest-rate). From Proposition 1, a stationary sample path is a uniform-phase shift of a 2-periodic function with values $\frac27 c$ and $\frac37 c$. It follows that $\mathbb{E}^0(T_1) = c/7$ and

$$
\int_0^{T_2} X_1(s)^2\,ds = \int_0^{2c/7}\left(x + \frac{2c}{7}\right)^2 dx = \frac{8}{147}\,c^3.
$$

Dividing this value by $2\,\mathbb{E}^0(T_1)$, we obtain $\mathbb{E}\big(X_1(0)^2\big) = \frac{4}{21}\,c^2$.
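The last two numbers follow by elementary integration (shown only as a check):

$$
\int_0^{2c/7}\left(x+\frac{2c}{7}\right)^2 dx = \frac{1}{3}\left[\left(\frac{4c}{7}\right)^3 - \left(\frac{2c}{7}\right)^3\right] = \frac{56}{1029}\,c^3 = \frac{8}{147}\,c^3,
\qquad
\frac{8c^3/147}{2c/7} = \frac{4}{21}\,c^2.
$$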

C. Proof of Lemma 1

Consider (7) with $a_i = a_1$ and $b_i = b_1$. We have

$$
x_1(n+1) = a_1 x_1(n) \vee \Big(\bigvee_{j\neq M_n}\{x_j(n)\} + b_1 x_1(n)\Big)
= a_1 x_1(n) \vee \big(x_2(n) + b_1 x_1(n)\big).
$$

This shows $g_1(\cdot)$. By the same argument, but taking the minimum over the right-hand sides in (7), $g_N(\cdot)$ follows. Now, the expression for $i = 2,3,\dots,N$ amounts merely to inserting $a_1 x_1(n)$ into the ordered sequence $x_j(n) + b_1 x_1(n)$, $j = 2,3,\dots,N$, so that the resulting sequence is sorted in non-increasing order.

D. Proof of Proposition 1

The map $g : S^{N-1} \to S^{N-1}$ is continuous and $S^{N-1}$ is a convex compact subset of $\{x \in \mathbb{R}^N : \sum_{i=1}^N x_i^2 \le 1\}$, so that existence follows from Brouwer's fixed point theorem. We next show that $x^0$ given by (9) is a fixed point of (8). Let $x^0 \in S^{N-1}$ be such that $x^0 = g(x^0)$ and $a_1 x_1^0 < x_{i+1}^0 + b_1 x_1^0$ for all $i = 1,2,\dots,N-1$. It follows that

$$
x_i^0 = x_{i+1}^0 + b_1 x_1^0, \quad i = 1,2,\dots,N-1, \qquad x_N^0 = a_1 x_1^0.
$$

The last identities are equivalent to saying

$$
x_i^0 = \big(1 - (i-1)b_1\big)x_1^0, \quad i = 1,2,\dots,N-1, \qquad x_N^0 = a_1 x_1^0.
$$

The result follows from $\sum_{i=1}^N x_i^0 = 1$ and the definitions of $a_1$ and $b_1$.

We next show that $x^0$ as defined above is the unique fixed point in the interior of $S^{N-1}$. Recall that $x^0$ is such that $a_1 x_1^0 < x_j^0 + b_1 x_1^0$ for all $j = 1,2,\dots,N$. To obtain a contradiction, suppose $y^0$ is a fixed point of (8) in the interior of $S^{N-1}$ such that, for some $i \in \{1,2,\dots,N-1\}$, $a_1 y_1^0 \ge y_i^0 + b_1 y_1^0$. From the definition of $g(\cdot)$, it follows that $y_i^0 \ge y_i^0 + b_1 y_1^0$, thus $y_1^0 \le 0$, a contradiction.

This completes the proof.

Remark 3: The Jacobian $J(x^0)$ of the map $g(\cdot)$ at the point $x^0$ is given by $(J(x^0))_{i,1} = b_1$, $i = 1,2,\dots,N-1$, $(J(x^0))_{N,1} = a_1$, $(J(x^0))_{i,i+1} = 1$, $i = 1,2,\dots,N-1$, with all other elements equal to $0$. Note that the Jacobian $J(x^0)$ is a companion matrix, so that its eigenvalues are the solutions of the characteristic polynomial $\lambda^{N-1} - b_1\lambda^{N-2} - \cdots - b_1\lambda - a_1 = 0$, which can be rewritten as

$$
\lambda^{N-1} = \eta_1(1-\beta_1)\big(\lambda^{N-2} + \lambda^{N-3} + \cdots + \lambda + 1\big) + \beta_1.
$$

It is readily checked that $\lambda = 1$ is an eigenvalue.

E. Proof of Theorem 3

The proof proceeds in two steps.

Step 1. We first show that

$$
\frac{\mathbb{E}(X(0))}{c} = 1 - \frac{1}{\displaystyle 1 + \frac{1+\beta_1}{1-\beta_1}\,\frac{N^2}{2}\,\frac{\mathbb{E}^0\!\left(\sum_{j=1}^N z_j(0)x_j(0)^2\right)}{\mathbb{E}^0\!\left(\sum_{j=1}^N z_j(0)x_j(0)\right)c}}.
$$

Suppose that at a selected congestion time $\tau_n$ a flow $i$ is tagged. Let $x_0 := x_i(n)$. The spare capacity just after $\tau_n$ is $(1-\beta_i)x_0$. The flow $i$ undergoes a multiplicative decrease at each congestion time in $[\tau_n, \tau_{n+1})$. With no loss of generality, let $\tau_n = 0$. Let $s_n$ be the time between the $n$th and $(n+1)$th congestion times as seen from $0$, and let $x_n$ be the rate of the flow $i$ at the $n$th congestion time. We have

$$
x_{n+1} = \beta_i x_n + \eta_i s_n. \qquad (21)
$$

The dynamics entail that

$$
\beta_i x_n + \eta_i s_n = x_n - \eta_{-i}\,s_n,
$$

where $\eta_{-i} := \sum_{j=1}^N \eta_j \mathbf{1}\{j \neq i\}$.


Hence,

$$
s_n = \frac{1-\beta_i}{\eta}\,x_n.
$$

Plugging the last identity into (21), we have $x_{n+1} = \rho_i x_n$, where we denote $\rho_i = \beta_i + \frac{\eta_i}{\eta}(1-\beta_i)$. It follows that

$$
x_n = x_0\,\rho_i^n \qquad \text{and} \qquad s_n = \frac{1-\beta_i}{\eta}\,x_0\,\rho_i^n.
$$

The amount of units sent by the flow $i$ in $[0,\tau_1)$ is equal to

$$
\int_0^{\tau_1} X_i(u)\,du = \sum_{n=0}^{\infty}\frac12\big(x_{n+1} + \beta_i x_n\big)s_n
= \frac{1}{2\eta}\,\frac{(1+\beta_i)\frac{\eta}{\eta_{-i}} - (1-\beta_i)}{1+\rho_i}\,x_0^2.
$$
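The closed form follows from the geometric series (an intermediate step we spell out here for convenience): with $x_n = x_0\rho_i^n$ and $s_n = \frac{1-\beta_i}{\eta}x_0\rho_i^n$,

$$
\sum_{n=0}^{\infty}\frac12\big(x_{n+1}+\beta_i x_n\big)s_n
= \frac{(1-\beta_i)(\rho_i+\beta_i)}{2\eta}\,\frac{x_0^2}{1-\rho_i^2}
= \frac{1}{2\eta}\,\frac{(1+\beta_i)\frac{\eta}{\eta_{-i}} - (1-\beta_i)}{1+\rho_i}\,x_0^2,
$$

using $1-\rho_i = (1-\beta_i)\frac{\eta_{-i}}{\eta}$ and $\rho_i + \beta_i = (1+\beta_i) - (1-\beta_i)\frac{\eta_{-i}}{\eta}$.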

Subsequently, the amount of units sent on $[0,\tau_1)$ by the flows other than $i$ is

$$
\int_0^{\tau_1}\sum_{j\neq i} X_j(u)\,du = (c - x_0)\tau_1 + \frac12 x_0 \tau_1
= \frac{1}{\eta_{-i}}\left(c\,x_0 - \frac12 x_0^2\right).
$$

Putting the last two displays into the Palm inversion formula and after some straightforward manipulations, we retrieve the asserted identity.

Step 2. By applying the same trick as in the proof of Theorem 1 to the identities (10)–(11), we obtain

$$
\frac{\mathbb{E}^0\!\left(\sum_{j=1}^N z_j(0)x_j(0)^2\right)}{\mathbb{E}^0\!\left(\sum_{j=1}^N z_j(0)x_j(0)\right)c} = \frac{2}{N}.
$$

The proof follows.
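Combining the two steps is immediate (spelled out only for completeness): substituting the ratio $2/N$ from Step 2 into the Step 1 identity gives

$$
\frac{\mathbb{E}(X(0))}{c} = 1 - \frac{1}{1 + \frac{1+\beta_1}{1-\beta_1}\,\frac{N^2}{2}\cdot\frac{2}{N}}
= 1 - \frac{1}{1 + \frac{1+\beta_1}{1-\beta_1}\,N}.
$$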