internet congestion control - authors.library.caltech.edu · various flow control mechanisms, ......

Internet CongestionControl

By Steven H. Low, Fernando Paganini, and John C. Doyle

This article reviews the current transmissioncontrol protocol (TCP) congestion controlprotocols and overviews recent advancesthat have brought analytical tools to thisproblem. We describe an optimization-basedframework that provides an interpretation of

various flow control mechanisms, in particular, the utilitybeing optimized by the protocol’s equilibrium structure. Wealso look at the dynamics of TCP and employ linear modelsto exhibit stability limitations in the predominant TCP ver-sions, despite certain built-in compensations for delay.Finally, we present a new protocol that overcomes these lim-itations and provides stability in a way that is scalable to ar-bitrary networks, link capacities, and delays.

IntroductionCongestion control mechanisms in today’s Internet alreadyrepresent one of the largest deployed artificial feedback sys-

tems; as the Internet continues to expand in size, diversity,and reach, playing an ever-increasing role in the integrationof other networks (transportation, finance, etc.), having asolid understanding of how this fundamental resource iscontrolled becomes ever more crucial. Given the scale andcomplexity of the network, however, and the heuristic, intri-cate nature of many deployed control mechanisms (whichwe summarize in the next section), until recently this prob-lem appeared to be well beyond the reach of analytical mod-eling and feedback control theory. Theoretical research onthis topic (e.g., [1]-[5]) has dealt mostly with simple scenar-ios (e.g., single-bottleneck, per-flow queueing); see also re-cent surveys in [6] and [7]. Meanwhile, the Internetcommunity has turned to small-scale simulations to vali-date designs. All of these leave huge gaps in our understand-ing of real network behavior.

In the last few years, large strides have been taken inbringing analytical models into Internet congestion control

28 IEEE Control Systems Magazine February 20020272-1708/02/$17.00©2002IEEE

Low ([email protected]) is with the Departments of Computer Science and Electrical Engineering, California Institute of Technology, Pasa-dena, CA 91125, U.S.A. Paganini is with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095, U.S.A.Doyle is with the Departments of Electrical Engineering and Control and Dynamical Systems, California Institute of Technology, Pasadena,CA 91125, U.S.A.

©D

IGIT

AL

VIS

ION

LTD

.

(see e.g., [8]-[22] and the references therein). Key to theseadvances has been the explicit modeling of the congestionmeasure that communicates back to data sources the infor-mation on congestion in network resources being used;more precisely, it is assumed that each network link mea-sures its congestion by a scalar variable (termed price) andthat sources have access to the aggregate price of links intheir path. These assumptions are implicitly present inmany variants of today’s TCP protocols; this framework ex-hibits the price signal being used in these protocols (e.g.,loss probability, queueing delay). Also, it is the natural set-ting for exploring alternative protocols based on more ex-plicit price signaling (e.g., bit marking).

Two types of studies are of fundamental interest. On theone hand, it is important to characterize the equilibriumconditions that can be obtained from a given congestioncontrol protocol from the point of view of fairness, effi-ciency in use of resources, dependence on network parame-ters, etc. In this regard, the above-mentioned prices can beinterpreted in economic terms (e.g., [9]) and the congestioncontrol system as seeking the global optimum of a certainaggregate utility function, subject to network capacity con-straints. By describing the utility implicit in existing TCPprotocols [11], [12], equilibrium properties are inferred,some of which corroborate empirically observed features.This methodology will be reviewed in the section “Equilib-rium Structure and Utility Optimization.” A second line of in-quiry concerns the dynamics of congestion controlprotocols, directly in the domain of control theory. In partic-ular, we are interested in the stability of the postulated equi-libria, especially in the presence of feedback delay, and inperformance metrics such as speed of convergence, capac-ity tracking, etc. In fact, by incorporating explicit measuresof congestion, recent analysis [21], [22] has shown that thepredominant TCP implementation (called Reno) and itsvariants are prone to instabilities when combined with net-work delays and, more surprisingly, with increases in net-work capacity. We will show similar studies in the sectiontitled “Dynamics and Stability.”

This raises the question of whether more efficient andstable protocols could be developed with the help of this an-alytic framework. We note that the constraint of decentral-ization makes it impossible for a controller to besynthesized by, for example, optimal control; still, one cantry to postulate a plausible control law and support it withproofs of stability and performance. In this regard, globalstability results that apply to arbitrary networks have beengiven for new price-based flow controllers [8], [10], [14],[15] in the absence of delay; further, delay can be studied interms of conditions on control gains to retain stability (e.g.,[8], [10], [22], [23]). Now as noted in, for example, [24] and[25], window-based protocols contain an automatic com-pensation for delay (“self-clocking”); this has led to recentwork seeking protocols that would remain stable for suffi-ciently small delays [26]-[28]. In this vein, we describe in the

final section, “A Scalable Control,” some of our recent workin finding a protocol that can be implemented in a decentral-ized way by sources and routers and that provides linearstability for arbitrary delays, capacities, and routes.

Although TCP Reno has performed remarkably above ex-pectations and is widely deployed, we emphasize a key limi-tation of this congestion control mechanism: by usingpacket loss as a congestion measure, high utilization can beachieved only with full queues, i.e., when the network oper-ates at the boundary of congestion. This seems particularlyill-suited to handle the types of traffic that have been ob-served in recent studies. Indeed, Internet traffic, 90% ofwhich is TCP based (see measurements at www.caida.org),exhibits burstiness at many time scales, which is due to theheavy-tailed nature of file sizes [29]-[33]. In simple terms,this means that most TCP connections are “mice” (short,but requiring low latency), but a few long TCP connections(“elephants,” which can tolerate latency) generate most ofthe traffic. By controlling the network around a state withfull queues, the elephants subject the mice to unnecessaryloss and queueing delays. This problem can be avoided bydecoupling loss from price signaling. Another limitation ofusing loss to measure congestion is the degradation of per-formance in the cases where losses are often due to other ef-fects (e.g., wireless links). These considerations aremotivating a new look at congestion control protocols; ouraim in this article is to argue that a more sound analyticalperspective, now available, should be brought to bear onthis investigation.

Current TCP ProtocolsTCP uses “window” flow control, where a destination sendsacknowledgments for packets that are correctly received. Asource keeps a variable called window size that determinesthe maximum number of outstanding packets that havebeen transmitted but not yet acknowledged. When the win-dow size is exhausted, the source must wait for an acknowl-edgment before sending a new packet. Two features areimportant. The first is the “self-clocking” feature that auto-matically slows down the source when a network becomescongested and acknowledgments are delayed. The secondis that the window size controls the source rate: roughly onewindow of packets is sent every round-trip time. The firstfeature was the only congestion control mechanism in theInternet before Van Jacobson’s proposal in 1988 [24]. Jacob-son’s idea is to dynamically adapt window size to networkcongestion. In this section, we will review how TCP inferscongestion and adjusts window size.

TCP also provides other end-to-end services such as er-ror recovery and round-trip time estimation, but we willlimit our attention to the congestion control aspect.

TCP Tahoe and RenoThe predominant TCP implementations are called Tahoeand Reno. The basic idea of these protocols is for a source to

February 2002 IEEE Control Systems Magazine 29

gently probe the network for spare capacity by linearly in-creasing its rate and exponentially reducing its rate whencongestion is detected. Congestion is detected when thesource detects a packet loss.

A connection starts cautiously with a small window sizeof one packet (up to four packets have recently been pro-posed), and the source increments its window by one everytime it receives an acknowledgment. This doubles the win-dow every round-trip time and is called slow-start. Whenthe window reaches a threshold, the source enters the con-

gestion avoidance phase, where it increases its window bythe reciprocal of the current window size every time it re-ceives an acknowledgment. This increases the window byone in each round-trip time and is referred to as additive in-crease. The threshold that determines the transition fromslow-start to congestion avoidance is meant to indicate theavailable capacity in the network and is adjusted each timea loss is detected. On detecting a loss, the source sets theslow-start threshold to half of the current window size, re-transmits the lost packet, and re-enters slow-start by reset-ting its window to one.

This algorithm was proposed in [24] and implemented inthe Tahoe version of TCP. Two refinements, called fast re-covery, were subsequently implemented in TCP Reno to re-cover from loss more efficiently. Call the time from detectinga loss (through duplicate acknowledgments) to receivingthe acknowledgment for the retransmitted packet the fastretransmit/fast recover (fr/fr) phase. In TCP Tahoe, the win-dow size is frozen in the fr/fr phase. This means that a newpacket can be transmitted only a round-trip time later. More-over, the “pipe” from the source to the destination is clearedwhen the retransmitted packet reaches the receiver, andsome of the routers in the path become idle during this pe-riod, resulting in loss of efficiency. The first refinement al-lows a Reno source to temporarily increment its window byone on receiving each duplicate acknowledgment while it isin the fr/fr phase. The rationale is that each duplicate ac-knowledgment signals that a packet has left the network.When the window becomes larger than the number of out-standing packets, a new packet can be transmitted in thefr/fr phase while it is waiting for a (nonduplicate) acknowl-edgment for the retransmitted packet. The second refine-ment essentially sets the window size at the end of the fr/frphase to half of the window size when fr/fr starts and enterscongestion avoidance directly. Hence, slow-start is enteredonly rarely in TCP Reno when the connection first starts and

when a loss is detected by timeout rather than duplicate ac-knowledgments.

TCP VegasTCP Vegas [34] improves upon TCP Reno through threemain techniques. The first is a new retransmission mecha-nism where timeout is checked on receiving the first du-plicate acknowledgment, rather than waiting for the thirdduplicate acknowledgment (as Reno would), and resultsin a more timely detection of loss. The second technique

is a more prudent way to grow the windowsize during the initial use of slow-startwhen a connection starts up, and it resultsin fewer losses.

The third technique is a new congestionavoidance mechanism that corrects the os-cillatory behavior of Reno. The idea is tohave a source estimate the number of itsown packets buffered in the path and try to

keep this number betweenα (typically 1) andβ (typically 3)by adjusting its window size. The window size is increasedor decreased linearly in the next round-trip time accordingto whether the current estimate is less than α or greaterthan β. Otherwise the window size is unchanged. The ratio-nale behind this is to maintain a small number of packets inthe pipe to take advantage of extra capacity when it be-comes available. Another interpretation of the congestionavoidance algorithm of Vegas is given in [12], in which a Ve-gas source periodically measures the round-trip queueingdelay and sets its rate to be proportional to the ratio of itsround-trip propagation delay to queueing delay, the propor-tionality constant being between α and β. Hence, the morecongested its path, the higher the queueing delay and thelower the rate. The Vegas source obtains queueing delay bymonitoring its round-trip time (the time between sending apacket and receiving its acknowledgment) and subtractingfrom it the round-trip propagation delay.

FIFO, DropTail, and REDA Vegas source adjusts its rate based on observed queueingdelay; in other words, it uses queueing delay as a measure ofcongestion. This information is updated by the FIFO(first-in-first-out) buffer process and fed back implicitly tosources through round-trip time measurement. A Renosource uses loss as a measure of congestion. This informa-tion is typically generated and fed back to sources throughDropTail, a queueing discipline that drops an arrival to a fullbuffer. RED (random early detection) [35] is an alternativeway to generate the congestion measure (loss) to Renosources. Instead of dropping only at a full buffer, RED main-tains an exponentially weighted queue length and dropspackets with a probability that increases with the averagequeue length. When the average queue length is less than aminimum threshold, no packets are dropped. When it ex-ceeds a maximum threshold, all packets are dropped. When

30 IEEE Control Systems Magazine February 2002

Congestion control mechanisms inthe Internet represent one of thelargest deployed artificial feedbacksystems.

it is in between, a packet is dropped with a probability that isa piecewise linear and increasing function of the averagequeue length. This type of strategy is called active queuemanagement (AQM).

Analytical ModelsIn this section, we describe analytical models that were de-veloped in the last few years. A large body of literature existson congestion control, but we will focus narrowly on theserecent models.

A network is modeled as a set of L links with finite capaci-ties c c l Ll= ∈( , ). They are shared by a set of N sources in-dexed byi in set I . Each sourcei uses a set L Li ⊆ of links. Thesets Li define an L N× routing matrix

Rl L

lii=

∈

1

0

,

,

if

otherwise.

A first consideration is that we will use deterministic flowmodels to describe transmission rates, in contrast to muchof classical queueing theory, which relies on stochastic(e.g., Poisson) models for traffic. While randomness in net-work arrivals is a natural assumption when modeling entireconnections [36], it is less suitable at the packet level, wheretransmission times for congestion-controlled sources aredetermined predominantly by feedback, as described in theprevious section. Furthermore, the distributions to be usedin such random models have recently come into question[29]-[33], and in any event, there are few tractable results onqueueing theory in the presence of feedback (but see [37]).For these reasons, we will study feedback at a higher level ofaggregation than packets, modeling rates as flow quantities.Each source i has an associated transmission rate x ti( ); theset of transmission rates determines the aggregate flow y tl( )at each link by

( )y t R x tli

li i lif( ) = −∑ τ , (1)

in which the forward transmission delays τ lif from sources to

links are accounted for. We assume each link has a capacitycl in packets per second.

The next step is to model the feedback mechanism thatcommunicates to sources the congestion infor-mation about the network. The key idea in theline of work we are discussing is to associatewith each link l a congestion measure p tl( ),which is a positive real-valued quantity. Due tothe economic interpretations to be discussed inthe next section, we will call this variable a“price” associated with using link l. The funda-mental assumption we make is that sourceshave access to the aggregate price of all links intheir route,

( )q t R p til

li l lib( ) = −∑ τ . (2)

Here again we allow for backward delays τ lib in the feedback

path. As we will discuss later, this feedback model includes,to a good approximation, the mechanism present in existingprotocols, with a different interpretation for price in differ-ent protocols (e.g., loss probability in TCP Reno, queueingdelay in TCP Vegas).

The preceding equations can be represented in theLaplace domain in terms of the delayed forward and back-ward routing matrices:

[ ( )],

,

[ ( )]

R se l L

R se

f li

si

b li

lif

= ∈

=

−

−

τ if

otherwise0

τ lib s

il L,

,

if

otherwise.

∈0 (3)

Then we have, in vector form (T denotes transpose):

y s R s x sf( ) ( ) ( )= , (4)

q s R s p sbT( ) ( ) ( )= . (5)

To specify the congestion control system, it remains todefine i) how the sources adjust their rates based on theiraggregate prices (the TCP algorithm) and ii) how the linksadjust their prices based on their aggregate rates (the AQMalgorithm). At the source side, we can in general postulate adynamic model of the form

� ( , )

( , ),

z F z q

x G z qi i i i

i i i i

== (6)

where zi would be a local state variable. We will, however,mostly encounter two special cases: the static case, wherethere is no zi and we have x t G q ti i i( ) ( ( ))= , and thefirst-order case with z xi i= .

Similarly, at the link level one can write a dynamic law

� ( , )� ( , ).

v H y v

p K y vl l l l

l l l l

== (7)


Sources

0

0

qR (s)b

Tp

0

0

Links

x yR (s)f

……

Figure 1. General congestion control structure.

The key restriction in the above control laws is that theymust be decentralized, i.e., sources and links only have ac-cess to their local information. The overall structure of thecongestion control system is now depicted in Fig. 1, wherethe diagonal structure of the source and link matrices repre-sents the decentralization requirement.

We will discuss TCP models within this general frame-work. In the next subsection, we will focus on equilibriumproperties; dynamic issues are tackled in the following sub-section, “Dynamics and Stability.”

Equilibrium Structureand Utility OptimizationIn this section, we study the above feedback at equilibrium,i.e., assuming the rates and prices are at some fixed valuesx y p q∗ ∗ ∗ ∗, , , . We will see how an optimization formulationhelps understand the properties of such equilibria.

The equilibrium relationships y Rx∗ ∗= , q R pT∗ ∗= followimmediately from (4)-(5). Here R is the static routing matrix;since we are discussing equilibrium, we can set s = 0 in themodel (3) (equivalently, setting all delays to zero).

The first basic assumption we make is that the equilib-rium rates satisfy

x f qi i i∗ ∗= ( ),

where fi( )⋅ is a positive, strictly monotone decreasing func-tion. This function can be found by finding the equilibriumpoint in (6); in the static case, it is just given by the sourcestatic law. Monotonicity is a natural assumption for all pro-tocols: if qi represents congestion in the source’s path, theequilibrium source rate should be a monotonically decreas-ing function of it.

We now see that this assumption alone allows us to intro-duce an optimization interpretation for the equilibrium byintroducing a source utility function. Namely, consider theinverse f xi i

−1( ) of the above function and let U xi i( ) be its in-tegral; i.e., ′ = −U x f xi i i i( ) ( )1 . By assumption,U xi i( )has a posi-tive, decreasing derivative and is therefore itself monotoneincreasing and strictly concave. Now, by construction, theequilibrium rate will solve

max ( )x i i i i

i

U x x q− ∗. (8)

The above equation can be interpreted in economic terms: ifU xi i( ) is the utility the source attains for transmitting at ratexi, and qi

∗ is the price per unit flow it is hypotheticallycharged, the above represents a maximization of an individ-

ual source’s profit. We emphasize that this interpretationrequired minimal assumptions about the protocol; given amodel of the source (TCP) control, one can derive from itthe utility function associated with the protocol, as we willsee below.

The role of prices is to coordinate the actions of individ-ual sources so as to align individual optimality with socialoptimality, i.e., to ensure that the solutions of (8) also solvethe problem

max ( )x

ii iU x

≥ ∑0, (9)

subject to Rx c≤ ; (10)

in other words, maximize aggregate utilityacross all sources, subject to link capacityconstraints. This problem, formulated in[38], is a convex program for which a unique

optimal rate vector exists. The challenge is to solve it in adistributed manner over a large network.

A natural way to introduce prices in regard to the aboveoptimization is the duality approach introduced in [10] (seealso [39] for related ideas and [20] for multicast extensions).Here prices appear as Lagrange multipliers for the problem(9)-(10). Specifically, consider the Lagrangian

L x p U x p y c

U x q x p c

ii

i ll

l l

ii

i i i ll

l

( , ) ( ) ( )

( ) .

= − −

= − +

∑ ∑∑ ∑

The dual problem is

min ( )p i

ii l

llB q p c

≥ ∑ ∑+0

, (11)

where

B q U x x qi i x i i i ii

( ) max ( )= −≥ 0

. (12)

Convex duality implies that at the optimum p∗s (which neednot be unique), the corresponding x ∗ maximizing individualoptimality (12) is exactly the unique solution to the primalproblem (9)-(10). Note that (12) is identical to (8); therefore,provided the equilibrium prices p∗ can be made to align withthe Lagrange multipliers, the individual optima, computedin a decentralized fashion by sources, will align with theglobal optima of (9)-(10).

The simplest link algorithm that guarantees these equi-librium prices are indeed Lagrange multipliers, as shown in[10], is based on applying the gradient projection algorithmto the dual problem (11):

�( ( ) ) ( )

[ ( ) ] ( ) ,p

y t c p t

y t c p tll l l

l l l

=− >− =

+

γγ

if

if

0

0 (13)


Any pricing scheme that stabilizesqueues or queueing delays solvesthe dual problem.

where [ ] max{ , }z z+ = 0 . The fact that the gradient of theLagrangian only depends on aggregate rates yl is key to theabove decentralized implementation at the links. If theabove equations are at equilibrium, we have y cl l

∗ ≤ , withnonzero prices pl

∗ corresponding to the active constraints.It follows that equilibrium prices are the Lagrange multipli-ers. This property, however, is not exclusive to this algo-rithm; rather, any pricing scheme that stabilizes queues, orqueueing delays (e.g., RED or Vegas, see below), so that flowrates yl

∗ are matched to capacities cl , will allow for the sameinterpretation. This is because matchingrate drives the gradient of the dual problem(11)-(12) with respect to p to zero, solvingthe dual problem and implying that the re-sulting prices are Lagrange multipliers.

Another approach to price variables, pro-posed in [8] (and used also in [15]), is to treatthem as a penalty or barrier function for theconstraint (10). Here prices are assumed tobe a static, increasing function of yl ,p h yl l= ( ), which becomes large as yl approaches cl . It followsthat the resulting equilibrium maximizes the global utility

U x h y dyii

i l

y

l

l∑ ∫∑−( ) ( )0

,(14)

which can be seen as an approximation to the above prob-lem (9)-(10).

To summarize the discussion so far, under very generalassumptions, the equilibrium points of source protocolscan be interpreted in terms of sources maximizing individ-ual profit based on their own utility functions. Link algo-rithms generate prices to align, exactly or approximately,these “selfish” strategies with social welfare. Different pro-tocols correspond to different utility functionsUi and to dif-ferent dynamic laws (6)-(7) that attempt, in a decentralizedway, to reach the appropriate equilibrium. We now take acloser look at modeling Reno and Vegas in this context.

TCP Reno/REDWe focus only on the congestion avoidance phase of TCPReno, in which an elephant typically spends most of its time.We take source rates as the primal variable x and link lossprobabilities as prices p. In this section, we assume theround-trip time τ i of sourcei is constant and that rate xi is re-lated to window wi by

x tw t

ii

i

( )( )=τ

.(15)

The window of outstanding packets at time t actually re-flects the average rate in the interval [ , ]t ti− τ ; the above ap-proximation is valid since our models are not intended toprovide accurate description at finer time scales than theround-trip time.

We also make the key assumption that loss probabilitiesp tl( )are small so that the end-to-end probabilitiesq ti( )satisfy

q t p t p ti ll L

ll Li i

( ) ( ( )) ~ ( )= − − −∈ ∈∏ ∑1 1

for all i and for all t.We now model the additive-increase-multiplicative-de-

crease (AIMD) algorithm of TCP Reno in an average sense attime scales above the round-trip time. In particular, ourmodels do not attempt to predict a window jump of the sort

observed under MD; rather, they should track the mean evo-lution of the window as a function of ACKs and losses. We ini-tially ignore feedback delays, since we are interested inequilibrium points.

At time t, x ti( ) is the rate at which packets are sent andacknowledgments received. A fraction( ( ))1 −q ti of these ac-knowledgments are positive, each incrementing the windoww ti( ) by1 / ( )w ti ; hence the windoww ti( ) increases, on aver-age, at the rate of x t q t w ti i i( )( ( )) / ( )1 − . Similarly, negativeacknowledgments are returning at an average rate ofx t q ti i( ) ( ), each halving the window, and hence the windoww ti( ) decreases at a rate of x t q t w ti i i( ) ( ) ( ) / 2. Hence, sincex t w ti i i( ) ( ) /= τ , we have for Reno the average model

� ( )( ) ( )x

q tq t x ti

i

ii i= − −1 1

222

τ.

(16)

We now consider the equilibrium of (16):

qxi

i i

∗∗=

+2

2 2 2τ ( ).

(17)

From it, we can obtain the utility function of TCP Reno byidentifying the above with the Karush-Kuhn-Tucker condi-tion ′ =∗ ∗U x qi i i( ) . This gives the utility function

U xx

i ii

i i( ) tan=

−22

1

ττ

,(18)

which seems to appear first in [9] and [11]. Our descriptionhere is slightly different from that in [11] in that, here, lossprobability is taken as the dual variable regardless of thelink algorithms, which from this point of view affect the dy-namics but not the utility function.


The objective is to have localdynamic stability for arbitrary

network delays, link capacities, androuting topologies.

The relation (17) between equilibrium source rate andloss probability reduces to the well-known relation (see,e.g., [40] and [41])

xa

qii i

=τ

,

when the probability qi is small or, equivalently, when thewindow τ i ix is large compared with 2. (This corresponds toreplacing( ( ))1 −q ti in (16) by 1, as done in [22].) The value ofthe constant a around 1 has been found empirically to de-pend on implementation details such as TCP variant (e.g.,Reno versus NewReno versus SACK) and whether delayedacknowledgment is implemented. Equating ′U xi i( ) with qi,the utility function of TCP Reno becomes

U xaxi i

i i

( ) = −2

2τ.

This version is used in [15] and [42].We now turn to the link algorithm, i.e., how loss probabili-

ties p tl( ) are generated as a function of link rate yl . BothDropTail and RED produce losses as a function of the state ofthe queue, so the question involves modeling of the queuedynamics as a function of the input rate yl . In the currentcontext of flow models, it is natural to model queues as inte-grating the excess capacity; this option is discussed below,and in the case of RED, one can then easily relate queues toloss (or marking) probability. Unfortunately, a deterministicmodel of DropTail that includes the queue dynamics doesnot appear to be tractable.

At the other extreme, many references model queues asbeing in steady state and postulate a static law p h yl l l= ( ) forloss probability as a function of traffic rate, e.g., [8], [15]. Thequestion as to whether the steady-state queue assumptioncan be justified is still a subject of current research [43]; how-ever, dynamic simulation studies of Reno/RED, as in [22] orthose described below (see Fig. 2(b)), indicate that queue dy-namics are indeed significant at the time scale of interest.

For this reason we will consider the first option andmodel queues as integrators, focusing on the RED queuemanagement to obtain simple models of loss probabilities.Letb tl( )denote the instantaneous queue length at time t; itsdynamics is then modeled by

� ( ( ) ), ( )

[ ( ) ] , ( ) .b

y t c b t

y t c b tll l l

l l l

=− >− =

+

if

if

0

0 (19)

RED averages the instantaneous queue by an exponentiallyweighted average; denoting r tl( ) to be the averaged queuelength, we can model it as a low-pass filter

� ( ( ) ( ))r c r t b tl l l l l= − −α (20)

for some constant0 1< <α l . Given the average queue lengthr tl( ), the marking (or dropping) probability is given by astatic function

p t m r t

r t b

r t b b r t bl l l

l l

l l l l l l( ) ( ( )) :

, ( )

( ) , ( )= =

≤− < <

0

ρ ρ l

l l l l l l

l l

r t p b r t b

r t b

η ( ) ( ), ( )

, ( )

− − ≤ <≥

1 2 2

1 2 (21)

where b bl l, , and pl are RED parameters,

ρ ηll

l ll

l

l

pb b

pb

: :=−

= −and

1.

Now (19), (20), and (21) fall into the general form (7), wherethe internal state vl is composed of bl and rl .

Assume now that these equations are in equilibrium. It isnot difficult to see that in this case y cl l

∗ ≤ , and that the in-equality is only strict when pl

∗ = 0. These facts imply that theequilibrium pl

∗ constitute a set of Lagrange multipliers forthe dual problem (11)-(12). Thus, we conclude that if theReno/RED reaches equilibrium, the resulting source rates xi

∗

will solve the primal problem (9)-(10) with the source utilityfunctions given in (18). Moreover, the loss probabilities pl

∗

are Lagrange multipliers that solve the dual problem(11)-(12).

We remark again that the preceding analysis refers to theaveraged model of TCP, as described in (16); we are not at-tempting to impose equilibrium on the detailed evolution ofa window under AIMD, but rather on its mean evolution, aswould happen with the average of many identical sources(see simulations below). Still, even in this mean sense, wehave not given any indication yet that TCP reaches equilib-rium. Indeed, in the next section we will find that it oftendoes not, since the equilibrium is unstable and a limit cycleis observed; in particular, average windows and queues canoscillate dramatically. Nevertheless, the above equilibriumanalysis is useful in understanding the state that is aimed forby the current protocols, i.e., the resource allocation policyimplicitly present in the current Internet. Furthermore,there is evidence that some of the insights derived from theequilibrium models do reflect empirical properties of theInternet. This suggests that the models might have value indescribing the protocol’s long-term behavior, even in an os-cillatory regime. We now discuss some of these insights.

Delay and LossThe current protocol (Reno with DropTail) fills, rather thanempties, bottleneck queues when the number of elephantsbecomes large, leading to a high loss rate and queueing de-lay. What is more intriguing is that increasing the buffer sizedoes not reduce loss rate significantly, but only increasesqueueing delay. This delay and loss behavior is exactly op-posite the mice-elephant control strategy we aim for: tomaximally utilize the network in a way that leaves network


queues small so that delay-sensitive mice can fly throughthe network with little queueing delay.

According to the duality model, loss probability underReno is the Lagrange multiplier, and hence its equilibriumvalue is determined solely by the network topology and thenumber of sources, independent of link algorithms and buffersize. Increasing the buffer size but leaving everything elseunchanged does not change the equilibrium loss probabil-ity, and hence a larger backlog must be maintained to gener-ate the same loss probability. This means that withDropTail, the buffer at a bottleneck link is always close tofull, regardless of buffer size. With RED, since loss probabil-ity is increasing in average queue length, the queue lengthmust increase steadily as the number of sources grows.

FairnessIt is well known that TCP Reno discriminates against con-nections with large propagation delays. This is clear from(17), which implies that Reno equalizes windows forsources that experience the same loss probability, andhence their rates are inversely proportional to theirround-trip times.

The equilibrium characterization (17) also exposes the“beat down” effect, where sources that go through morecongested links, seeing larger qi, receive less bandwidth.This effect is hidden in single-link models and, in multilinkmodels, is often confused with delay-induced discrimina-tion of TCP, as expressed in (17). It has been observed insimulations [44] and has long been deemed unfair, but theduality model shows that it is an unavoidable, and even de-sirable, feature of end-to-end congestion control. For eachunit of increment in aggregate utility, a source with a longerpath consumes more resources and hence should be beatendown. If this is undesirable, it can be remedied by weightingthe utility function with delay.

TCP VegasThe dynamic model and utility function Ui of TCP Vegashave been derived and validated in [12]. We briefly summa-rize the results here.

The utility function of TCP Vegas is

U x d xi i i i i( ) log= α ,

where α i is a protocol parameter and di is the round-trippropagation delay of source i. In equilibrium, source i buff-ersα i id packets in the routers in its path. The utility functionimplies that Vegas achieves proportional fairness in equilib-rium.

The price variable in TCP Vegas is queueing delay, whichevolves according to

� ( ( ) )pc

y t cll

l l= −1,

(22)

with an additional nonnegativity constraint, exactly as in(13) with γ replaced by 1/cl . Therefore, again we can inter-pret equilibrium prices as Lagrangian multipliers.

To describe the rate adjustment (6), let

x t U q td

q ti i ii i

i

( ) ( ( ))( )

= ′ =− 1 α

be the target rate chosen based on the end-to-end queueingdelay q ti( ) and the marginal utility ′Ui. Then Vegas’s sourcealgorithm is

�, ( ) ( )

, ( ) ( )x

x t x t

x t x ti

ii i

ii i

=<

− >

1

1

2

2

τ

τ

if

if

moving the source rate x ti( ) toward the target rate x ti( )at apace of1 2/ τ i .

The models of TCP Reno and Vegas are summarized inTable 1.

Dynamics and StabilityAs discussed in the previous section, static or dynamic con-trol laws at sources and links attempt to drive the system toa desirable equilibrium point. So far, we have only used dy-namic models to derive the equilibrium points, and we notethat there can be different dynamic laws with the same equi-librium, only distinguished by their dynamic properties,which we now discuss. Mainly, we would like to determinewhether the equilibrium is (locally or globally) stable. We


Table 1. Models of TCP/AQM. (x t U q ti i i( ) ( ( ))= ′−1 )

TCP Model

Reno Source control � ( )( ) ( )x

q tq t x ti

i

ii i= − −1

212

2

τ

RED Link control � ( ( ) ) ( )

[ ( ) ] ( )�

by t c b t

y t c b tr

ll l l

l l l

l

=− >− =

=

+

if

if

0

0− −

=α l l l l

l l l

c r t b t

p m r

( ( ) ( ))

( )

UtilityU x

xi i

i

i i( ) tan=

−22

1

ττ

Vegas Source control

�, ( ) ( )

, ( ) ( )x

x t x t

x t x ti

ii i

ii i

=<

− >

1

1

2

2

τ

τ

if

if

FIFO Link control

�( ( ) ), ( )

[ ( ) ] , ( )p c

y t c p t

cy t c p t

ll

l l l

ll l l

=− >

− =+

10

1

if

if 0

Utility U x d xi i i i i( ) log= α

begin with a brief overview of some dynamic laws that havebeen proposed in the optimization framework and that al-low for analytical stability results. We will then move tostudy in detail the dynamics of TCP Reno/RED.

In general, one could have dynamics at both sources andlinks; however, most analytical results refer to systemswhere only one of the laws is dynamic and the other static.In this regard, [8] denotes by primal algorithms those wherethe dynamics are at the sources, and by dual algorithmswhen the dynamics are at the links. An example of dynamicsat the source is the first-order law (used in [8] for a particu-lar utility function)

� ( ( ) ))x U x q xi i i i i= ′ −κ .

Combined with a static link law p h yl l l= ( ), it is shown in [8]that the system, in the absence of delays, has a single global at-tractor, which optimizes the cost function in (14); in fact, thismodified utility serves as a Lyapunov function for the system.

An example of dynamics at the links was already given in(13); combined with the static source control

[ ]x t U q ti i i( ) ( ( ))= ′ − +1 , (23)

it is shown in [10] that global stability is obtained. Anotherlink algorithm, proposed in [45] and [46], is

�( ), ( )

[ ] , (p

y c b p t

y c b p tll l l l l l

l l l l l l

=− + >− + +

γ αγ α

if

if

0

) ,= 0

where bl is the queue length, as in (19). Global stability in theabsence of delay for this scheme, together with (23), has beenproved in [14] by a Lyapunov argument. This protocol can beimplemented in a similar fashion to RED, but simulations in[45] and [46] have shown a marked improvement over RED interms of achieving fast responses and low queues.

Recently, [16] has proposed a scheme with dynamics atboth links and sources, but working at different time scales;thus, stability analysis reduces to two problems, one withstatic links and one with static sources.

We emphasize that the Lyapunov-based stability proofsin [8] and [14], while global, do not consider network de-lays. Some local, linearized studies in [8] and [23] examinetolerance to delay and in general yield bounds on the vari-ous gain parameters of the algorithms (e.g., κi, γ l must be in-versely proportional to delay) to maintain local stability. Inturn, the global stability analysis in [10] is done in discretetime, possibly asynchronously, and does allow for delays,but again stability is guaranteed only if gain parameters aresufficiently small.

Note that delays are the only dynamics of the open-loopsystem described in Fig. 1; were it not for delay, the rate andprice adaptation could be performed arbitrarily fast. Thus,it is natural that gain parameters should be chosen in-versely proportional to delay. Since sources measure their

round-trip time, and a scaling by1/τ i is already implicit in awindow-based protocol due to (15) (the “self-clocking” fea-ture), this raises the intriguing possibility that compensa-tion for delay could be done automatically in a protocolsuch as Reno.

We now turn to a detailed study of Reno/RED that con-tains dynamics at both sources and links.

Dynamics of Reno/REDThe dynamic model of Reno/RED has so far been used onlyto understand the equilibrium properties; we now study thedynamic properties around an equilibrium by linearizingthe model developed earlier. Before we do that, we must re-fine the nonlinear model to include the effect of delays,which are essential to stability analysis.

For this purpose, we must account for forward and back-ward delays in the propagation of rates and prices, as wasdone in (1)-(2). For the window dynamics, a first approxima-tion would be

� ( )( ( ))( )

( ) ( )( )

w x t q tw t

x t q tw t

i i i ii

i i ii= − − − −τ τ1

12

,(24)

withq ti( )as in (2). Here we incorporate the fact that the rateof incoming ACKs is determined by the source rate τ i unitsof time ago. However, a subtle issue that arises in these mod-els is that the round-trip time is itself time varying, since itdepends on queueing delays. In particular, round-trip timecan be expressed as

τ i i lil

l

l

t d Rbc

( ) = + ∑ ,

where di is the round-trip propagation delay and bl is thebacklog at link l at the time the packet arrived at the link.These arrival times are difficult to capture exactly, sincethey depend themselves on queueing delays earlier in thepath; this would mean that the time argument in bl abovewould depend recursively on other queues, themselves atearlier times, and so on. We avoid this issue by assuming acommon time t for all the queues

τ i i lil

l

l

t d Rb t

c( )

( )= + ∑ .(25)

This simplification is acceptable provided we do not at-tempt to model time resolution smaller than round-triptimes. Still, difficulties remain. In particular, if one general-izes (15) to

x tw t

tii

i

( )( )( )

=τ

,(26)

the window equation (24) would contain x t t wi i i( ( )) [− =τ( ( ))] / [ ( ( ))]t t t ti i i− −τ τ τ , with a nested time argument in the


round-trip time, which is not easy to interpret. For this rea-son, we adopt the following convention: wheneverround-trip time, or forward and backward delay, appears inthe argument of a variable, we will replace it by its equilib-rium value τ i

∗, τli

f ∗, τli

b ∗. This accounts for equilibriumqueueing delays, but at that level does not include their vari-ation in time. However, when round-trip time appears in thedependent variable, as in (26), we will consider it time vary-ing and use for it the model in (25). This avoids recursivetime arguments but is admittedly an approximation, doneexclusively for model tractability. (Similar, though slightlymore restrictive, approximations were made in [22]. As thispaper first noted, and we corroborate below, retaining delayvariations in the round-trip time is essential for the predic-tive power of this model.) Confidence in these approxima-tions can only be obtained through comparisons withpacket-level simulations.

With these approximations, we obtain the following non-linear, time-delayed model for Reno, expanding on (24):

( )� ( )( ) ( )

w R p tw t

t w ti lil

l lib i i

i i i

= − −

−−∑ ∗

∗

∗11τ τ

τ τ

( )− − −−∑ ∗∗

∗

12

R p tw t w t

tlil

l lib i i i

i i

τ ττ τ

( ) ( )( )

.

Assuming the routing matrix R has full rank, there is aunique equilibrium ( , )w p∗ ∗ . Linearizing around it, we have(variables now denote perturbations)

( )� ( )*

wq

R p tq w

w tii i l

li l lib i i

ii= − − −∗ ∗

∗∗ ∗

∑1τ

ττ

.

Now consider the link dynamics of RED, as described inTable 1. For the purposes of linearization, we note thatnonbottleneck links (with empty equilibrium queues) canbe ignored. For bottleneck links, we make the assumptionthat rate increases of a source affect all bottlenecks in itspath (we do not model the throttling effect that upstreamlinks may have on downstream ones) and write

( )( )

( )

�b Rw t

tc

Rw t

d R

l lii

i lif

i lif l

lii

i lif

i k

=−

−−

=−

+

∑

∑

∗

∗

∗

τ

τ τ

τ

( )ik k lif

klb t c

c∑ −

−∗τ /

.

Let τ i i kik k kd R b c∗ ∗= + ∑ / be the equilibrium round-trip time(including queueing delay). Linearizing we have (variablesnow denote perturbations):

( )�( )

b Rw t

R Rw

cb tl li

i

i lif

i ili

kki

i

i kk li=

−− −∑ ∑ ∑

∗

∗

∗

∗

ττ τ

τ2 ( )f ∗ .

The second term above would be ignored if we did not in-clude queueing delay in the round-trip time. The double

summation sums over all links k that share a source i withlink l. It says that the link dynamics in the network are cou-pled through shared sources. The term ( / ) ( )w c b ti i k k li

f∗ ∗ ∗−τ τis roughly the backlog at link k due to packets of source i, un-der FIFO queueing. Hence the backlog b tl( ) at link l de-creases at a rate that is proportional to the backlog of thisshared sourcei at another link k. This is because the backlogin the path of source i reduces the rate at which source ipackets arrive at link l and hence decreases b tl( ).

Putting everything together, Reno/RED is described, inthe Laplace domain, by

w s sI D D R s p s

p s sI D D b s

b

bT( ) ( ) ( ) ( ),

( ) ( ) ( ),

= − += +

−

−1

12

31

4

( )( ) ( ) ( ) ( )s sI R s D R D R s D w sfT

f= +−

5 6

1

7 (27)

where the diagonal matrices are

Dq w

D c

Dw

i i

i

l l

i

i

1

3

5

=

=

=

∗ ∗

∗

∗

diag

diag

diag

τα

τ

,

( ),

( ∗

∗ ∗

=

=

=)

,

( ),

2

2

4

6

1D

qD c

D

i i

l l l

diag

diag

d

τα ρ

iag diag1 1

7cD

l i

=

∗, ,

τ

and R sf ( ) and R sb( ) are defined in (3).To gain some insight into the system’s behavior, let us

specialize to the case of a single link with N identical sources(see generalization to heterogeneous sources in [47]). Thetransfer functions are (dropping all subscripts and eliminat-ing b s( ))

( )w se

p s p wp s

b s

( ) ( )*

= −+

−

∗ ∗ ∗ ∗

τ

τ,

(28)

p sc

s cNe

s ew s

f

f

s

s( ) ( )

*

**=

+ +

−

−

α ρα τ

τ

τ,

(29)

where w c N∗ ∗= τ / is the equilibrium window size and p∗ isthe equilibrium loss probability. When forward delay τ f

∗ = 0,the model (28)-(29) reduces to that in [22]. It is easy to showthat τ τf ∗ ∗< implies that (29) is open-loop stable. Hence theclosed-loop system is stable if and only if the loop function

L sc

s c s e

Np s p w

ef s

s( )( )

=+ + +∗ − ∗ ∗ ∗ ∗

−∗

∗α ρα τ ττ

τ1

does not encircle ( , )−1 0 as s traverses the closed D contourin the complex plane. Substitution with the equilibrium val-ues of p∗ and w ∗ yields (using p w∗ ∗−~ /2 2)

L sc

s c s e

cN c s N

ef s

s( )( )

=+ + +∗ −

∗

∗−

∗

∗α ρα τ

τττ

τ12 2

3 3

2.

(30)


The first factor above is due to queue averaging in RED.The second factor describes the relation between windowsand buffer size; the term e

f s− ∗τ arises due to the effect ofqueueing delays in (27). If τ f ∗ = 0, it reduces to a low-pass fil-ter of time constant1/τ∗. The third term is due to Reno; it hasa dc gain of ( / ) ( / )c N w N3 3 2 34 4τ∗ ∗= and a pole at( / ) ( / )2 22N c wτ τ∗ ∗ ∗= . Under typical conditions, the equilib-rium window satisfiesw ∗ >> 2so the system has high gain atlow frequencies and a pole that is slower than that of thesecond term. Finally, we have the round-trip feedback delay.

The above loop gain consists of a stable function times apure delay. The latter will provide significant phase, startingat frequencies of the order of1/τ. Therefore, a classical Bodeplot analysis says that closed-loop stability will require thatthe loop gain at those frequencies be below unity. (Equiva-lently, one might say that it is impossible for a stable loop totrack variations that are faster than the pure delay of theloop.) This suggests that it is difficult for RED to stabilizeReno, as extensive simulation experience has shown. In par-

ticular, assuming τ or c become large, at a fixed frequency ωthe term will be approximately

cNj

2

2τω

∗

,

which grows in magnitude, and has 90° of phase, and is thusdestabilizing. Similar conclusions happen when N is small.Note that instability for high τ is not surprising; however, hereit shows that Reno is not successful in scaling down gains byτ, as was suggested as a stabilizing method in the beginning ofthe section, based on the self-clocking property. Perhapsmore striking is the destabilizing effect of high capacity; asrouters become faster Reno is bound to go into an unstableregime. Intuitively, this is because higher capacity leads tohigher equilibrium rate. This produces higher gain in themultiplicative term of AIMD since sources update more fre-quently, on each acknowledgment, with a larger amplitude.

Simulation StudiesThe equilibrium model has been validated in [48] for Renoand in [12] for Vegas. In this section, we present simulationresults to validate our linear dynamic model when the sys-tem is stable or barely unstable. They also illustrate numeri-cally the stability region of Reno/RED.

We consider a single link of capacity c pkts/ms shared byN sources with identical round-trip propagation delay d ms.For N = 20 30 60, , ,… sources, capacity c = 8 9 15, , ,… pkts/ms,and propagation delay d = 50 55 100, , ,… ms, we examine theNyquist plot of the loop gain of the feedback system (L j( )ωin (30)). For each ( , )N c pair, we determine the delayd N cm( , ), at 5-ms increments, at which the smallest interceptof the Nyquist plot with the real axis is closest to −1. This isthe delay at which the system( , )N c transits from stability toinstability according to the linear model. For this delay, wecompute the critical frequency f N cm( , ) at which the phaseof L j( )ω is −π. Note that the computation of L j( )ω requiresequilibrium round-trip time τ, the sum of propagation delayd N cm( , ) and equilibrium queueing delay. The queueing de-lay is calculated from the equilibrium model. Hence, foreach ( , )N c pair that becomes barely unstable at a delay be-tween 50 ms and 100 ms, we obtain the critical (propaga-tion) delay d N cm( , ) and the critical frequency f N cm( , ).

We repeat these experiments in ns-2, using persistent FTPsources and RED with ECN marking. The RED parametersare (0.1, 40 pkts, 540 pkts,10 4− ). For each ( , )N c pair, we ex-amine the queue and window trajectories to determine thecritical delay d N cns( , ) when the system transits from stabil-ity to instability. We measure the critical frequency f N cns( , ),the fundamental frequency of queue oscillation, from theFFT of the queue trajectory. Thus, corresponding to the lin-ear model, we obtain the critical delay d N cns( , ) and fre-quency f N cns( , ) from simulations.

We compare model prediction with simulation. Fig. 2(a)plots the critical delay d N cm( , ) computed from the linear


100

95

90

85

80

75

70

65

60

55

50

Del

ay (

Mod

el)

50 55 60 65 70 75 80 85 90 95 100Delay (NS)

(a)

30 Data Points

30 Data Points

5

4.5

4

3.5

3

2.5

2

1.5

1

0.5

0

Freq

uenc

y (M

odel

)

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5Frequency (NS)

(b)

Static-Link Model

Dynamic-Link Model

Figure 2. Validation: comparison of critical (round-trippropagation) delay and critical frequency computed from linearmodel and measured from simulation (NS). (a) Critical delay, ms. (b)Critical frequency, Hz.

model versus the critical delay d N cns( , ) from packet-levelsimulations. Each data point (there are 30 of them) corre-sponds to a particular( , )N c pair. The dotted line is where allpoints should lie if the linear model agrees perfectly with thesimulation. Fig. 2(b) gives the corresponding plot for criticalfrequencies f N cm( , ) versus f N cns( , ). The agreement be-tween model and simulation seems quite reasonable (recallthat delay values have a resolution of 5 ms). Also shown inFig. 2(b) are critical frequencies predicted from the linearmodel if the link dynamics are ignored and congestion prob-ability is proportional to flow rate, p t y t( ) ( )= ρ , using thesame Nyquist plot method described above. It shows thatqueue dynamics is significant at the time scale of interest.

Fig. 3 illustrates the stability region implied by the linearmodel. For each N , it plots the critical delay d N cm( , ) versuscapacity c. The curve separates stable (below) from unsta-ble regions (above). The negative slope shows thatTCP/RED becomes unstable when delay or capacity is large.As N increases, the stability region expands, i.e., a small loadinduces instability. A larger delay or capacity, or a smallerload, leads to a larger equilibrium window; this confirms thefolklore that TCP behaves poorly at large window size.

A Scalable ControlThe above discussion suggests that the current protocolmay be ill-suited for future networks where both delay andcapacity can be large. This has motivated the search for pro-tocols that scale properly so as to maintain stability in thepresence of these variations. With regard to delay stability,scaling of source gains by 1/τ i was first suggested in [25] andsubsequently proved in [26]-[28] to provide stability for“primal” laws involving first-order source control and staticlink marking, provided delay or control gain is sufficientlysmall. The concept of scaling down gains by delay was al-ready mentioned in the context of optimization-based sta-bility proofs; when the scaling is done by a common globalconstant, this can be very conservative. In contrast, individ-ualized scaling as suggested here has the appealing featurethat sources with low round-trip times can respond quicklyand take advantage of available bandwidth, and it is onlythose sources (with long delays) whose fast response com-promises stability that must slow down.

In this section, we describe a protocol, developed in[48], that can be implemented in a decentralized way bysources and links and that satisfies some basic objectives:high network utilization in equilibrium and local stabilityfor arbitrary delays, capacities, and routing. These require-ments impose certain constraints on the linearized dynam-ics: integration at links and conditions on the gain atsources and links. We will present a global implementationby nonlinear algorithms at sources and links that are con-sistent with the linearization requirements. We will alsodiscuss signaling implications of this protocol and con-clude with a packet-level simulation that validates the the-oretical results.

Objectives and Linear DesignWe now lay out a series of objectives for the feedback con-trol laws in purely local (linearized) terms. These will leadus to a local control law, which is proved in [48] to achievethese objectives.

A first objective is that the target capacity cl is matchedat equilibrium; as in (13), this can be achieved when pricesintegrate the excess capacity (variables denote perturba-tions in this subsection):

�p yl l l= γ .

Is this the only choice? If exact tracking of capacity is de-sired, there must be an integrator in the loop; and as ex-plained in [48], to have stability, we must perform theintegration at the lower-dimensional end of the problem,i.e., at the links. So the above is the simplest law consistentwith this objective; the constant γ l will be chosen later.

The next, main objective is to have local dynamic sta-bility for arbitrary network delays, link capacities, androuting topologies.

We now argue more carefully for the desired scaling as afunction of delay. As remarked earlier, delays are the onlydynamics of the open loop; therefore, they are the onlyquantity that sets a time scale to our closed-loop behavior.Thus, we aim here for a system where a scaling of all delaysby a common factor would result in identical time responsesexcept for time scale. Consider first a single-link, sin-gle-source problem. The network delay and the above linkintegrator will yield a term

es

s− τ

in the loop-transfer function. Thinking, for example, interms of its Nyquist plot, instability will always occur at highvalues of τ unless the gain is made a function of τ. Indeed, in-


100

95

90

85

80

75

70

65

60

55

50

Del

ay[m

s]

8 9 10 11 12 13 14 15Capacity [pkts/ms]

N = 60N = 50

N = 40

N = 30

N = 20

Figure 3. Stability region: for each N, the region above the curveis unstable and the one below is stable.

troducing a gain K /τ in the loop (specifically at the source)leads to a loop gain

Ke

s

s− τ

τ,

which is scale invariant: namely, its frequency response isa function of τω, so Nyquist plots for all values of τ wouldfall on a single curve, and by choosing K appropriatelyone can ensure stability for all τ. Further, the time re-sponses of such a loop will give the desired invariance upto scale. Inspired by this, for the multilink, multisourcecase, we are led to include a factor1/τ i in the control gainat each source i.

Now we turn to invariance to capacity and routing. It iscrucial that the overall loop gain introduced by the routingmatrices Rf , Rb be kept under control; intuitively, as moresources or links participate in the feedback, the gain mustbe appropriately scaled down. The difficulty is implement-ing this in a decentralized fashion, without access to theglobal routing information. To scale down the gain due to Rf

at links, we exploit the fact that, at equilibrium, the aggre-gate source rates add up to capacity, so R x cf ( )0 ∗ = . Sincesources know their rates and links their capacities, one isled to introduce a gain 1/cl at each link and a gain xi

∗ at eachsource. To scale down the gain due to Rb at sources, we in-troduce a gain1/M i at each source, M i being the number ofbottleneck links in the source’s path.

Summarizing the above requirements in their simplestpossible form, we propose the following linear control laws:

• For the source, a static gain (from qi to xi) of

− = −∗

κ ατi

i i

i i

xM

: ,(31)

where α i is a gain parameter to be chosen, xi∗ is the

equilibrium rate, τ i is round-trip time, and M i is abound on the number of bottleneck links in source i’spath. The sign is used to provide negative feedbackfrom prices to rates.

• For the link, an integrator with gain normalized by(virtual) capacity

pc s

yll

l= 1.

(32)

Note that the normalization gives this price units oftime. In practice, cl is chosen to be strictly less thanthe actual link capacity in order to maintain zerobuffer in equilibrium. If cl were the actual link capacity,pl would represent the queueing delay at the link andis the price signal used in TCP Vegas (see (22)); sohere we can think of pl as a “virtual” queueing delay(see [16] for related ideas).

It is proved in [48] that, provided the routing matrix Rhasfull row rank, and the gains α i < 1, the feedback system withsource algorithm (31) and link algorithm (32) is linearly sta-ble for arbitrary delays, link capacities, and routing.

Global, Nonlinear ImplementationWe now describe a global implementation by links andsources that have suitable equilibrium points, around whichthe linearization satisfies the requirements laid out above.

The price dynamics can be implemented by the link al-gorithm

�( ( ) ), ( ) ;

[ ( ) ] , ( )p c

y t c p t

cy t c p t

ll

l l l

ll l l

=− >

− +

10

1

if

if =

0.

That is, prices integrate excess capacity in a normalizedway and are saturated to be always nonnegative. At equilib-rium, bottlenecks with nonzero price will have y cl l= as re-quired. Nonbottlenecks with y cl l< will have zero price.

For the sources, the linearization requirement (31) leadsto a differential equation

∂∂

= −fq

f qM

i

i

i i i

i i

ατ

( ),

which can be solved analytically and gives the control law

x f q x ei i i i

q

Mi i

i i= =−

( ): ,max

ατ

(33)

as the static source law. Note that the smaller the delay, themore responsive a source is in varying its rate as a functionof price. The larger the delay, the more conservative thesource control is, in order to avoid instability.

Here, x imax , is a maximum rate parameter, which can varyfor each source, and in fact can also be scheduled to dependon M i, τ i. All that is required is that it does not depend onqi.For instance, x imax , can be chosen to compensate for the ef-fect of delay τ i on equilibrium rate.

The utility function corresponding to the source con-trol is

U xM

xx

xx xi

i i

i i

( ) log ,,

,= −

≤τα

1max

maxfor i.

The above is the simplest nonlinear source law that givesthe required scaling; as discussed in [48], more degrees offreedom are available by letting α i be a function of qi; how-ever, it is important to emphasize that the stability require-ments do pose constraints on the family of utility functions.



Packet-Level ImplementationRequirementsWe briefly discuss here the information needed at sourcesand links to implement the dynamic laws we defined and theresulting communication requirements.

Links must have access to their aggregate flow yl ; thisquantity is not directly known but can be estimated from ar-riving packets (see, e.g., [45]). Note also that since the rate isintegrated, smoothing is already built into the process. Thetarget capacity cl is assumed known. Indeed, a simple way toimplement the price updates is to maintain a “virtual queue”counter that is incremented with received packets and dec-remented at the virtual capacity rate. Then prices are ob-tained by dividing this counter by the virtual capacity.

Sources must have access to the round-trip time τ i, whichcan be obtained by timing packets and their acknowledg-ments. Note that since the target equilibrium state is withempty queues, at equilibrium we will have τ i id∗ = ; it is there-

fore recommended to use an estimate of di (typically theminimum observed round-trip time), as is done in Vegas.This avoids the possibility that temporary excursions of thereal queue (inevitable at the packet level) would have adestabilizing effect via the round-trip time.

Also, sources must receive two parameters from the net-work: the aggregate price qi and the number of bottlenecksM i. To communicateqi, the technique of random exponentialmarking [45] can be used. Here, an ECN bit would be markedat each link l with probability 1 1− φ φ >− p l , . Assuming inde-pendence, the overall probability that a packet from source igets marked is1 − φ− q i , and thereforeqi can be estimated frommarking statistics at each source. This requires the knowl-edge of φ, which is presumably a global constant set a priori.Alternatively, packet dropping can be used in lieu of marking;provided drop probabilities are low, the superposition ofprobabilities holds to a good approximation.

Regarding M i, in the simplest implementation one wouldemploy an upper bound, which could be a globally known

20

18

16

14

12

10

8

6

4

2

00

0

5

5

10

10

15

15

Time [s](a)

Time [s](c)

Time [s](b)

Time [s](d)

Indi

vidu

alW

indo

w [p

kts]

Indi

vidu

alW

indo

w [p

kts]

500

450

400

350

300

250

200

150

100

50

00

0

2

2

4

4

6

6

8

8

10

10

12

12

14

14

Inst

anta

neou

s Q

ueue

[pkt

s]In

stan

tane

ous

Que

ue [p

kts]

160

140

120

100

80

60

40

20

0

7000

6000

5000

4000

3000

2000

1000

0

Figure 4. Individual window and queue traces. Simulation parameters: 50 sources, capacity = 9 pkts/ms,α = 0 8. , virtual capacity = 95%.(a) Individual window (delay = 40 ms). (b) Queue (delay = 40 ms). (c) Individual window (delay = 200 ms). (d) Queue (delay = 200 ms).


constant or based on the total number of links in theroute, found, for example, from traceroute informationin IP. For a more aggressive control one would need tocommunicate, in real time, the number of bottlenecklinks. This can be done analogously to what is done withprices, using a second ECN bit. Assume that links thatare bottlenecks mark this bit with probability1 1− φ− , andthose that are not do not mark. Then the probability ofmarking on a route is1 − φ− M i , which allows for the esti-mation of M i at the sources in real time. This introducesa time-varying source law whose closed-loop stabilityrequires further study.

Once the relevant parameters are measured or esti-mated, the static exponential law in (33) can be used todetermine the desired rate. Multiplying by the mea-sured round-trip time gives the desired window (after around-off) which can then be used directly to controltransmission. Thus, we have an implementation sce-nario that only adds a moderate amount of overhead tothe current Internet protocols.

Simulation StudiesA packet-level implementation has been completed us-ing the parallel simulator Parsec [50]. The simulation in-cludes window management, link queueing, and delay,but at this point does not include marking; prices arecommunicated as floating-point numbers. We simulate asingle link with capacity 9 pkts/ms that is shared by 50sources. Fig. 4 shows the individual window and queueas a function of time when the round-trip propagationdelay is 40 ms and 200 ms, respectively. As expected,both converge regardless of delay. Longer delay sets alarger time scale for the closed-loop behavior.

AcknowledgmentsThe simulation results are taken from [47] and con-ducted by Sachin Adlakha and Jiantao Wang.

References[1] L. Benmohamed and S.M. Meerkov, “Feedback control of congestionin store-and-forward networks: The case of a single congested node,”IEEE/ACM Trans. Networking, vol. 1, pp. 693-707, Dec. 1993.

[2] S. Chong, R. Nagarajan, and Y.-T. Wang, “Designing stable ABR flowcontrol with rate feedback and open loop control: First order controlcase,” Perform. Eval., vol. 34, no. 4, pp. 189-206, Dec. 1998.

[3] E. Altman, T. Basar, and R. Srikant, “Congestion control as a stochas-tic control problem with action delays,” Automatica, Dec. 1999.

[4] H. Ozbay, S. Kalyanaraman, and A. Iftar, “On rate-based congestioncontrol in high-speed networks: Design of an h∞ based flow controller forsingle bottleneck,” in Proc. Amer. Control Conf., 1998.

[5] S. Mascolo, “Congestion control in high-speed communication net-works using the Smith principle,” Automatica, vol. 35, no. 12, pp.1921-1935, Dec. 1999.

[6] E.J. Hernandez-Valencia, L. Benmohamed, R. Nagarajan, and S.Chong, “Rate control algorithms for the ATM ABR service,” Eur. Trans.Telecommun., vol. 8, pp. 7-20, 1997.

[7] R. Srikant, “Control of communication networks,” in Perspectives inControl Engineering: Technologies, Applications, New Directions, T.Samad, Ed. Piscataway, NJ: IEEE Press, 2000, pp. 462-488.

[8] F.P. Kelly, A. Maulloo, and D. Tan, “Rate control for communicationnetworks: Shadow prices, proportional fairness and stability,” J. Oper.Res. Soc., vol. 49, no. 3, pp. 237-252, Mar. 1998.

[9] F.P. Kelly. (July 1999). Mathematical modelling of the Internet, in Proc.4th Int. Congr. Industrial Applied Mathematics. Available: http://www.statslab.cam.ac.uk/~frank/mmi.html

[10] S.H. Low and D.E. Lapsley, “Optimization flow control, I: Basic algo-rithm and convergence,” IEEE/ACM Trans. Networking, vol. 7, pp.861-874, Dec. 1999. Available: http://netlab.caltech.edu

[11] S.H. Low. (Sept. 18-20, 2000). A duality model of TCP flow controls, inProc. ITC Specialist Seminar on IP Traffic Measurement, Modeling andManagement. Available: http://netlab.caltech.edu

[12] S.H. Low, L. Peterson, and L. Wang. (June 2001). Understanding Ve-gas: A duality model, in Proc. ACM Sigmetrics . Available:http://netlab.caltech.edu/ pub.html

[13] S. Athuraliya and S.H. Low, “Optimization flow control with New-ton-like algorithm,” J. Telecommun. Syst., vol. 15, no. 3/4, pp. 345-358,2000.

[14] F. Paganini. (2001). On the stability of optimization-based flow control,in Proc. Amer. Control Conf. Available: http://www.ee.ucla.edu/-paganini/PS/remproof.ps

[15] S. Kunniyur and R. Srikant. (Mar. 2000). End-to-end congestion con-trol schemes: Utility functions, random losses and ECN marks, in Proc.IEEE Infocom. Available: http://www.ieee-infocom.org/2000/pa-pers/401.ps

[16] S. Kunniyur and R. Srikant. (Apr. 2001). A time-scale decompositionapproach to adaptive ECN marking, in Proc. IEEE Infocom. Available:http:// comm.csl.uiuc.edu:80/~srikant/pub.html

[17] J. Mo and J. Walrand, “Fair end-to-end window-based congestioncontrol,” IEEE/ACM Trans. Networking, vol. 8, no. 5, pp. 556-567, Oct.2000.

[18] J. Mo, R. La, V. Anantharam, and J. Walrand, “Analysis and compari-son of TCP Reno and Vegas,” in Proc. IEEE Infocom. Mar. 1999.

[19] R. La and V. Anantharam. (Mar. 2000). Charge-sensitive TCP and ratecontrol in the Internet, in Proc. IEEE Infocom. Available:http://www.ieee-infocom.org/2000/papers/401.ps

[20] K. Kar, S. Sarkar, and L. Tassiulas, “Optimization based rate controlfor multirate multicast sessions,” in Proc. IEEE Infocom., Apr. 2001.

[21] V. Misra, W.-B Gong, and D. Towsley, “Fluid-based analysis of a net-work of AQM routers supporting tcp flows with an application to RED,”in Proc. ACM SIGCOMM, 2000.

[22] C. Hollot, V. Misra, D. Towsley, and W.-B. Gong. (Apr. 2001). A controltheoretic analysis of RED, in Proc. IEEE Infocom. Available:http://www-net.cs. umass.edu/papers/papers.html

[23] F. Paganini, “Flow control via pricing: a feedback perspective,” inProc. 2000 Allerton Conf., Oct. 2000.

[24] V. Jacobson. (Aug. 1988). Congestion avoidance and control, in Proc.SIGCOMM’88, ACM. Available: ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z

[25] F.P. Kelly. (Dec. 1999). Models for a self-managed Internet, in Proc. R.Soc. Meeting. Available: http://www.statslab.cam.ac.uk/~frank/smi.html

[26] R. Johari and D. Tan, “End-to-end congestion control for theInternet: Delays and stability,” Cambridge Univ., Cambridge, U.K., Cam-bridge Univ. Statistical Laboratory Research Report, Tech. Rep. 2000-2,2000.

[27] L. Massoulie, “Stability of distributed congestion control with heter-ogeneous feedback delays,” Microsoft Research, Cambridge, U.K., Tech.Rep. TR 2000-111, 2000.

[28] G. Vinnicombe, “On the stability of end-to-end congestion control for theInternet,” Cambridge Univ., Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.398, Dec. 2000.

[29] W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson, “On the self-similarnature of Ethernet traffic,” IEEE/ACM Trans. Networking, vol. 2, pp. 1-15, 1994.

[30] V. Paxson and S. Floyd, “Wide-area traffic: The failure of Poisson model-ing,” IEEE/ACM Trans. Networking, vol. 3, pp. 226-244, 1995.

[31] W. Willinger, M.S. Taqqu, R. Sherman, and D.V. Wilson, “Self-similaritythrough high variability: Statistical analysis of Ethernet LAN traffic at thesource level,” IEEE/ACM Trans. Networking, vol. 5, pp. 71-86, 1997.

[32] M.E. Crovella and A. Bestavros, “Self-similarity in World Wide Web traffic:Evidence and possible causes,” IEEE/ACM Trans. Networking, vol. 5, pp.835-846, 1997.

[33] X. Zhu, J. Yu, and J.C. Doyle, “Heavy tails, generalized coding, and optimalweb layout,” in Proc. IEEE Infocom., Apr. 2001.

[34] L.S. Brakmo and L.L. Peterson, “TCP Vegas: End to end congestion avoid-ance on a global Internet,” IEEE J. Select. Areas Commun., vol. 13, pp.1465-1480, Oct. 1995. Available: http://cs.princeton.edu/nsg/papers/jsac-vegas.ps

[35] S. Floyd and V. Jacobson, “Random early detection gateways for conges-tion avoidance,” IEEE/ACM Trans. Networking, vol. 1, pp. 397-413, Aug. 1993.Available: ftp://ftp.ee.lbl.gov/papers/early.ps.gz

[36] G. de Veciana, T.J. Lee, and T. Konstantopoulos, “Stability and perfor-mance analysis of networks supporting elastic services,” IEEE/ACM Trans.Networking, vol. 9 , Feb. 2001.

[37] F. Baccelli and D. Hong, “AIMD, fairness and fractal scaling of TCP traffic,”INRIA, Paris, France, Tech. Rep. RR 4155, 2001.

[38] F.P. Kelly, “Charging and rate control for elastic traffic,”Eur. Trans. Telecommun.,vol.8,pp.33-37,1997.Available:http://www.statslab.cam.ac.uk/~frank/elastic.html

[39] H. Yaiche, R.R. Mazumdar, and C. Rosenberg, “A game theoretic frame-work for bandwidth allocation and pricing in broadband networks,”IEEE/ACM Trans. Networking, vol. 8, pp. 2-14, Oct. 2000.

[40] T.V. Lakshman and U. Madhow, “The performance of TCP/IP for networkswith high bandwidth-delay products and random loss,” IEEE/ACM Trans. Net-working, vol. 5, pp. 336-350, June 1997. Available: http://www.ece.ucsb.edu/Fac-ulty/ Madhow/Publications/ton97.ps

[41] M. Mathis, J. Semke, J. Mahdavi, and T. Ott, “The macroscopic behaviorof the TCP congestion avoidance algorithm,” ACM Comput. Commun. Rev.,vol. 27, no. 3, July 1997. Available: http://www.psc.edu/networking/papers/model sub ccr97.ps

[42] L. Massoulie and J. Roberts. (Mar. 1999). Bandwidth sharing: Objectivesand algorithms, in Infocom’99. Available: http://www.dmi.ens.fr/~mistral/tcpworkshop.html

[43] G. Vinnicombe, “A new TCP with guaranteed network stability,” Cam-bridge Univ., Cambridge, U.K., Technical report, preprint, June 2001.

[44] S. Floyd, “Connections with multiple congested gateways inpacket-switched networks, Part I: One-way traffic,” Comput. Commun. Rev.,vol. 21, no. 5, Oct. 1991.

[45] S. Athuraliya, V.H. Li, S.H. Low, and Q. Yin, “REM: Active queue manage-ment,” IEEE Network, vol. 15, pp. 48-53, May/June 2001. Extended version inProc. ITC17, Salvador, Brazil, Sept. 2001. Available: http://netlab.caltech.edu

[46] C. Hollot, V. Misra, D. Towsley, and W.-B. Gong. (Apr. 2001). On designingimproved controllers for AQM routers supporting TCP flows, in Proc. IEEEInfocom. Available: http://www-net.cs.umass.edu/papers/papers.html

[47] S.H. Low, F.Paganini, J.Wang, S.A. Adlakha, and J.C. Doyle, “Linear stabil-ity of TCP/RED and a scalable control,” in Proc. 39th Annual Allerton Conf.Communication, Control, and Computing, Monticello, IL, Oct. 2001. Available:http://netlab.caltech.edu

[48] S. Athuraliya and S.H. Low, “An empirical validation of a duality model ofTCP and queue management algorithms,” in Proc. Winter Simulation Conf.,Dec. 2001.

[49] F. Paganini, J.C. Doyle, and S.H. Low. (Dec. 2001). Scalable laws for stablenetwork congestion control, in Proc. Conf. Decision and Control. Available:http://www.ee.ucla.edu/~paganini

[50] Parallel simulation environment for complex systems. Available:http://pcl.cs.ucla.edu/projects/parsec/

Steven H. Low received his B.S. from Cornell in 1984 and hisPh.D. from Berkeley in 1992, both in electrical engineering.He was a consultant to NEC in 1991, with AT&T Bell Labora-tories, Murray Hill, from 1992 to 1996, and with the Univer-sity of Melbourne, Australia, from 1996 to 2000. He is now anAssociate Professor at California Institute of Technology.His research interests are in the control and optimization ofcommunications networks and protocols. He was a co-re-cipient of the IEEE William R. Bennett Prize Paper Award in1997 and received the 1996 R&D 100 Award. He is on the edi-torial board of IEEE/ACM Transactions on Networking, andhe has been a guest editor of the IEEE Journal on SelectedAreas in Communications.

Fernando Paganini received his Ingeniero Electricista andLicenciado en Matematica degrees from the Universidad dela Republica, Montevideo, Uruguay, in 1990, and his M.S. andPh.D. degrees in electrical engineering from the CaliforniaInstitute of Technology, Pasadena, in 1992 and 1996, respec-tively. From 1996 to 1997, he was a Postdoctoral Associate atthe Massachusetts Institute of Technology. Since 1997 hehas been with the Electrical Engineering Department at theUniversity of California, Los Angeles, where he is currentlyAssociate Professor. His research interests are robust con-trol, distributed control, and applications to communica-tion networks and power systems. He was the recipient ofthe O. Hugo Schuck award for best paper at the 1994 Ameri-can Control Conference, the Wilts and Clauser Prizes for hisPh.D. Thesis at Caltech in 1996, the 1999 NSF CAREERAward, and the 1999 Packard Fellowship.

John C. Doyle is Professor of Electrical Engineering andControl and Dynamical Systems at California Institute ofTechnology. He has a B.S. and an M.S. in electrical engineer-ing from MIT and a Ph.D. in mathematics from the Universityof California, Berkeley. His current research interests are intheoretical foundations for complex networks, primarily inengineering and biology, and the interplay between robust-ness, feedback, control, dynamical systems, computation,communication, and statistical physics. Additional inter-ests are in theoretical foundations of multiscale physics andfinancial markets. Awards include the IEEE Centennial Out-standing Young Engineer in 1984, the 1976 IEEE Hickernell,the 1983 American Automatic Control Council (AACC)Eckman, and the 1984 Bernard Friedman. Prize paperawards include the IEEE Baker, the IEEE Transactions on Au-tomatic Control Axelby (twice), and the AACC Schuck.


internet congestion control - authors.library.caltech.edu · various flow control mechanisms, ......

Documents