taming traffic dynamics: analysis and improvements

Computer Communications 35 (2012) 565–578

Contents lists available at ScienceDirect

Computer Communications

journal homepage: www.elsevier .com/locate /comcom

Taming traffic dynamics: Analysis and improvements

Pedro Casas a,b, Federico Larroca b,c,*, Jean-Louis Rougier c, Sandrine Vaton a

a Télécom Bretagne, Technopôle Brest-Iroise, CS 83818, 29238 Brest Cedex 3, Franceb Facultad de Ingeniería, Universidad de la República, Julio Herrera y Reissig 565, C.P. 11.300, Montevideo, Uruguayc Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, Paris, France

a r t i c l e i n f o

Article history:Available online 29 July 2010

Keywords:Traffic uncertaintyTraffic managementRobust optimizationRobust RoutingDynamic Load Balancing

0140-3664/$ - see front matter � 2010 Elsevier B.V. Adoi:10.1016/j.comcom.2010.07.009

* Corresponding author. Present Address: Facultadla República, Julio Herrera y Reissig 565, C.P. 11.300+598 2 711 09 74; fax: +598 2 711 74 35.

E-mail addresses: [email protected], [email protected] (F. Larroca)fr (J.-L. Rougier), [email protected]

a b s t r a c t

Internet traffic is highly dynamic and difficult to predict in current network scenarios, which enormouslycomplicates network management and resources optimization. To address this uncertainty in a robustand efficient way, two almost antagonist Traffic Engineering (TE) techniques have been proposed inthe last years: Robust Routing and Dynamic Load Balancing. Robust Routing (RR) copes with traffic uncer-tainty in an off-line preemptive fashion, computing a single static routing configuration that is optimizedfor traffic variations within some predefined uncertainty set. On the other hand, Dynamic Load Balancing(DLB) balances traffic among multiple paths in an on-line reactive fashion, adapting to traffic variations inorder to optimize a certain congestion function. In this article we present the first comparative studybetween these two alternative methods. We are particularly interested in the performance loss of RRwith respect to DLB, and on the response of DLB when faced with abrupt changes. This study bringsinsight into several RR and DLB algorithms, evaluating their virtues and shortcomings, which allows usto introduce new mechanisms that improve previous proposals.

� 2010 Elsevier B.V. All rights reserved.

1. Introduction

As network services and Internet applications evolve, networktraffic is becoming increasingly complex and dynamic. The conver-gence of data, telephony and television services on an all-IP net-work directly translates into a much higher variability andcomplexity of the traffic injected into the network. To make mat-ters worse, the presence of unexpected events such as networkequipment failures, large-volume network attacks, flash crowdoccurrences and even external routing modifications induces largeuncertainty in traffic patterns. Moreover, current evolution anddeployment-rate of broadband access technologies (e.g. Fiber ToThe Home) only aggravates this uncertainty.

But these are not the only problems network operators are con-fronted with. The ever-increasing access rates available for end-users we just mentioned is such that the assumption of infinitelyprovisioned core links could soon become obsolete. In fact, recentInternet traffic studies from major network technology vendorslike Cisco Systems forecast the advent of the Exabyte era [1,2], amassive increase in network traffic driven by high-definition video.In this context, simply upgrading link capacities may no longer be

ll rights reserved.

de Ingeniería, Universidad de, Montevideo, Uruguay. Tel.:

u (P. Casas), federico.larroca@, rougier@telecom-paristech.(S. Vaton).

an economically viable solution. Moreover, even if overdimension-ing would be possible, its environmental impact is not negligible.For instance, the Information and Communication Technologysector alone is responsible for around 2% of the man-made CO2, asimilar figure to that of the airline industry, but with higherincreasing perspectives [3]. An efficient and responsible usage ofthe resources is then essential.1

In the light of this traffic scenario, we study the problem of int-radomain Traffic Engineering (TE) under traffic uncertainty. Thisuncertainty is assumed to be an exogenous traffic modification,meaning that traffic variations are not produced within the domainfor which routing is optimized but are due to external and difficultto predict events. More in particular, we are interested in twoalmost antagonist approaches that have emerged in the recentyears to cope with both the increasing traffic dynamism and theneed for cost-effective solutions: Robust Routing (RR) [6–8] andDynamic Load Balancing (DLB) [9–11].

In RR, traffic uncertainty is taken into account directly withinthe routing optimization, computing a single routing configurationfor all traffic demands within some uncertainty set where traffic isassumed to vary. This uncertainty set can be defined in differentways, depending on the available information: largest values oflinks load previously seen, a set of previously observed trafficdemands (previous day, same day of the previous week), etc. The

1 To learn more about this emerging discipline, the interested reader shouldconsult works related to so-called ‘‘green networking” [4].

http://dx.doi.org/10.1016/j.comcom.2010.07.009

mailto:[email protected]

mailto:federico.larroca@


mailto:rougier@telecom-paristech.


http://dx.doi.org/10.1016/j.comcom.2010.07.009

http://www.sciencedirect.com/science/journal/01403664

http://www.elsevier.com/locate/comcom

566 P. Casas et al. / Computer Communications 35 (2012) 565–578

criterion to search for this unique routing configuration is gener-ally to minimize the maximum link utilization (i.e. the utilizationof the most loaded link in the network) for all traffic demands ofthe corresponding uncertainty set. While this routing configurationis not optimal for any single traffic demand within the set, it min-imizes the worst case performance over the whole set.

DLB copes with traffic uncertainty and variability by splittingtraffic among multiple paths in real-time. In this dynamic scheme,each origin–destination (OD) pair of nodes within the network isconnected by several a priori configured paths, and the problemis simply how to distribute traffic among these paths in order tooptimize a certain function. DLB is generally defined in terms ofa link-congestion function, where the portions of traffic are ad-justed in order to minimize the total network congestion. Ideally,the traffic distribution is set so that at every instant the objectivefunction is optimized.

Those who promote DLB highlight among others the fact that itis the most resource-efficient possible scheme, and that given theconfigured paths it supports every possible traffic demand, all ofthis in an automated and decentralized fashion. In practice, the ‘‘al-ways-optimized” characteristic we mentioned above is achievedby means of a distributed algorithm periodically executed by everyingress router based on feedback from the network. It is preciselythis last characteristic that constitutes the most challenging aspectof DLB. In fact, the deployment of DLB has been, to say the least,limited. Two particular problems arise in DLB: convergence tothe optimum is not always guaranteed, and convergence speedmight be over-killing under large and abrupt changes in traffic de-mands. Network operators are reluctant to use dynamic mecha-nisms mainly because they are afraid of a possible oscillatorybehavior of the algorithm used by each OD pair to adjust Load Bal-ancing. As the early experiences in ArpaNet has proved [12], theseconcerns are not without reason. (In particular, before July 1987,the links’ metric was defined as the packet delay averaged over a10 s period. Although this adaptive routing scheme worked cor-rectly under light or moderate loads, it generated oscillations un-der relatively heavy loads. This resulted in substituting the links’metric by a fixed value as we use it today, sacrificing optimalityfor stability.) Indeed, for these adaptive and distributed algorithms,a trade-off between adaptability (convergence speed) and stabilitymust be found, which may be particularly difficult in situationswhere abrupt traffic changes occur.

Those who advocate the use of RR claim that there is actually noneed to implement supposedly complicated and possibly oscilla-tory dynamic routing mechanisms, and that the incurred perfor-mance loss for using a single routing configuration is negligiblewhen compared with the increase in complexity. RR provides a sta-ble routing configuration for all the traffic demands within theuncertainty set, avoiding possible oscillations and convergence is-sues. However, RR presents some conception problems and seriousshortcomings in its current state which we highlight and try toease in this work. The first drawback of current RR is related tothe objective function it intends to minimize. Optimization underuncertainty is generally more complex than classical optimization,which forces the use of simpler optimization criteria such as max-imum link utilization (MLU). The MLU is not the most suitable net-work-wide optimization criterion; setting the focus too strictly onMLU often leads to worse distribution of traffic, adversely affectingthe mean network load and thus the total network end-to-end de-lay, an important QoS indicator. It is easy to see that the minimiza-tion of the MLU in a network topology with heterogeneous linkcapacities may lead to poor results as regards global network per-formance. The second drawback of RR we identify is its inherentdependence on the definition of the uncertainty set of traffic de-mands: the uncertainty set has to be sufficiently ‘‘large” to allowtraffic flexibility and to provide performance guarantees, but

should not be excessively ‘‘large” to avoid wasting network re-sources. Thus, considering an unique RR configuration to addressboth traffic in normal operation and unexpected traffic variationsis an inefficient strategy, as a single routing configuration cannotbe suitable for both situations.

1.1. Contributions of this article

This article presents a fair and comprehensive comparativeanalysis between RR and DLB mechanisms. The analysis is compre-hensive as it evaluates the performance of both mechanisms basedon different performance indicators and considering normal oper-ation as well as unpredicted traffic events. We believe our compar-ison is fair because it considers the particular characteristics ofeach mechanism under the same network and traffic conditions.To date and to the best of our knowledge this is the first work thatconducts such a comparative evaluation, necessary indeed not onlyfrom a research point of view but also for network operators whoseek cost-effective and robust solutions to face future network sce-narios. Based on this comparative analysis we develop and evalu-ate new variants of RR and DLB mechanisms, improving some ofthe shortcomings found in both static and dynamic approaches.

Regarding the RR approach, we will introduce some modifica-tions that strive to alleviate the two problems identified in currentproposals. We will first study which is the best objective functionto minimize, and propose the mean link utilization instead of theMLU. The mean link utilization provides a better image of net-work-wide performance, as it does not depend on the particularload or capacity of each single link in the network but on the aver-age value. However, a direct minimization of the mean link utiliza-tion does not assure a bounded MLU, which is not practical from anoperational point of view. Thus, we minimize the mean link utili-zation while bounding the MLU by a certain utilization thresholda priori defined. This adds a new, and maybe difficult to set, con-straint to the problem, namely how to define this utilizationthreshold. We further improve our proposal by providing a multi-ple objective optimization criterion, where both the MLU and themean link utilization are minimized simultaneously. We evaluatethe improvements of our proposals from a QoS perspective, usingthe mean path end-to-end queuing delay as a measure of globalperformance.

The second problem we address in RR is the trade-off betweenrouting performance and routing reliability. In [13] we have re-cently proposed a solution to manage this trade-off, known asReactive Robust Routing (RRR). Basically, RRR consists of construct-ing a RR configuration for expected traffic in nominal operation,adapting this nominal routing configuration after the detectionand localization of a large and long-lived traffic modification.RRR provides good performance for both nominal operation andunexpected traffic, but it is difficult to deploy in a real implemen-tation, because of the routing reconfiguration step. Reconfiguringthe routing of an entire Autonomous System is a nontrivial task.In this article we modify the RRR approach, using a preemptiveLoad Balancing algorithm to balance traffic among pre-establishedpaths after the localization of a large volume traffic modification(preemptive in the sense of preventing a situation from occurring).

In what respects DLB, we evaluate the use of so-called no-regretalgorithms as the distributed optimization algorithm used by in-gress routers to adapt Load Balancing. The authors of a recent pa-per [14] proved that if all OD pairs use algorithms of this kind,convergence to the optimum is guaranteed. Special attention willbe paid on the behavior of the algorithm when faced with abruptand unexpected changes in the traffic demands. We shall introducesimple, and yet effective, modifications to the algorithm to assure afast convergence to the new optimum in this case.

P. Casas et al. / Computer Communications 35 (2012) 565–578 567

As we shall see in the following subsection, several previouslyproposed DLB algorithms strive to minimize the MLU by meansof a greedy algorithm in the paths utilization (i.e. each ingress rou-ter increases the amount of traffic sent along the path with thesmallest utilization). As proved in a recent paper [15], convergenceto the optimum for such algorithms may not be guaranteed, in thesense that they may converge to a situation in which the MLU isnot minimized. In fact, we shall present an example in which thedifference with the optimum of the MLU is non-negligible. In thiswork we will present an alternative path cost function, so thatgreedy algorithms that use it do converge to the optimumequilibrium.

1.2. Related work

There is a large literature on routing optimization with uncer-tain traffic demands. Thus, here we shall only mention a few pa-pers, and do not expect our list to be exhaustive.

Traditional algorithms rely on a single or a small group of ex-pected traffic demands to compute optimal and reliable routingconfigurations. An extreme case is presented in [16], where routingis optimized for a single estimated traffic demand and is then ap-plied for daily routing. Traffic uncertainty is characterized by mul-tiple traffic demands in [17] (set of traffic demands from previousday, same day of previous week, etc.), where different mechanismsto find optimal routes for the set are presented. As discussed for in-stance in [18], this perspective is no longer suitable for current andfuture dynamic scenarios. These approaches require a ‘‘leap offaith” to perform well, mainly because they assume that traffic pat-terns do not change that much over time. However, even a rela-tively small difference between the ‘‘real” traffic demand and theone used for the routing optimization may lead to an importantperformance degradation. Such a difference may arise in the eventof unexpected traffic variations (which are more common nowa-days), or even be due to an error in the traffic estimation.

A different approach has emerged in the recent years to copewith the increasing traffic dynamism and the need for cost-effec-tive solutions, Dynamic Load Balancing (DLB) [9–11,19]. In DLB,traffic is split among a priori established paths in order to avoidnetwork congestion. The two most well-known proposals in thisarea are MATE and TeXCP. In MATE [9], a convex link congestionfunction is defined, which depends on the link capacity and thelink load. The objective is to minimize the total network conges-tion, for which a simple gradient descent method is proposed. In[19], we propose to use a link congestion function based on mea-surements of the queueing size, which results in better global per-formance from a QoS perspective. TeXCP [10] proposes asomewhat simpler objective: in order to minimize the MLU, theyminimize the biggest utilization each traffic demand obtains inits paths. Another DLB scheme which has the same objective buta relatively different mechanism is REPLEX [11].

The last category of algorithms consists of Robust Routing tech-niques [6–8,20,21]. The objective in RR is to find an unique staticrouting configuration that fulfills a certain criterion for a broadset of traffic demands, generally the one that minimizes the max-imum link utilization over the whole set of demands. In [6],authors capture traffic variations by introducing a polyhedral setof demands, which allows for easier and faster linear optimization.This robust technique is applied in [20] to compute a robust MPLSrouting configuration without depending on traffic demand esti-mation, and corresponding methods for robust OSPF optimizationare discussed. Oblivious Routing [7] also defines linear algorithmsto optimize worst-case MLU for different sizes of traffic uncertaintysets. The author of [21] analyzes the use of Robust Routing througha combination of traffic estimation techniques and its correspond-ing estimation error bounds, in order to shrink the set of traffic de-

mands. In [8] authors introduce COPE, a RR mechanism thatoptimizes routing for predicted demands and bounds worst-caseMLU to ensure acceptable efficiency under unexpected trafficevents. The idea behind COPE is similar to ours, in the sense thatit strives to alleviate performance degradation due to unforeseentraffic modifications. Nevertheless, it proposes a single routingconfiguration to handle expected as well as large and abrupt trafficvariations, which is clearly not the best solution.

The same paper [8] presents, to the best of our knowledge, theonly previous comparative study between RR and DLB. The authorsof the paper compare the performance of COPE with a dynamic ap-proach which they claim models the behavior of mechanisms suchas MATE and TeXCP. Given a time series traffic demands, this dy-namic approach consists of computing an optimal routing for eachtraffic demand i and evaluate its performance with the followingtraffic demand i + 1. There are two important shortcomings of thisDLB simulation. Firstly, adaptation in DLB is iterative and neverinstantaneous. Secondly, in all DLB mechanisms paths are set a pri-ori and remain unchanged during operation. This is not the case intheir dynamic approach, where each new routing optimizationmay change not only traffic portions but paths themselves. Forthese reasons, we believe that the comparison provided in [8] isbiased against dynamic schemes.

The remainder of this article is organized as follows. In Section 2we introduce the network model and notation, while Sections 3and 4 introduce a preliminary version of the RR and DLB mecha-nisms. Some first results are discussed in Section 5. Section 6presents new variants to the former mechanisms which alleviatethe shortcomings detected. The evaluation of the complete set ofalgorithms under different traffic scenarios is conducted in Section7. We finally draw conclusions of this comparative analysis inSection 8.

2. Network model and performance indicators

Let us begin by introducing the notation used in this article. Thenetwork topology is defined by n nodes and a set L = {l1, . . . , lq} of qlinks, each with a corresponding capacity ci, i = 1, . . . , q. The TrafficMatrix (TM) X = {xi,j} denotes the traffic demand (expressed in, forexample, Mbps) between every origin node i and every destinationnode j (i – j) of the network; we shall note each of these origin–destination pairs as OD pairs, and each origin–destination trafficdemand xi,j as OD flows. In practice, each traffic demand is mea-sured every T minutes (usually 50 or 100 [5]), and its value simplycorresponds to the cumulative number of bytes observed betweentwo consecutive measurements, divided by the polling time T. LetX = {xk} be the vector representation of the TM, where we havereordered OD flows by index k = 1, . . . , m (m = n�(n � 1)). LetN = {OD1, . . . ,ODm} be the set of m OD pairs. We consider a multi-path network topology, where each OD flow xk can be arbitrarilysplit among a set of pk origin–destinations paths Pk. In this sense,we shall call rk

p the portion of traffic flow xk sent along path p 2 Pk,where 0 6 rk

p 6 1 andP

p2Pkrk

p ¼ 1.Let kp

l be an indicator variable that takes value 1 if path p tra-verses link l and 0 otherwise, and Y = {q1, . . . ,qq} a vector represen-tation of links traffic load. Then X and Y are related through therouting matrix R, a q �m matrix R ¼ rk

l

� �where rk

l ¼P

p2Pkkp

l � rkp.

The variable rkl indicates the fraction of OD flow xk routed along

link l; this results in the following relation:

Y ¼ R � X ð1Þ

Given X, the multi-path routing optimization problem consists inchoosing the set of paths Pk for each OD pair k and computing therouting matrix R, in order to optimize a certain objective functiong(X,R). A simplified version of this problem is the Load Balancing

Fig. 1. The uncertainty set X as a polytope.

Table 1The Robust Routing Optimization Problem (RROP).

minimize umax

subject to :Xk2N

Xp2Pk

kpl :r

kp:xðkÞ 6 umax:cl 8l 2 L; 8X 2 X

Xp2PðkÞ

rkp ¼ 1 8k 2 N

rkp P 0 8p 2 Pk; 8k 2 N

umax 6 1


optimization problem which, given a set of paths, calculates R. Inthis work we shall consider different performance indicators, whichresult in different objective functions.

A very important link-level performance indicator is the linkutilization ul = ql/cl; a value of ul close to one indicates that the linkis operating near its capacity. Network operators usually prefer tokeep links utilization relatively low in order to support suddentraffic increases and link/node failures. A network-wide perfor-mance indicator is the maximum link utilization umax:

umaxðX;RÞ ¼maxl2Lfulg ð2Þ

The maximum link utilization constitutes by far the most popularTE objective function. However, its minimization presents a cleardrawback: setting the focus too strictly on the most utilized link of-ten leads to a worse distributions of traffic, adversely affecting theoverall performance in the network. In this sense we will considerthe mean link utilization umean as another possible objectivefunction:

umeanðX;RÞ ¼1q

Xl2L

ul ð3Þ

Minimizing umean may provide better network-wide performance,as long as the maximum link utilization remains bounded; we willfurther discuss this issue in Section 6.

The last performance indicator we shall consider in this work isthe queuing delay (i.e. the time that spans between the moment abyte enters the router and leaves it). This choice is justified by twoaspects. Firstly, its algebra is relatively simple, in the sense that thetotal delay of a path is the addition of the delay at each link. Sec-ondly, it is a very versatile indicator. A big queueing delay meansmore delay and jitter for streaming traffic. Moreover, a link withan important queueing delay is traversed by several bottleneckedflows, meaning that elastic traffic may obtain better throughputin other, less loaded, links. The mean queueing delay may be re-garded then as a numerical value of the congestion on the link.

Assume then that the queuing delay on link l is given by thefunction dl(ql). Given this function we can compute the queuingdelay of path p as dp ¼

Pl2pdlðqlÞ. As a measure of the network-

wide performance, we consider the expected end-to-end (e2e)queuing delay dmean defined as:

dmeanðX;RÞ ¼Xk2N

Xp2Pk

rkp � xk

� �dp ¼

Xl2L

ql � dlðqlÞ ð4Þ

That is to say, dmean is a weighted mean e2e queuing delay, wherethe weight for each path is how much traffic is sent along it

rkp � xk

� �, or in terms of links, the weight for each link is how much

traffic is traversing it (ql). We prefer a weighted mean queuing de-lay to a simple total delay because it reflects more precisely perfor-mance as perceived by traffic. Two situations where the total delayis the same, but in one of them most of the traffic is traversing heav-ily delayed links should not be considered as equivalent. Note that,by Little’s law, the value fl(ql) = ql�dl(ql) is proportional to the vol-ume of data in the queue of link l. We will then use this last valueas the addend in the last sum in (4), since it is easier to measurethan the queuing delay. Finally, note that, differently to umax, a largemean e2e queuing delay translates into bad performance for themajority of the traffic and not only for the traffic that traverses aparticularly loaded link.

Based on these definitions we will introduce the different opti-mization algorithms that strive to minimize some of these perfor-mance indicators, considering either the RR or DLB approach.

3. Stable Robust Routing

Finding a multi-path routing configuration minimizing umax isan instance of the classical multi-commodity flow problem whichcan be formulated as a linear program [22]. For a single knowntraffic matrix X, the problem can be easily solved by linear pro-gramming techniques [23]. However, as we have previously dis-cussed, traffic demands are uncertain and difficult to predict, andall we can expect is to find them within some bounded uncertaintyset.

In a robust perspective of the multi-path routing optimizationproblem, demand uncertainty is taken into account within therouting optimization, computing a single routing configurationfor all demands within some uncertainty set. In this work we con-sider a polyhedral uncertainty set X, more precisely a polytope as in[6], based on the intersection of several half-spaces that resultfrom linear constraints imposed to traffic demand.

As an example, let us define an uncertainty set X based on a gi-ven routing matrix Ro and the peak-hour links traffic load Ypeak ob-tained with this routing matrix:

X ¼ fX 2 Rm;Ro � X 6 Ypeak;X P 0g

Observe that this definition of the uncertainty set has a majoradvantage: routing optimization can be performed from easilyavailable links traffic load Y without even knowing the actual valueof the traffic demand X. Fig. 1 depicts the obtained uncertainty set,based on the convex intersection of q half-spaces of the formri

o � X 6 qpeaki ;8i 2 L, where ro

i stands for the ith row of the routingmatrix Ro.

The traditional Robust Routing Optimization Problem (RROP) de-fined in Table 1 consists of minimizing the maximum link utiliza-tion umax, considering all demands within X. The solution to theproblem is twofold: on the one hand, a routing configuration


Rrobust, and on the other hand, a worst-case performance thresholdurobust

max :

Rrobust ¼ argminR

maxX2X

umaxðX;RÞ

urobustmax ¼max

X2XumaxðX;RrobustÞ

Given a suitable definition of the uncertainty set, the obtained ro-bust routing configuration Rrobust is applied during long periods oftime; in this sense, we refer to Robust Routing as Stable Robust Rout-ing (SRR). The authors of [6] have shown that the RROP can be effi-ciently solved by linear programming techniques, applying acombined columns and constraints generation method. This meth-od iteratively solves the problem, progressively adding new con-straints and new columns to the problem.

The new constraints are the extreme points of the uncertaintyset X, and the new columns represent new paths added to reducethe objective function value. Only extreme points of X are added asnew constraints, as it is easy to see that every traffic demand X 2 X

can be expressed as a linear combination of these extreme de-mands. Regarding new added paths, the algorithm in [6] may notbe the best choice from a practical point of view since the numberof paths for each OD pair is not a priori restricted and the charac-teristics of added paths are not controlled. For example, it would beinteresting to have disjoint paths to route traffic from each singleOD pair, improving resilience. For this reason we modify the algo-rithm to select new paths, both limiting the maximum number ofpaths in Pk and taking as new candidates the shortest paths withrespect to link weights wi

l:

wil ¼

1

�þ 1� rki

l

� � ð5Þ

where rki

l corresponds to the fraction of traffic flow xk that traverseslink l after iteration i and � is a small constant that avoids numericalproblems. If OD pair k uses a single path p at iteration i, rki

l ¼ 1 forevery link l 2 p, and so this path is removed from the graph wherenew shortest paths are computed (wl ?1,"l 2 p). While this mayresult in a sub-optimal performance, it allows a real and practicalimplementation. In case there are no disjoint paths for OD pair k,we use the column constraint generation method used in [6] toadd new paths for OD pair k.

4. Dynamic Load Balancing

4.1. Routing games and wardrop equilibrium

As mentioned before, the objective in DLB is to minimize a cer-tain objective function g(X,R) in a distributed fashion (i.e. withoutrelying on any centralized entity). Algorithms that achieve this aretypically greedy, which present the desirable property of requiringminimum coordination among border routers. In this kind ofmechanisms, a path cost function /p is defined, and each OD pairgreedily minimizes the cost it obtains from each of its paths. Thiscontext constitutes an ideal case study for game theory, and isknown as Routing Game in its terminology [24,25].

Since each OD pair may arbitrarily balance traffic among itspaths, we will assume that OD pairs are constituted of infinitelymany agents. These agents control an infinitesimal amount of traf-fic, and decide along which path to send their traffic. In this contextrk

p represents then the fraction of agents of OD pair k that have p astheir choice. If each of these agents acts selfishly, then the systemwill be at equilibrium when no agent may decrease its cost by uni-laterally changing its path decision. This situation constitutes whatis known as a Wardrop Equilibrium (WE) [26], which is formally de-fined as follows:

Definition 1. The paths vector rkp

n ok2N;p2Pk

is a Wardrop Equilib-

rium if for each OD pair k 2 N and for each couple of paths p, q 2 Pk

with rkp > 0 it holds that /p 6 /q.

Intuitively speaking, a WE is a situation where each OD pairuses only those paths with minimum cost (for the given OD pair).Anyway, the path cost /p is in turn defined in terms of a certainnonnegative, non-decreasing and continuous link cost function/l(ql). There are roughly two kinds of games depending on the def-inition of /p. A Congestion Routing Game defines the path cost as/p ¼

Pl2p/lðqlÞ. On the other hand, a Bottleneck Routing Game de-

fines /p ¼maxl2p

/lðqlÞ.Much effort has been put into characterizing the resulting equi-

librium of these games. In this sense, a certain social cost functionis defined, which measures the dissatisfaction of the OD pairs as awhole (i.e. an optimum paths vector is one that minimizes thisfunction), and the objective is to quantify the difference betweenthe optimum and the resulting WE. In the case of a congestiongame, the typical social cost function is the same as in (4) (i.e.P

l2LqldlðqlÞ :¼P

l2LflðqlÞ), whereas for a bottleneck game the socialcost is usually the maximum /l(ql) over all links (i.e. max

l2L/lðqlÞ).

It may be proved that the WE of a congestion game coincideswith the unique minimum of the so-called potential functionUðRÞ ¼

Pl2L

R ql0 /lðxÞdx [24]. This means that if fl(ql) is continuous

differentiable, non-decreasing and convex, the WE of a congestiongame with /lðqlÞ ¼ f 0l ðqlÞ is socially optimum. In this sense, to min-imize dmean through DLB, we will play a Congestion Routing Gamewith a link cost equal to the derivative of the link mean queue size.In the sequel we shall note this game as MinDG (Minimum DelayGame).

On the other hand, characterization of the WE of a bottleneckgame is somewhat more complicated. In fact, it is relatively easyto see that in this case the WE is not even unique. Moreover,and rather unfortunately, it has been proved in [15] that evenif there always exist at least one WE that is socially optimum,nothing may be guaranteed about the rest (if any). However,the same paper proved that every WE that fulfills the so-calledefficiency condition is optimum, where this condition is definedas follows:

Definition 2. Let B(p) denote the number of network bottlenecks

over p; that is to say BðpÞ ¼ l 2 p : /lðqlÞ ¼ maxm2L;qm>0

f/mðqmÞg� ��

��.Then, a WE is said to satisfy the efficiency condition if all OD pairs

route their traffic along paths with a minimum number of network

bottlenecks; i.e. for all k 2 N and p, q 2 Pk with rkp > 0 it holds that

B(p) 6 B(q).

This result, which is relatively new, was not applied in the de-sign of neither TeXCP or REPLEX, both of which strive to minimizethe maximum link utilization by means of a greedy algorithm inthe path utilization (i.e. a bottleneck game with /p ¼max

l2pul). It

could then be the case that these algorithms converge to a sub-optimal WE. Possible consequences on the obtained performanceof ignoring this result will be further discussed later in the article.In any case, we shall note this game as MinUG (Minimum Utiliza-tion Game).

4.2. No-regret algorithms

We will now briefly discuss how, given the path cost function /p,the WE may be achieved for both routing games. In a recent article[14], the authors proved that if all OD pairs use no-regret algo-rithms, the global behavior will approach the WE. To be more pre-cise, for a given TM X, and for most time steps, the instantaneous

0.5 1 1.5x 104

100

102

104

ρ (kB/s)

Mea

n Q

ueue

Siz

e (k

B)

MeasurementsRegressionM/M/1


paths vector rkp

n ok2N;p2Pk

is very close to the WE, and this difference

vanishes with time.This result is very general, in the sense that it does not specify

any algorithm in particular. Its only requirement is the use of no-regret algorithms by all OD pairs (for an overview of some of themsee [27]). In particular, we will consider the Weighted MajorityAlgorithm (WMA) [28], which originated in the context of onlinelearning (more precisely from the online prediction using expert ad-vice problem), and whose pseudo-code for OD pair k is described inAlgorithm 1.

Algorithm 1. Weighted Majority Algorithm (WMA)
l
Fig. 2. Mean queue size: measurements and approximations.
1: for t = 1, . . . , 1 do 2: Obtain path costs /p "p 2 Pk
3:
for every path p 2 Pk do 4: if /p > min
q2Pk

/q do

5:
rkp b� rk
p

6:
end if 7: end for 8: Normalize the rk
p

9:
end for
At each iteration t, those paths whose cost is bigger than theminimum are punished by multiplying their respective rk

p by a cer-tain constant b < 1 (throughout our simulations we have usedb = 0.95, a value that we empirically verified obtains very good re-sults). Actually, and in order to avoid unnecessary changes in thetraffic distribution, we shall only update rk

p when the correspond-ing path cost is bigger than the minimum cost plus a certain mar-gin (in the case of MinUG we fixed the margin at 0.005, and forMinDG we used 5% of the minimum).

5. A preliminary comparison

In this section we shall present some first simulations that willhelp us to gain insight into the mechanisms and highlight some oftheir respective shortcomings. Before, we will discuss how we per-formed these and the rest of the simulations.

As the reference network we used Abilene, a high-speed Inter-net2 backbone network. Abilene consists of 12 router-level nodesconnected by 30 optical links (we only consider intra-domainlinks). The used router-level network topology and traffic demandsare available at [29]. Traffic data consists of 6 months of trafficmatrices collected every 5 min via Netflow from the Abilene Obser-vatory [30]. As measured traffic demands do not significantly loadthe network, we re-scaled them by multiplying all their entries bya constant. The dataset in [29] also provides the static routing con-figuration Ro deployed in Abilene during the 6-month long TMsmeasurement campaign.

In the case of MinDG, fl(ql) is typically chosen based on a sim-plistic model (e.g. the M/M/1 model which yields fl(ql) = ql/(cl � ql)) [9]. In order to avoid such arbitrary and unprecise choice,in [19] we proposed instead to learn this function (and its deriva-tive) from measurements. Fig. 2 depicts the real mean queue sizeof an operational network link at Tokyo obtained from [31],together with the M/M/1 estimation f M=M=1

l ðqlÞ and the non-para-metric regression f̂ lðqlÞ. It is clear that f M=M=1

l ðqlÞ consistentlyunderestimates the real queue size value, while f̂ lðqlÞ providesquite accurate results.

To be as fair as possible, all mechanisms use the same set ofpaths, namely those calculated by SRR as discussed in Section 3.The TMs are fed to the mechanisms in consecutive temporal order.

Both DLB mechanisms (MinUG and MinDG) are initiated at arbi-trary values of rk

p, which will be updated as new link load measure-ments arrive. We have assumed that each OD pair receives thesemeasurements every minute, meaning that for each new TM fiveupdates of their corresponding rk

p values will be performed (recallthat TMs are collected every 5 min). Results are shown then forevery minute. As a reference, we also computed the optimum val-ues uopt

max and doptmean for every TM X of the dataset.

In this example we consider a traffic scenario that presents anabrupt and large volume increase due to an external routing mod-ification. This corresponds to the TMs with indexes between 1050and 1200 from dataset X23 in [29]. The evaluation starts with anormal low traffic load situation, but after the 100th minute oneof the OD flows abruptly increases its traffic volume, loading thelinks it traverses until the end of the evaluation.

Regarding SRR, in what follows we shall use the term RROP as areference to SRR, recalling that the Robust Routing optimizationproblem is the one described in Table 1. Based on the static routingmatrix of Abilene Ro we define two different polytopes, the formeradapted to the Low Traffic Load period (LTL period, before the100th minute) and the latter adapted to the High Traffic Load per-iod (HTL period, after the 100th minute):

XLTL ¼ fX 2 Rm;Ro � X 6 YLTL;X P 0gXHTL ¼ fX 2 Rm;Ro � X 6 YHTL;X P 0g

We assume that traffic is known in advance in both definitions, andtake YLTL and YHTL as the maximum link load values observed duringthe LTL and HTL periods respectively. We compute two different Ro-bust Routing configurations for both polytopes; RROP-LTL corre-sponds to the SRR configuration for polytope XLTL, and RROP-HTLfor polytope XHTL. As we mentioned before, both RROP-LTL andRROP-HTL use the same set of paths, namely the paths obtainedfrom Table 1 for polytope XLTL. Solving RROP for a given set of pathsconsists of only adding new extreme points of polytope X (i.e., onlynew constraints are added).

Results are presented in Fig. 3, which depicts (a) the maximumlink utilization umax and (b) the mean end-to-end queuing delaydmean during the evaluation period. Let us first focus the attentionon the performance of RROP-HTL after the 100th minute. Despiteachieving an almost optimal performance as regards umax (a rela-tive difference with respect to the optimum smaller than 4%),RROP-HTL obtains a queuing delay that constantly exceeds theoptimum by almost 40% under a moderate network load. Such adifference may not be even acceptable from a QoS perspective,where end-to-end delays are even more important than networkcongestion. As we will show later, this loss in performance is a di-rect consequence of the local criterion used in RROP.

A second interesting observation comes from the difference be-tween RROP-HTL and RROP-LTL performances before and after the

200 400 6000

0.2

0.4

0.6

0.8

Time (min)

u max

200 400 60020

40

60

80

100

Time (min)

d mea

n (kB)

MinimumMinUGMinDGRROP−HTLRROP−LTL

MinUGMinDGRROP−HTLRROP−LTL

Minimum

Fig. 3. Maximum link utilization and mean end-to-end queuing delay. Traffic demand volume abruptly increases after the 100th minute.

Table 2Robust Routing Mean Utilization Optimization Problem (RRMP).

minimize umean

subject to :Xl2L

Xk2N

Xp2Pk

1cl

kpl :r

kp:xðkÞ 6 umean:q 8X 2 X

Xk2N

Xp2Pk

kpl :r

kp:xðkÞ 6 uthres

max :cl 8l 2 L; 8X 2 X

Xp2PðkÞ

rkp ¼ 1 8k 2 N

rkp P 0 8p 2 Pk; 8k 2 N


abrupt traffic volume increase; Fig. 3(a) shows that, despite an al-most negligible network load, RROP-LTL outperforms RROP-HTL byalmost 50% of relative utilization during the LTL period, while theopposite happens during the HTL period. The difference is not thatbig as regards delay before the 100th minute, but it becomes sig-nificant after the volume increase, where RROP-LTL obtains a verybad performance. These results are somehow expected given thepolytopes definition, and brings to light both the dependence ofRROP on the uncertainty set definition and the inherent conse-quence of using a single static routing configuration under largetraffic variations. Let us highlight the fact that in this examplewe have considered that traffic was known in advance for the def-inition of both polytopes XLTL and XHTL. While traffic during the LTLperiod is easy to predict, the definition of XHTL in a real traffic sce-nario is a challenging task. We will come back to this issue in thefollowing section.

Let us now discuss the results obtained by the dynamicschemes. A first important observation is that they present animportant overshoot, with an absolute difference with the opti-mum uopt

max of approximately 40%. Regarding MinDG in particular,convergence after the anomaly is very slow, taking more than600 min. However, it should be noted that when it eventually con-verges, it obtains a dmean that is very similar to the optimum. Interms of umax, the difference with respect to the optimum isapproximately 10%.

Special attention deserves the case of MinUG. After a shorterconvergence time (approximately 100 min), the resulting value ofumax is not the optimum. Let us recall that this kind of game (whichmodels schemes such as REPLEX of TeXCP) is used to converge to arouting configuration that minimizes the maximum link utilization[10,11]. However, in this case, the difference is more than 15%.Both of these problems will be further discussed in the followingsection.

6. Improving the algorithms performance

The simple evaluation conducted in the previous section showssome conception drawbacks of the SRR and DLB algorithms pre-sented in Sections 3 and 4 respectively. In this section we shall ex-plain the origin of these problems and present enhancedmechanisms to overcome them.

6.1. Improving stable Robust Routing

6.1.1. Network-wide performanceAs we showed in Fig. 3(b), the minimization of umax leads to a

distribution of traffic that results in an excessive end-to-end delay.Using the mean delay dmean as the objective function in RROP (cf.

Table 1) would be an interesting approach to ease the problem;however, fl(ql) is a nonlinear function and the optimization prob-lem becomes too difficult to solve. As we previously said, optimiza-tion under uncertainty is more complex than classical optimizationand simple optimization criteria should be used. In this sense, wecould use instead the mean link utilization umean as the objectivefunction.

The mean link utilization considers at the same time the load ofevery link in the network and not only the utilization of the mostloaded link; as we will show in the results, such an objective func-tion provides a better global performance as regards end-to-enddelay. However, a direct minimization of umean does not assure abounded maximum link utilization, which is not practical froman operational point of view. In this sense, we propose to changethe objective function in RROP by umean, while bounding the max-imum link utilization by a certain threshold uthres

max defined a priori.The resulting problem, which we shall call the Robust Routing MeanUtilization Optimization Problem (RRMP), is defined in Table 2.

RRMP is solved in the same way as RROP, using the same recur-sive algorithm proposed in [6]. Note that the difference betweenthe two problems is only a new constraint per each new traffic de-mand in X (in fact, for each extreme point of X). The drawback ofRRMP is its dependence on the value of uthres

max , which directly influ-ences the routing performance as we will shortly see. An interest-ing choice for uthres

max would be to use the output of RROP, namelyurobust

max . To some extent this would result in a similar routing solu-tion but with better traffic balancing.

An alternative approach is to minimize both the value of umax

and umean at the same time, which constitutes a problem of mul-ti-objective optimization (MOO). MOO problems are generally


more difficult to solve because traditional single-objective optimi-zation techniques cannot be directly applied. Nevertheless, theproblem of finding all the Pareto-efficient solutions to a linearMOO problem is well known and different approaches can be usedto treat the problem [32,33]. In this work we consider an intuitiveand easy approach to solve a MOO problem with standard single-objective optimization techniques. The approach consists in defin-ing a single aggregated objective function (AOF) that combinesboth objective functions. We define a weighted linear combinationof umax and umean as the new objective function uaof = a�umax +(1 � a)�umean, where 0 6 a 6 1 is the combination fraction. Despiteits simple form, this new objective is very effective and providesaccurate results for both performance indicators. We shall call thisnew optimization problem as Robust Routing AOF OptimizationProblem (RRAP), defined in Table 3. As before, RRAP is solved withthe same algorithms used in RROP.

6.1.2. Comparison between RRMP and RRAPWe will now evaluate both the RRMP and RRAP versions of SRR

in the traffic scenario previously considered in Section 5. In orderto appreciate the dependence of RRMP on the maximum link utili-zation threshold uthres

max , two different thresholds are used in theevaluation: uthres

max1¼ 1 (which corresponds to the constraint

umax 6 1 in Table 1), and uthresmax2¼ urobust

max , where urobustmax is the output

of RROP-HTL in Section 5. In the case of RRAP, the weight a is setto 0.5, namely an even balance between umax and umean. Thismay impress as a somewhat naive approach to the reader, butpractice shows that this choice provides in fact very good results.

Table 3Robust Routing AOF Optimization Problem (RRAP).

minimize uaof ¼ a:umax þ ð1� aÞ:umean

subject to :Xl2L

Xk2N

Xp2Pk

1cl

kpl :r

kp:xðkÞ 6 umean:q 8X 2 X

Xk2N

Xp2Pk

kpl :r

kp:xðkÞ 6 umax:cl 8l 2 L; 8X 2 X

Xp2PðkÞ

rkp ¼ 1 8k 2 N

rkp P 0 8p 2 Pk; 8k 2 N

100 250 400 550 700

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Time (min)

u max

MinimumRRMP umax 1

thres

RRMP umax 2thres

RRAP

Fig. 4. Maximum link utilization and mean end

Fig. 4 depicts the results in this case. Let us focus our attentionon the operation after the 100th minute, as all Robust Routing con-figurations use XHTL as the uncertainty set. To be as fair as possible,both RRMP and RRAP use the same set of paths as those used byRROP in Fig. 3. The figure clearly shows that the performance ofRRMP strongly depends on the threshold uthres

max . In the case ofuthres

max1, the attained maximum link utilization is well beyond the

optimal values, reaching almost a 70% of relative performance deg-radation. This overload directly translates into huge mean end-to-end queuing delays. Results are quite impressive when consideringthe second threshold, both as regards umax and dmean. RRMP usinguthres

max2provides a highly efficient Robust Routing configuration,

showing that it is possible to improve current implementationsof SRR with a slight modification of the objective function. How-ever, this dependence on the threshold uthres

max introduces a new tun-able parameter, something undesirable when looking for solutionsthat simplify network management.

As regards RRAP, obtained results are slightly worse than thoseobtained by RRMP uthres

max2, but still very close to the optimal perfor-

mance, with a relative performance degradation of about 10% asregards umax and dmean with respect to an optimal routing configu-ration. Nevertheless, RRAP has no tunable parameter apart fromthe combination factor a, which in fact is set to a half indepen-dently of the traffic situation.

6.1.3. The reactive Robust RoutingAs we showed in Section 5, the definition of the uncertainty set

has a major impact on the performance of SRR. In particular, wesaw that using a single definition of uncertainty set under highlyvariable traffic cannot provide routing efficiency for both normaloperation traffic and unexpected traffic events. Despite being oneof its most important features, using a single SRR configuration isnot the best strategy.

In [13] we proposed an adaptive version of SRR, known as theReactive Robust Routing (RRR). The basic idea in RRR consists incomputing a primal Robust Routing configuration Ro

robust for ex-pected traffic variations in normal operation within a primal poly-tope Xo. This polytope is defined as in Section 3, based on a certainfixed routing configuration Ro and the expected links traffic loadwe shall call Yo ¼ fqoi

g. Additionally, a set of m anomaly polytopesXj are defined, and a preemptive Robust Routing configurationRj

robust is computed for each of these anomaly polytopes.Let us explain the concept of an anomaly polytope. In Fig. 3, the

abrupt increase in traffic volume is caused by a single anomalousOD flow xk that unexpectedly carries a many times bigger trafficload h due to an external routing modification. After this exoge-nous unexpected event, the traffic demand X takes the valueX0 = X + h.dk, where dk = (d1,k, . . . ,dk,k, . . . ,dm, k)T, di,k = 0 if i – k and

100 250 400 550 700

40

60

80

100

Time (min)

d mea

n (kB)

MinimumRRMP umax 1

thresh

RRMP umax 2thresh

RRAP

-to-end queuing delay for RRMP and RRAP.


dk,k = 1. We shall designate this unexpected traffic increase in ODflow xk as anomalous traffic event Ak. The anomaly polytope Xk re-sults from expanding the primal polytope Xo in the directions ofthe links that traverses the anomalous OD flow xk, with respectto Ro. The reader should bear in mind that the kind of unexpectedtraffic events we deal with are independent of the intradomainrouting; these events originate outside the network and propagatebetween origin–destination nodes. This justifies the relevance ofthe polytope expansion with respect to Ro. The obtained polytopeXk is the smallest polytope that contains the unexpected traffic de-mand X0 and thus, the corresponding Robust Routing configurationRk

robust provides a relatively good performance under its occurrence.Fig. 5 explains the idea of the multiple anomaly polytope expan-sion. As before, ri

o stands for the ith row of the routing matrix Ro.Note that in a real scenario it is not possible to predict the size

of the anomalous traffic h. As a consequence, the primal polytopeXo is expanded to the limits of link capacities, obtaining the fol-lowing anomaly polytope for each anomalous traffic event Ak:

Xk ¼ fX 2 Rm;Ro � X 6 YAk ;X P 0g; 8k 2 N ð6Þ

In (6), the ith component of YAk takes the value qoiif ri;k

o ¼ 0, or thevalue ci if ri;k

o > 0, being ri;ko the element (i,k) of Ro.

Given the primal and the m preemptive Robust Routing config-urations Ro

robust and Rjrobust, RRR uses an on-line anomaly detection/

localization sequential algorithm to detect the occurrence of ananomalous event Ak, switching routing from Ro

robust to Rkrobust (and

from Rkrobust back to Ro

robust when normal operation is regained).We refer the reader to [13] for additional details on the detec-tion/localization algorithms and the implementation of RRR.

RRR can handle large and unexpected traffic variations in singleOD flows quite effectively (the case of multiple simultaneousanomalies is beyond the scope of RRR). However, given the diffi-culty involved in modifying the routing configuration of a largescale network in an on-line fashion, the contributions of RRR aremainly theoretical. This problem can be solved by using a Load Bal-ancing technique instead of a complete routing reconfiguration. InLoad Balancing, we keep the same set of paths Pk for each OD pair k,and only modify the fractions of traffic sent along each path. LoadBalancing can be easily performed on-line and does not require anyadditional modifications in current path-based networks such asMPLS. We shall refer to the load balancing variant of RRR as Reac-tive Robust Load Balancing (RRLB), stressing the difference betweenrouting reconfiguration and load balancing.

RRLB uses the same set of anomaly polytopes Xj defined in RRR,but the computation of the m preemptive Robust Routing configu-rations Rj

robust is slightly modified. The same set of paths Pk obtainedduring the computation of Ro

robust is used in every Rjrobust. As in Sec-

tions 5 and 6.1.2, routing configurations Rjrobust are obtained with a

simplified version of the former optimization algorithm, whereonly new traffic demands are progressively added and no extrapaths are created.

6.2. Improving Dynamic Load Balancing

6.2.1. Convergence timeThe DLB algorithms evaluated in Section 5 present an impor-

tant overshoot and a significant settling time in the presence ofsudden and large traffic variations. If the traffic anomaly is a per-fect step, then the overshoot is unavoidable. We will try to ad-dress the long settling time instead. The reason behind thisproblem is relatively simple, as shown in Fig. 6. The graph depictsthe evolution over time of the corresponding rk

p values of theanomalous OD pair for MinDG in the example of Section 5. Wemay see that, although the rk

p change exponentially fast, at themoment of the anomaly the values that should increase are so

small that it takes them a very long time to converge. A possiblesolution is to impose a minimum value to all rk

p. However, thiswill affect the precision of the algorithm and will still result insignificant settling times.

Actually, rkp may be regarded as an indicator of the performance

of path p in the previous iterations. A very small rkp means that p

performed very badly with respect to the rest of the paths in thepast. However, when the anomaly occurs, conditions severelychange and history is no longer as relevant. If we consider thatwe are in such situation, we could for instance completely ignorehistory and restart the game by setting rk

p ¼ 1=jPkj8k 2 Pk. Beforedeciding how to reassign rk

p, we will discuss how an OD pair maydecide if it should restart its game or not.

Consider a situation where most of the traffic for OD pair k isrouted along a path that is not the cheapest, and that the rk

p corre-sponding to the minimum-cost path is very small. This could meanthat although the former performed better in the past, this is nolonger true and some traffic should be re-routed to the latter. Thisis more so as the difference in cost increases. However, this ‘‘suspi-cious” situation could be due to noisy measurements. To make surethat the game has actually changed and that it should be restarted,we will require such a situation to persists during a certain numberof consecutive iterations. Once we detected that the game shouldbe restarted, we will re-route some of the traffic that was beingrouted along the path with the biggest rk

p to the cheapest one.The amount will be proportional to the relative difference in costto avoid overreacting. Finally, remember that with WMA fast adap-tation is achieved when the rk

p are not too small. The objective withthis ‘‘game restart” is simply to move rk

p from critically small val-ues. The algorithm will then rapidly converge to the optimum.We now present the pseudo-code of the complete algorithm forOD pair k:

Algorithm 2. WMA with Restart (WMA-R)

1:
for t = 1, . . . , 1 do 2: Obtain path costs /p "p 2 Pk
3:
Determine pmin ¼ argminp2Pk
/p and pmax ¼ argmaxp2Pk

rkp� �

4:
if rkpmin
< 0:1 and ð/pminþ /th < /pmax

Þ then

5:
nke nk
e þ 1
6: else 7: nk
e 0
8: end if 9: if nk
e 6 nkth then

10:
Perform a normal iteration of WMA (cf. Algorithm 1) 11: else 12: nk
e 0 � �
13:
Dr min /pmax/pmin

� 1;1 �rk

pmax�rk

pmin2

14:
rkpmax rk
pmax� Dr

15:
rkpmin rk
pminþ Dr

16:
end if 17: end for
The new variable nke counts the number of consecutive occur-

rences of a ‘‘suspicious” situation (we used nkth ¼ 3). The threshold

/th is to make sure that the difference in cost between paths is sig-nificant. In particular, for MinUG we used /th = 0.005 and forMinDG /th ¼ 0:2/pmin

. Finally, note that when the game is restarted,we re-route a certain amount of traffic from pmax to pmin, but atmost the amount of traffic routed along each path is equalized.

0 200 400 600

10−10

10−5

100

Time (min)

r pk

Fig. 6. Evolution of rkp for the anomalous OD pair (MinDG).

Fig. 5. Different anomaly polytopes for preemptive Robust Routing computation.


6.2.2. Converging to the social optimum in bottleneck gamesIn Section 5 we showed an example in which MinUG does not

converge to the optimum, and obtains a difference of 15% with re-spect to the optimum MLU. The reason behind this poor perfor-mance is simply that MinUG does not take into account theresult regarding the optimality of the WE and the efficiency condi-tion discussed in Section 4.1 and originally presented in [15]. Thisresult states that if at a WE all OD pairs send their traffic alongpaths with a minimum number of network bottleneck links (thosewith the maximum utilization in the whole network), the WE isoptimal. The problem we analyze now is how to design a path costfunction /p that takes into account this condition, so that whenusing it, the Load Balancing algorithm converges, when possible,to the correct WE. Note that the condition is only sufficient, mean-ing that a WE that fulfills the efficiency condition may not exist. Asimple example of such case is a single OD pair with two pathswith different lengths, where all links have the same capacity. Any-how, the two main difficulties in the design of such path cost arethe following. Firstly, the number of bottleneck links in a path isan integer (thus not continuous on rk

p). Secondly, the probabilityof two links having exactly the same utilization is zero, and as suchwe should consider the number of links that have an utilizationsimilar to the network bottleneck.

The objective is then to find a cost function that penalizes pathsin which several links have similar utilizations (and that this utili-zation is the maximum in all the network), and that it does notswitch between values to avoid oscillations. A candidate /p thatfulfills these two conditions is the so-called log-sum-exp function.Consider a set of arbitrary numbers A = {ai}, the log-sum-exp func-tion g(A) is defined as follows:

gðAÞ ¼ 1cA

logX

i¼1;...;jAjecAai

!

¼ ai� þ1cA

log 1þX

i¼1;...;jAj^i–i�ecAðai�ai� Þ

0@

1A ð7Þ

Consider the special case in which ai� ¼max A. It should be clearthat if ai� is significantly bigger than the rest of the elements in A,

the above convex and non-decreasing function constitutes anexcellent approximation of ai� . In fact, it easy to prove thatai� 6 gðAÞ 6 ai� þ logðjAjÞ=cA, meaning that we may control the pre-cision of the approximation through the parameter cA (the biggerthis parameter, the more precise the resulting approximation).Moreover, as more elements in A are similar to the maximum,g(A) approaches the upper bound, reaching it when all elementsare the same.

We will then use the second term of (7) as a penalty to thosepaths with several links whose utilization is similar to umax (themaximum utilization in the network). More precisely, given a pathp, let Up = {ul}l2p be the utilizations in the path, and l* 2 p be the linkwith the biggest utilization in p. We will then use the penalty func-tion with the alternative set U�p, which has the same elements asUp, but substitutes ul� by umax. This results in the following costfunction:

/p ¼ ul� þ1cp

log 1þX

l2p^l–l�ecpðul�umaxÞ

0@

1A ð8Þ

Even if this new cost function penalizes paths with several networkbottleneck links, it also penalizes longer paths, which was not ouroriginal objective. A good choice of cp will alleviate this side-effect.For instance, we used cp ¼ logðjpjÞ=maxf0:01; ul�=10g. This way, wetry to minimize the effect of log (jpj) and relativize the penalizationto ul� . The following section presents the results obtained by thisnew path cost function, which we will still call MinUG.

7. Evaluation and discussion

In this section we evaluate the performance of the different RRand DLB algorithms presented in this work, considering both nor-mal operation and anomalous traffic situations. We present anddiscuss three simulation case-scenarios: starting from a normaltraffic variation scenario, we increase the number of OD pairs thatpresent anomalous traffic variations. This allows for performancecomparison at different levels of traffic variability. As both RRAPand RRMP provide similar results (when uthres

max is correctly definedfor RRMP), we will only consider the RRAP mechanism in the eval-uation. Finally, we shall use RRLB-OP and RRLB-AP to designate theReactive Routing Load Balancing variants of RROP and RRAPrespectively.

7.1. Normal operation

The first case-scenario corresponds to traffic in normal opera-tion. The only variability is due to typical daily fluctuations.Fig. 7 presents the evolution of umax and dmean for RROP and RRAP,using a set of 260 TMs from dataset X01 in [29] (specifically, thosewith indexes between 420 and 680). All algorithms perform simi-larly as regards maximum link utilization, depicted in Fig. 7(a).This may be further appreciated in the boxplot summary presentedin Fig. 8(a), where values are relative to those obtained with anoptimal routing configuration. Note that the relative performancedegradation is around 10% in most cases.

Fig. 7(b) and Fig. 8(b) show that results are quite different as re-gards mean queuing delay. We may verify that the best results areobtained by MinDG, followed closely by RRAP. However, bothRROP and MinUG systematically obtain a significant differencewith respect to the optimum, generally between 30% and 40%.These results further highlight the limitations of RROP and MinUGas previously discussed: using umax as a performance objective re-sults in a relatively low maximum utilization, but neglects the restof the links, impacting the network-wide performance.

200 400 600 800 1000 1200

0.3

0.35

0.4

0.45

0.5

Time (min)

u max

MinimumMinUGMinDGRROPRRAP

200 400 600 800 1000 1200

40

50

60

70

80

90

Time (min)

d mea

n (kB)


Fig. 7. Maximum link utilization and mean end-to-end queuing delay under normal operation.

RROP RRAP MinUG MinDG

1

1.05

1.1

1.15

1.2

1.25

1.3

u max

(rel

ativ

e)

RROP RRAP MinUG MinDG1

1.1

1.2

1.3

1.4

1.5

1.6

d mea

n (rel

ativ

e)

Fig. 8. Maximum link utilization and mean end-to-end queuing delay under normal operation, boxplot performance summary. Depicted results are relative to theoptimal values.

200 400 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Time (min)

u max

MinimumMinUGMinDGRRLB−OPRRLB−AP

200 400 60030

40

50

60

70

80

90

100

Time (min)

d mea

n (kB)

MinimumMinUGMinDGRRLB−OPRRLB−AP

Fig. 9. Maximum link utilization and mean end-to-end queuing delay under one anomalous OD pair.


7.2. One anomalous OD pair

The second case-scenario is the one considered in Section 5,where there is a sudden and abrupt increase of the traffic volumecarried by one OD flow. As a difference with respect to the evalu-ation in Fig. 3, where traffic was assumed known in advance, thiscase-scenario corresponds to a real situation where traffic anoma-lies can not be forecast.

Firstly, notice in Fig. 9 how the improvements discussed inSection 6.2.1 for MinUG and MinDG result in a relatively smaller

overshoot than before, but most importantly the settling timehas been significantly decreased (in the case of MinDG, from600 min to less than 50 min). Moreover, note how the modifiedcost function proposed in Section 6.2.2 results in MinUG converg-ing to the socially optimum WE.

Regarding umax, both RRLB-OP and RRLB-AP obtain similar re-sults, with a relative performance degradation generally smallerthan 15% (this may be easily appreciated in the boxplot summaryin Fig. 10). Note that while relatively important, this performancedegradation is surprisingly small if we consider that traffic

RRLB−OP RRLB−AP MinUG MinDG1

1.05

1.1

1.15

1.2

1.25

1.3

u max

(rel

ativ

e)

RRLB−OP RRLB−AP MinUG MinDG1

1.1

1.2

1.3

1.4

1.5

d mea

n (rel

ativ

e)

Fig. 10. Maximum link utilization and mean end-to-end queuing delay under one anomalous OD pair, boxplot performance summary. Depicted results are relative to theoptimal values.


increases more than 500% in less than 10 min. The same may besaid about MinDG, which obtains a degradation between 20% and25%. In terms of dmean, MinUG and RRLB-AP perform similarly. Theyboth clearly outperform RRLB-OP, achieving a relative mean queu-ing delay almost 30% smaller. These results reinforces once againour observations about the difficulty in RROP to attain global per-formance, and the advantages of using a simple network-wideobjective function in a Robust Routing algorithm. Moreover, theyalso illustrate the difference between MinUG and RROP. Even whenMinUG was designed with the same objective than RROP (namelyto minimize umax), the fact that in MinUG each OD pair greedilyminimizes the path utilization results in a different overallbehavior.

7.3. Two anomalous OD pairs

In this case-scenario two OD pairs largely increase their trafficdemand, one at approximately the 150th minute and the other atthe 320th (they correspond to TMs with indexes between 160and 280 from dataset X06 in [29]). They both present this anoma-lous traffic until the end of the simulation. We shall then separatethe simulation in three parts: the first third where traffic is normal,the second third were only one OD pair is anomalous, and the lastthird were both OD pairs are anomalous. The anomaly localizationalgorithm of RRR was designed for the case of one single anoma-lous OD pair. Because of this, we will further illustrate the tradeoffbetween size of the considered uncertainty set and efficiency of theobtained routing, and chose the uncertainty polytope by the trafficloads seen after the second anomaly.

100 200 300 400 500 6000.1

0.2

0.3

0.4

0.5

0.6

Time (min)

u max


Fig. 11. Maximum link utilization and mean end-to-e

In Fig. 11(a) we may see that, as expected, the umax obtained byboth RROP and RRAP in the last third of the simulation are veryclose to the optimum. However, in the rest of the simulation thedifference may be important, specially in the second part wherethe absolute difference for RRAP is almost 20%. It is important tohighlight the results obtained by MinDG and MinUG. Notice thatthe overshoot this time is much smaller than before (a maximumof 0.1 in umax for MinDG) and the settling time is negligible. Thesegood results may be further appreciated in the boxplot summary ofFig. 12. In this case-scenario the increase in traffic of the anoma-lous OD pairs is more gradual than before, which clearly favors dy-namic schemes in their performance.

8. Conclusions and future work

From the study we presented in this article we may reachseveral conclusions. The most important is probably that we haveshown that using a single routing configuration is not a cost-effective solution when traffic is relatively dynamic. Stable RobustRouting obtains a rather poor performance either when faced withnon considered traffic demands (tight uncertainty sets) or whendesigned to manage as many traffic demands as possible (biguncertainty sets). It is clear from our study that some form of dyna-mism is necessary, which could be either RRLB (Reactive RobustLoad Balancing) or DLB (Dynamic Load Balancing).

RRLB computes a nominal operation routing configuration, andhas an alternative routing (using the same paths than in normaloperation) for certain possible anomalous situations. In order todetect these anomalous situations, link load measurements have

100 200 300 400 500 60030

40

50

60

70

80

90

100

110

Time (min)

d mea

n (kB)


nd queuing delay under two anomalous OD pair.


1.1

1.2

1.3

1.4

1.5

1.6

u max

(rel

ativ

e)


1.1

1.2

1.3

1.4

d mea

n (rel

ativ

e)

Fig. 12. Maximum link utilization and mean end-to-end queuing delay under two anomalous OD pair, boxplot performance summary. Depicted results are relative to theoptimal values.


to be gathered [13]. On the other hand, DLB gathers these samemeasurements but also requires updating Load Balancing in a rel-atively small time-scale. The added complexity is then to distributethese measurements to all ingress routers (instead of a central en-tity) and updating the Load Balancing in real-time.

Our results show that the additional complexity involved in DLBis not justified when the variability (or the anomalies) are not verysignificant. However, the use of DLB under highly dynamic traffic isvery appealing and generally provides better results than RRLB.Moreover, if the anomalies may not be correctly detected or local-ized (as in Section 7.3), the only effective solution is DLB.

Regarding RR in particular, we saw that using a local perfor-mance criterion such as the maximum link utilization (MLU) isnot a suitable objective function as regards network-wide perfor-mance and QoS provisioning. In particular, we showed that an al-most optimal robust routing configuration with respect to MLUcan experience rather high mean end-to-end queuing delays, avery important performance indicator for all types of traffic. Themaximum link utilization is widely used in current network opti-mization problems, particularly in most Robust Routing proposals,thus we believe that this simple evidence can help and should beconsidered in enhanced future implementations.

In fact, we have shown that objective optimization functionscan be kept simple, and yet better network-wide performancecan be attained. By using a simple combination of performanceindicators such as the maximum and the mean link utilization,we obtained a Robust Routing configuration that definitely outper-forms current implementations from a global end-to-end perspec-tive, while achieving very similar results as regards worst-case linkutilization.

The framework of Aggregated Objective Functions (AOF) weused provides interesting results as regards multi-objective opti-mization, particularly in the context of robust optimization. AnAOF approach can be used to construct better objective functionsfrom simple performance indicators, avoiding the need of morecomplex Multi-Objective Optimization (MOO) techniques. As partof our ongoing work we are currently analyzing the trade-off be-tween using a simple AOF approach against a more complex butmore complete MOO approach, computing all Pareto-efficient solu-tions for a polyhedral uncertainty set and comparing theirperformance.

In what respects DLB, dynamic approaches are generally metwith reluctance due to their transient behavior under strong trafficvariations. However, we have shown that this transient behaviorcan be effectively controlled, or at least alleviated, by simplemechanisms. Concerning the two different games we presented,conclusions are similar to those of RR. Striving to minimize dmean

instead of umax results in a somewhat bigger maximum utilization,but a (sometimes much) better global performance.

It should also be highlighted that this article represents one ofthe first studies using no-regret algorithms for Load Balancing,and as such much exploration is left to be done. In this sense, letus remark that WMA is arguably the simplest no-regret algorithmin the literature. For instance, it is easy to see that not all b < 1guarantee a non-oscillatory behavior of the algorithm (considerfor example b = 0). There exist other more sophisticated algorithmsof this kind, that do not require any parametrization at all (includ-ing b) and still guarantee convergence (see for instance [34]),whose exploration is left for future work. However, an importantconclusion of the current article is that, whatever the no-regret algorithm we choose, we will still require a ‘‘restart” featureas in WMA-R.

Our study also highlighted a problem with previously proposedDLB algorithms, namely the wrong assumption that OD pairs thatgreedily minimize the path utilization converge to a routing con-figuration that minimizes MLU. Based on a recent result [15], wehave explored the possibility of modifying the path cost functionso that the resulting routing configuration is actually optimum.Preliminary results presented in this article are very promisingand encourage us to further study this new cost function (e.g. char-acterize the resulting WE or consider other alternative penalizationfunctions).

Acknowledgements

The authors thank the anonymous reviewers for their valuablecomments. This work was partially funded by CELTIC projectTRANS and FP7 project Euro-NF.

References

[1] Cisco Systems, Global IP Traffic Forecast and Methodology 2006–2011, WhitePaper (2007 - updated 2008).

[2] Cisco Systems, The Exabyte Era, White Paper (2007 - updated 2008).[3] Global Action Plan, An Inefficient Truth. URL <http://www.globalactionplan.

org.uk/upload/resource-/Full-report.pdf>.[4] M. Gupta, S. Singh, Greening of the Internet, in: Proceedings of the 2003

Conference on Applications, Technologies, Architectures, and Protocols forComputer Communications (SIGCOMM ’03), Karlsruhe, Germany, 2003, pp.19–26.

[5] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, F. True, Derivingtraffic demands for operational IP networks, in: IEEE/ACM Trans. Networking,vol. 9 (3), 2001, pp. 265–279.

[6] W. Ben-Ameur, H. Kerivin, Routing of uncertain traffic demands, Opt. Eng. 6 (3)(2005) 283–313.

[7] D. Applegate, E. Cohen, Making intra-domain routing robust to changing anduncertain traffic demands: understanding fundamental tradeoffs, in:Proceedings of the 2003 Conference on Applications, Technologies,

http://www.globalactionplan.org.uk/upload/resource-/Full-report.pdf

http://www.globalactionplan.org.uk/upload/resource-/Full-report.pdf


Architectures, and Protocols for Computer Communications (SIGCOMM ’03),Karlsruhe, Germany, 2003, pp. 313–324.

[8] H. Wang, H. Xie, L. Qiu, Y.R. Yang, Y. Zhang, A. Greenberg, COPE: trafficengineering in dynamic networks, SIGCOMM Comput. Commun. Rev. 36 (4)(2006) 99–110.

[9] A. Elwalid, C. Jin, S. Low, I. Widjaja, MATE: MPLS adaptive traffic engineering,in: IEEE 20th Annual Joint Conference of the IEEE Computer andCommunications Societies. (INFOCOM 2001), vol. 3, Anchorage, USA, 2001,pp. 1300–1309.

[10] S. Kandula, D. Katabi, B. Davie, A. Charny, Walking the tightrope: responsiveyet stable traffic engineering, in: Proceedings of the 2005 Conference onApplications, Technologies, Architectures, and Protocols for ComputerCommunications (SIGCOMM ’05), Philadelphia, USA, 2005, pp. 253–264.

[11] S. Fischer, N. Kammenhuber, A. Feldmann, REPLEX: dynamic trafficengineering based on wardrop routing policies, in: Proceedings of the 2006ACM CoNEXT Conference (CoNEXT ’06), Lisboa, Portugal, 2006, pp. 1–12.

[12] A. Khanna, J. Zinky, The revised ARPANET routing metric, SIGCOMM Comput.Commun. Rev. 19 (4) (1989) 45–56.

[13] P. Casas, L. Fillatre, S. Vaton, Robust and reactive traffic engineering fordynamic traffic demands, in: Next Generation Internet Networks (NGI 2008),Krakow, Poland, 2008, pp. 69–76.

[14] A. Blum, E. Even-Dar, K. Ligett, Routing without regret: on convergence to nashequilibria of regret-minimizing algorithms in routing games, in: 25th AnnualACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing(PODC 2006), Denver, USA, 2006, pp. 45–52.

[15] R. Banner, A. Orda, Bottleneck routing games in communication networks, IEEEJ. Select. Areas Commun. 25 (6) (2007) 1173–1179.

[16] M. Roughan, M. Thorup, Y. Zhang, Traffic engineering with estimated trafficmatrices, in: Proceedings of the Third ACM SIGCOMM Conference on InternetMeasurement (IMC ’03), Miami Beach, USA, 2003, pp. 248–258.

[17] C. Zhang, Y. Liu, W. Gong, J. Kurose, R. Moll, D. Towsley, On optimal routingwith multiple traffic matrices, in: 24th Annual Joint Conference of the IEEEComputer and Communications Societies (INFOCOM 2005), vol. 1, MiamiBeach, USA, 2005, pp. 607–618.

[18] P. Casas, S. Vaton, An adaptive multi temporal approach for RobustRouting, in: Euro-FGI Workshop on IP QoS and Traffic Control, Lisbon,Portugal, 2007.

[19] F. Larroca, J.-L. Rougier, Minimum-delay Load Balancing through non-parametricregression, in: Proceedings of the Eigth International IFIP-TC 6 NetworkingConference (NETWORKING ’09), Aachen, Germany, 2009, pp. 782–794.

[20] M. Johansson, A. Gunnar, Data-driven traffic engineering: techniques,experiences and challenges, in: Third International Conference on BroadbandCommunications, Networks and Systems (BROADNETS 2006), San José, USA,2006, pp. 1–10.

[21] I. Juva, Robust Load Balancing, in: IEEE Global Telecommunications Conference(GLOBECOM ’07), Washington D.C., USA, 2007, pp. 2708–2713.

[22] R.K. Ahuja, T.L. Magnanti, J.B. Orlin, Network Flows: Theory, Algorithms, andApplications, Prentice Hall, 1993.

[23] D. Mitra, K. Ramakrishnan, A case study of multiservice, multipriority trafficengineering design for data networks, in: 1999 Global TelecommunicationsConference (GLOBECOM ’99), vol. 1B, 1999, pp. 1077–1083.

[24] E. Altman, T. Boulogne, R. El-Azouzi, T. Jiménez, L. Wynter, A survey onnetworking games in telecommunications, Comput. Oper. Res. 33 (2) (2006)286–311.

[25] F. Larroca, J.-L. Rougier, Routing Games for Traffic Engineering, in: IEEEInternational Conference on Communications (ICC ’09), Dresden, Germany,2009.

[26] J. Wardrop, Some theoretical aspects of road traffic research, Proceedings ofthe Institution of Civil Engineers, Part II 1 (36) (1952) 352–362.

[27] R. Yaroshinsky, R. El-Yaniv, S.S. Seiden, How to better use expert advice, Mach.Learn. 55 (3) (2004) 271–309.

[28] N. Littlestone, M.K. Warmuth, The weighted majority algorithm, Inf. Comput.108 (2) (1994) 212–261.

[29] Yin Zhang, Abilene Dataset. URL <http://www.cs.utexas.edu/yzhang/research/AbileneTM/>.

[30] The Abilene Observatory. URL <http://abilene.internet2.edu/observatory/>.[31] K. Cho, WIDE-TRANSIT 150 Megabit Ethernet Trace 2008-03-18. URL <http://

mawi.wide.ad.jp/mawi/samplepoint-F/20080318/>.[32] J. Evans, R. Steuer, A revised simplex method for linear multiple objective

programs, Math. Program. 5 (1) (1973) 54–72.[33] J. Ecker, I. Kouada, Finding all efficient extreme points for multiple objective

linear programming programs, Math. Program. 14 (1978) 249–261.[34] P. Auer, N. Cesa-Bianchi, C. Gentile, Adaptive and self-confident on-line

learning algorithms, J. Comput. Syst. Sci. 64 (2002) 48–75.

http://www.cs.utexas.edu/yzhang/research/AbileneTM/

http://www.cs.utexas.edu/yzhang/research/AbileneTM/

http://abilene.internet2.edu/observatory/

http://mawi.wide.ad.jp/mawi/samplepoint-F/20080318/

http://mawi.wide.ad.jp/mawi/samplepoint-F/20080318/

taming traffic dynamics: analysis and improvements

Documents