Dynamic Distributed Algorithm for Computing Multiple Next-Hops on a Tree - ICNP 2013

Haijun Geng∗‡, Xingang Shi†‡, Xia Yin∗‡, Zhiliang Wang†‡
∗ Department of Computer Science & Technology, Tsinghua University
† Institute for Network Sciences and Cyberspace, Tsinghua University
‡ Tsinghua National Laboratory for Information Science and Technology (TNList)
Email: {genghj, yxia, wzl}@csnet1.cs.tsinghua.edu.cn, [email protected]

Abstract: High reliability is always pursued by network designers. Multipath routing can provide multiple paths for transmission and failover, and is considered effective in improving network reliability. However, existing multipath routing algorithms focus on how to find as many paths as possible, rather than on their computation or communication overhead.

We propose a dynamic distributed multipath algorithm (DMPA) to help a router in a link-state network find multiple next-hops for each destination. A router runs the algorithm locally and independently, where only one single shortest path tree (SPT) needs to be constructed, and no message other than the basic link states is disseminated. DMPA maintains the SPT and dynamically adjusts it in response to network state changes, so the sets of next-hops can be incrementally and efficiently updated. At the same time, DMPA guarantees loop-freeness of the induced forwarding path by a partial order of the routers underpinning it.

We evaluate DMPA and compare it with several recent multipath algorithms, using a set of real, inferred and synthetic topologies. The results show that DMPA can provide good reliability and fast recovery for the network with very low overhead.

I. INTRODUCTION

With the rapid development of the Internet, more and more services and applications are widely deployed, which pose stringent requirements on its effectiveness and reliability. Traditional routing algorithms are mostly concerned with finding a shortest path towards the destination, and cannot provide good connectivity under frequent network failures [1]. This highlights the need for mechanisms that possess fast and efficient recovery capabilities. Towards this goal, multipath routing [2]–[8] has been proposed to use multiple alternative paths for data transmission. It not only improves a network's resilience to route failures, but also provides other benefits such as better load balance, higher throughput, and enhanced security.

Since the performance of routing and forwarding is critical for the Internet, multipath algorithms must be highly efficient so as not to become a bottleneck. Existing approaches often focus on finding as many paths as possible, but put little effort into reducing their computation or communication overhead. For example, they either build multiple shortest path trees [2], [7], [8], or exchange excessive messages between neighbors [3], so the induced cost is particularly high for high degree nodes.¹ The only exception we are aware of is the TBFH algorithm proposed by P. Mérindol et al. [9], which has the lowest time complexity among these multipath algorithms. On the other hand, consider a link state network where each router can learn the link states of the whole network. When a link changes its state or weight, all existing algorithms recompute all the next-hops for each destination from scratch, wasting router resources and delaying route convergence. We believe this is unnecessary and can be significantly improved.

We propose a tree based distributed multipath algorithm, DMPA, for a link state network. On each node, only a single shortest path tree (SPT) needs to be computed, locally and independently, without disseminating any information other than the typical link states. A rule to select the next-hops is designed such that when the tree is fully built, a set of next-hops is derived for each destination in the network, and any forwarding path induced from the results of the distributed computation is loop free, guaranteed by a partial order of the routers underpinning the path. In addition, the next-hops can be incrementally updated in response to any link state change, instead of being computed from scratch. As far as we know, DMPA is the first dynamic distributed multipath algorithm that can produce multiple loop-free paths, and it is much more efficient than existing multipath algorithms, as our evaluations demonstrate.

The rest of the paper is organized as follows. Section II introduces the background and related work. Section III presents the details of DMPA and proves its loop-free property, while Section IV evaluates its performance and compares it with several recent algorithms. Finally, Section V concludes the paper.

II. BACKGROUND

Nowadays, Internet failures have become routine events rather than exceptions [6]. Many solutions have been proposed to handle this problem from different angles, from physical level approaches such as optical routing protection, to application level schemes such as remote server backup. Since network connectivity is a core service provided by the routing framework, it is natural to design routing algorithms that are more resilient to failures. Multipath routing computes multiple paths between source-destination pairs, and provides not only redundant backups, but also other features such as load balance and aggregated bandwidth.

¹ In this paper, we use router and node interchangeably.

978-1-4799-1270-4/13/$31.00 © 2013 IEEE



  • There have been many multipath routing solutions. Equal-cost multi-path (ECMP) [10] allows packets to be routed along multiple paths of equal cost, whose costs can be deliberately tuned by network operators. However, in certain cases it is simply impossible to achieve equal costs no matter what link weights are used [11], and ECMP does not offer good reliability. Source selectable deflection [2] deflects packets based on the shortest path costs of a router and its neighbors to each destination. It proposes three rules with increasing path diversity as well as computational complexity, and its overhead also increases in proportion to the degree of a node.

    Multi-topology routing [7], [8] pre-computes routes based on backup topologies tailored for specific failures, either by removing the corresponding edges or by increasing their associated costs. It thus enables each router to save several valid paths to each destination. Path splicing [3] is an enhancement to multi-topology routing. It creates a set of slices for the network based on random link-weight perturbations, and end systems can control which slices the routers should use by embedding control bits in packet headers. In [12], multiple instances of a link state routing protocol are used to provide multiple choices, where each link is associated with a vector of weights tuned by end systems. The complexity of these algorithms is proportional to the number of alternative configurations they want to employ.

    Discount Shortest Path Algorithm (DSPA) [13] computes K-shortest paths and takes into account both path quantity and path independence. However, computing the K-shortest paths is still much more computationally intensive than finding a single shortest path.

    Since the performance of routing and forwarding is critical to the Internet, efficient route computation methods, in particular dynamic algorithms for shortest path tree computation, have been extensively studied in the literature [14]–[17]. These algorithms only need to incrementally update their data structures when the network state changes, and are thus much faster than static algorithms, which recompute from scratch each time. However, they are concerned with only a single path for each destination. Recently, TBFH [9] was proposed to accelerate multipath computation based on the next-hop rule presented in [18], but it is still a static algorithm, and we are not aware of any dynamic algorithm for efficient multipath computation, which is the focus of this paper.

    III. ALGORITHM FOR COMPUTING MULTIPLE NEXT-HOPS

    A. Notations and Basic Idea

    Before formally describing our tree based multipath algorithm, we first define some notations, which are also summarized in Table I. A network is modeled as an undirected graph $G = (V, E)$, where $V$ and $E$ denote the set of nodes (routers) and the set of edges (links) in the network respectively. Each node $v \in V$ has a unique Router-ID $R(v)$, while each edge between nodes $u$ and $v$ has a positive link cost $L(u, v)$.²

    Each node independently computes its next-hops for all destinations, so in the rest of the paper, our algorithm is described with respect to a particular node $c$ that performs this computation. This node builds a shortest path tree $T_c$ rooted at itself, containing all the nodes in the network as potential destinations. We denote the cost of the path from $c$ to $v$ in $T_c$ by $C_c(v)$, the children of $v$ in $T_c$ by $H_c(v)$, the parent of $v$ in $T_c$ by $P_c(v)$, and the descendants of $v$ in $T_c$ by $D_c(v)$, with $v$ itself included.

    ² $L(u, v) = \infty$ if $u$ and $v$ are not neighbors.

    TABLE I: Notations

    Notation        Description
    $G = (V, E)$    Undirected graph with nodes and edges
    $L(u, v)$       Direct link cost between node $u$ and node $v$
    $T_c$           Shortest path tree rooted at node $c$
    $C_c(v)$        Cost from node $c$ to node $v$ in $T_c$
    $H_c(v)$        Children of node $v$ in $T_c$
    $P_c(v)$        Parent of node $v$ in $T_c$
    $D_c(v)$        Descendants of node $v$ (itself included) in $T_c$
    $N_c(v)$        Next-hop set computed by node $c$ for destination node $v$
    $B_c(v)$        Best next-hop computed by node $c$ for destination node $v$

    The objective of $c$ is to compute a candidate set of next-hops $N_c(v)$ for each destination $v$ ($v \neq c$), so that when a packet destined to $v$ arrives at $c$, $c$ can select a next-hop from $N_c(v)$ and forward the packet to it. In particular, we use $B_c(v)$ to represent the best (default) candidate, which lies along the shortest path from $c$ to $v$, and save it as the first entry in $N_c(v)$ (i.e., $B_c(v) = N_c(v)[0]$). Since $T_c$ is a shortest path tree, $C_c(v)$ is the lowest cost from $c$ to $v$ in the network, leading to the following lemma.

    Lemma 1. (The Best Next-Hop Rule)

    $$B_c(v) = \begin{cases} v, & P_c(v) = c \\ B_c(P_c(v)), & P_c(v) \neq c \end{cases} \tag{1}$$

    Equation (1) in Lemma 1 means the best next-hop $B_c(v)$ for a destination $v$ is $c$'s direct child along the path from $c$ to $v$ in $T_c$. A shortest path routing algorithm, such as open shortest path first (OSPF) [10], [19], computes a single next-hop $B_c(v)$ by employing equation (1) at each step when a new node $v$ is added to the SPT. To compute a set $N_c(v)$ of next-hops for $v$, we start with a simple rule called the downstream criterion (DC) [20], rephrased as follows:

    Theorem 1. For packets destined to a destination $v$, node $c$ ($c \neq v$) can forward them to any of its neighboring nodes $x$ as long as $C_x(v) < C_c(v)$, and there will be no forwarding loop in the induced forwarding path.

    This theorem holds because $C_x(v)$ and $C_c(v)$ are respectively the lowest costs from $x$ to $v$ and from $c$ to $v$ in the network, so the cost to the destination strictly decreases in every forwarding step. It is easy to verify that forwarding packets to the best next-hop also satisfies the DC rule, as stated in the following lemma.

    Lemma 2. $C_{B_c(v)}(v) < C_c(v)$.
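As a minimal sketch of the DC rule in Theorem 1, the Python below lists the neighbors of a node that are strictly closer to a destination. The function names and the four-node topology are illustrative assumptions, not part of the paper. Because links are undirected, a single Dijkstra run from the destination yields $C_x(v)$ for every $x$ when only one destination is of interest.

```python
import heapq

def dijkstra(graph, src):
    """Lowest cost from src to every reachable node (plain Dijkstra)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in graph[v].items():
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

def downstream_next_hops(graph, c, v):
    """Neighbors x of c that satisfy C_x(v) < C_c(v)."""
    to_v = dijkstra(graph, v)  # undirected links, so C_x(v) = C_v(x)
    return sorted(x for x in graph[c] if to_v[x] < to_v[c])

# Hypothetical topology: adjacency map with positive link costs.
graph = {
    "a": {"b": 1, "c": 2},
    "b": {"a": 1, "d": 2},
    "c": {"a": 2, "d": 1},
    "d": {"b": 2, "c": 1},
}
print(downstream_next_hops(graph, "a", "d"))
```

In this made-up topology both neighbors of `a` are strictly closer to `d` than `a` is, so both qualify as loop-free next-hops under the DC rule.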

    The DC rule is the basis of many loop-free multipath routing algorithms [2], [9], [18], which differ in how they find neighboring nodes that satisfy this rule.³ One of our contributions lies in the following rule, which is slightly stricter than the DC rule:

    ³ These methods are also the root cause of their high complexity.

  • Definition 1. Given any two nodes $u$ and $v$ in the shortest path tree $T_c$, if

    $$C_c(u) - C_c(B_c(u)) + L(u, v) < C_c(v), \tag{2}$$

    we say $u$ can contribute (its best next-hop) to $v$.
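The test in equation (2) uses only values already stored in the tree. A small Python sketch follows; the function name, the dictionaries and the numbers are hypothetical. The left-hand side is the cost of the walk from $B_c(u)$ down the tree to $u$ and then over the link to $v$, so it upper-bounds the lowest cost from $B_c(u)$ to $v$.

```python
def can_contribute(cost, best, link, u, v):
    """Equation (2): C(u) - C(B(u)) + L(u, v) < C(v)."""
    return cost[u] - cost[best[u]] + link[(u, v)] < cost[v]

# Hypothetical values at root c: x and y are children of c, u is a child of x.
cost = {"c": 0, "x": 2, "u": 3, "y": 5}   # C(.) in the SPT
best = {"x": "x", "u": "x", "y": "y"}     # B(.) from equation (1)
link = {("u", "y"): 3, ("y", "u"): 3}     # L(u, y), an off-tree link

print(can_contribute(cost, best, link, "u", "y"))  # 3 - 2 + 3 = 4 < 5
print(can_contribute(cost, best, link, "y", "u"))  # 5 - 5 + 3 = 3, not < 3
```

Here `u` can contribute its best next-hop `x` to `y`, but `y` cannot contribute to `u`, which shows that the relation is not symmetric.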

    Lemma 3. (The Next-Hop Contribution Rule) If $u$ can contribute to $v$, then $C_{B_c(u)}(v) < C_c(v)$. [...]

    [...] $\mathrm{Enqueue}(Q, \langle v, p, d \rangle)$ pushes a node into $Q$. If $v$ already exists in $Q$, $p$ and $d$ are updated only when the new tentative cost is smaller, or when the cost is the same but the new tentative parent has a smaller router ID. The function $\mathrm{ExtractMin}(Q)$ returns (and deletes from $Q$) the node that has the smallest tentative cost in $Q$, with the router ID used as a tie breaker. In DMPA, when a node $v$ is extracted, its tentative cost equals the smallest cost from $c$ to $v$, so it can be added to the shortest path tree, and $v.visited$ is changed from false to true.

    DMPA also uses two hash tables to facilitate its efficient incremental operation. The first one, $h$, remembers whether the nodes can contribute to each other as of the latest computation, so that $h[u, v]$ is true if and only if $u$ can contribute to $v$ and $B(u) \neq B(v)$.⁴ This avoids repeated evaluation of equation (2). Since different nodes may have the same best next-hop, another hash table, $hn$, further records how many nodes contribute their best next-hop $b$ to $N(v)$, the next-hop set of $v$. This reference count is denoted as $hn[v, b]$. When a link state update changes the contribution relation between $u$ and $v$ under equation (2), the reference count is updated, but the next-hop set $N(v)$ is modified only when necessary; these operations are abstracted as $\mathrm{Add}(b, N(v))$ and $\mathrm{Del}(b, N(v))$.
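The reference counting behind Add and Del can be sketched as below. The class and attribute names are illustrative assumptions: a next-hop appears in the set when its count becomes 1 and disappears only when the count returns to 0.

```python
from collections import defaultdict

class NextHopSets:
    """Next-hop sets N(v) with per-(destination, next-hop) reference counts."""

    def __init__(self):
        self.next_hops = defaultdict(list)  # N(v); insertion order is kept
        self.ref_count = defaultdict(int)   # plays the role of hn[v, b]

    def add(self, b, v):
        """Add(b, N(v)): one more node contributes next-hop b to v."""
        self.ref_count[(v, b)] += 1
        if self.ref_count[(v, b)] == 1:
            self.next_hops[v].append(b)

    def delete(self, b, v):
        """Del(b, N(v)): one contributor of b to v is withdrawn."""
        if self.ref_count[(v, b)] == 0:
            return  # nothing to withdraw
        self.ref_count[(v, b)] -= 1
        if self.ref_count[(v, b)] == 0:
            self.next_hops[v].remove(b)

sets = NextHopSets()
sets.add("b", "j")     # first contributor of next-hop b to destination j
sets.add("b", "j")     # a second node with the same best next-hop
sets.delete("b", "j")  # one contributor withdraws; b stays in N(j)
print(sets.next_hops["j"])
```

Keeping the count separate from the set means that a contribution change touches the forwarding state only when the last contributor of a next-hop goes away.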

    DMPA has two versions: a static one that performs a full computation for a given topology, and a dynamic one that performs an incremental update, assuming the full computation has already been done. We use DMPA-f (Algorithm 1) and DMPA-i to distinguish them, and use a variable dynamic to identify the case when necessary. DMPA-i itself handles two cases, corresponding to a link cost increase (Algorithm 3) and a link cost decrease (Algorithm 4). They all use a common procedure ComputeNextHopSets (Algorithm 2) to update the set of next-hops for each destination.

    Algorithm 1 DMPA-f
     1: dynamic ← false
     2: for v ∈ V do
     3:     C(v) ← ∞; H(v) ← ∅; P(v) ← nil; N(v) ← ∅
     4:     v.visited ← false
     5: Enqueue(Q, ⟨c, nil, 0⟩)
     6: ComputeNextHopSets

    1) Static Version: To build the SPT and compute the next-hop sets from scratch, DMPA-f first puts the computing node $c$ into the priority queue $Q$, then goes through several iterations. In each iteration, a node $v$ with the smallest tentative cost is popped out of $Q$ by the ExtractMin function and added to the tree (lines 8-11).⁵ The best next-hop for $v$ is then updated according to equation (1) (lines 14-17). For each neighbor $u$ of $v$ in the network, if the path from $c$ to $v$ to $u$ leads to a smaller cost of $u$ than previously known, the algorithm updates $Q$ using the Enqueue function (lines 19-21). In this way, more nodes are included in $Q$ and are later selected to be added to the tree. Finally, the algorithm checks whether $u$ and $v$ can contribute to each other according to equation (2), and updates their next-hop sets if necessary (lines 24-35).
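The static computation can be sketched compactly in Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses `heapq` with lazy deletion instead of a decrease-key queue, breaks ties by node name rather than by router ID, and omits the `h` and `hn` tables that only the incremental version needs. The function name and the topology weights are made up.

```python
import heapq

def dmpa_full(graph, root):
    """Build the SPT at `root` and derive next-hop sets via equations (1), (2)."""
    cost, best, next_hops = {}, {}, {}
    heap = [(0, root, None)]  # (tentative cost, node, tentative parent)
    while heap:
        d, v, p = heapq.heappop(heap)
        if v in cost:
            continue  # v is already in the tree
        cost[v] = d
        if v != root:
            best[v] = v if p == root else best[p]  # equation (1)
            next_hops[v] = [best[v]]               # best next-hop comes first
        for u, w in graph[v].items():
            if u not in cost:
                heapq.heappush(heap, (d + w, u, v))
            elif root not in (u, v) and best[u] != best[v]:
                # Both ends are in the tree: test equation (2) in both directions.
                for x, y in ((u, v), (v, u)):
                    if (cost[x] - cost[best[x]] + w < cost[y]
                            and best[x] not in next_hops[y]):
                        next_hops[y].append(best[x])
    return cost, best, next_hops

# Hypothetical five-node topology rooted at a.
graph = {
    "a": {"b": 3, "c": 5, "d": 6},
    "b": {"a": 3, "e": 1},
    "c": {"a": 5, "e": 2},
    "d": {"a": 6, "e": 3},
    "e": {"b": 1, "c": 2, "d": 3},
}
cost, best, next_hops = dmpa_full(graph, "a")
print(next_hops)
```

With these weights, `e` hangs under `b` in the tree, so `b` is offered to `c` and `d` as an extra next-hop, while `c` and `d` are offered to `e`.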

    2) Dynamic Version When Link Cost Increases: The dynamic cases are trickier. When a link $\ell = \langle s, e \rangle$ increases its cost by $inc$, where $s$ and $e$ are the two ends of this link, $\ell$ may lie either outside the previously constructed shortest path tree or inside it. Algorithm 3 gives the detailed procedure.

    In the former case, where $\ell$ is not in the tree, the tree structure will not be affected, and neither will the cost $C(v)$ of any node $v$. So according to the rules for constructing the next-hop sets, only $N(s)$ and $N(e)$ may change due to the new $L(s, e)$ in equation (2). Consider whether $s$ contributes its best next-hop $B(s)$ to $N(e)$. If $s$ contributes (or does not contribute) to $e$ both before and after the change, $N(e)$ will not be affected. Otherwise, since the link cost increases, it is not possible that $s$ can contribute to $e$ after the change but not before, so DMPA-i updates $N(e)$ only when $s$ can no longer contribute to $e$, using the Del function, which implements reference counting as mentioned before. Similarly, $N(s)$ is updated if $e$ can no longer contribute to $s$ (lines 38-42 in Algorithm 3).

    ⁴ If $B(u) = B(v)$, then $N(v)$ already includes $B(u)$, and there is no need to check whether equation (2) can be satisfied (lines 25-26).

    ⁵ In the dynamic version, $v$ is actually attached to a new parent in the tree.

    Algorithm 2 ComputeNextHopSets
     7: while Q is not empty do
     8:     ⟨v, p, d⟩ ← ExtractMin(Q)
     9:     H(p) = H(p) ∪ {v}, H(P(v)) = H(P(v)) \ {v}
    10:     P(v) ← p; C(v) ← d
    11:     v.visited ← true
    12:     if dynamic = true then
    13:         b ← B(v); N(v) ← ∅
    14:     if P(v) = c then
    15:         B(v) ← v
    16:     else
    17:         B(v) ← B(p)
    18:     for each neighbor u of v do
    19:         newdist ← C(v) + L(v, u)
    20:         if (u.visited = false) ∨ (newdist ≤ C(u)) then
    21:             Enqueue(Q, ⟨u, v, newdist⟩)
    22:         if (dynamic = true) ∧ (h[v, u] = true) then
    23:             Del(b, N(u))
    24:         if u.visited = true then
    25:             if (B(u) = B(v)) then
    26:                 h[u, v] ← false; h[v, u] ← false
    27:             else
    28:                 if u can contribute to v then
    29:                     Add(B(u), N(v)); h[u, v] ← true
    30:                 else
    31:                     h[u, v] ← false
    32:                 if v can contribute to u then
    33:                     Add(B(v), N(u)); h[v, u] ← true
    34:                 else
    35:                     h[v, u] ← false
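The off-tree case of a cost increase touches only the two endpoints. A hedged Python sketch of that check follows; the function name, argument layout and numbers are assumptions. It reports which (next-hop, destination) references would be released through Del.

```python
def off_tree_increase(cost, best, link, h, root, s, e, inc):
    """Off-tree link <s, e> grows by inc; return (next_hop, destination) pairs to Del."""
    link[(s, e)] = link[(e, s)] = link[(s, e)] + inc
    dropped = []
    if root in (s, e):
        return dropped  # the root has no best next-hop to contribute
    for u, v in ((s, e), (e, s)):
        still = cost[u] - cost[best[u]] + link[(u, v)] < cost[v]  # equation (2)
        if h.get((u, v), False) and not still:
            h[(u, v)] = False
            dropped.append((best[u], v))
    return dropped

# Hypothetical state at root a: e sits below b, j sits below c.
cost = {"a": 0, "b": 2, "c": 3, "e": 6, "j": 12}
best = {"b": "b", "c": "c", "e": "b", "j": "c"}
link = {("e", "j"): 7, ("j", "e"): 7}
h = {("e", "j"): True, ("j", "e"): False}

print(off_tree_increase(cost, best, link, h, "a", "e", "j", 2))
```

Before the change, 6 - 2 + 7 = 11 < 12, so `e` contributes `b` to `j`. After the increase the left side becomes 13, the contribution is withdrawn, and no tree cost is recomputed.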

    In the latter case, where link $\ell$ lies in the previously constructed SPT, we assume without loss of generality that $s$ is the parent of $e$ in the SPT. Since $\ell$ increases its cost, only the descendants of $e$ in the tree might be affected, while all other nodes keep their costs unchanged. So we detach from the tree all the descendants of $e$ (including itself), denoted by $R$ (lines 43-46). Then we reinsert them into the shortest path tree in a way similar to the static version. We note that descendant nodes that can immediately get a path no worse than before through the remaining unaffected nodes must be put into $Q$, while other descendants can be handled later (lines 47-52).⁶ Finally, ComputeNextHopSets is called to rebuild the remaining tree and update the next-hop sets therein.
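A sketch of the detach-and-seed step for an in-tree increase is given below. The names and the example are assumptions, and the threshold `cost[v] + inc` is one reading of the pseudocode's `C(v) + change`. The graph passed in is assumed to carry the already-increased link cost.

```python
import heapq

def detach_and_seed(graph, cost, children, e, inc):
    """Collect R = D(e) and seed the queue from unaffected neighbors."""
    detached, stack = set(), [e]
    while stack:
        v = stack.pop()
        detached.add(v)
        stack.extend(children.get(v, ()))
    seeds = []
    for v in detached:
        for u, w in graph[v].items():
            # Only unaffected nodes have costs that are still valid.
            if u not in detached and cost[u] + w <= cost[v] + inc:
                seeds.append((cost[u] + w, v, u))  # (newdist, node, new parent)
    heapq.heapify(seeds)
    return detached, seeds

# Hypothetical tree a -> b -> c -> d; link <b, c> has just grown from 1 to 11.
graph = {
    "a": {"b": 1, "c": 5},
    "b": {"a": 1, "c": 11},
    "c": {"a": 5, "b": 11, "d": 1},
    "d": {"c": 1},
}
cost = {"a": 0, "b": 1, "c": 2, "d": 3}   # costs before the change
children = {"a": ["b"], "b": ["c"], "c": ["d"]}

detached, seeds = detach_and_seed(graph, cost, children, "c", 10)
print(sorted(detached), seeds[0])
```

Node `d` has no unaffected neighbor, so it is not seeded; it is reached later, once `c` has been re-attached through the off-tree link from `a`.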

    However, the ComputeNextHopSets function works a little differently in the dynamic case than in the static one. For any node $v$ in $R$, its old best next-hop $B(v)$ has to be saved for later use, and its next-hop set $N(v)$ has to be cleared (lines 12-13). Since node $v$ may have contributed its old best next-hop $b$ to a neighboring node $u$ according to equation (2), we first remove $b$ from $N(u)$ using the Del function, which implements reference counting (lines 22-23), then later re-check whether $u$ and $v$ can contribute to each other and update their next-hop sets accordingly (lines 24-35).

    ⁶ Alternatively, we could simply put all descendant nodes in $R$ into $Q$, but that would make $Q$ larger and hurt performance.

    Algorithm 3 DMPA-i(link ℓ = ⟨s, e⟩, increment inc)
    36: dynamic ← true
    37: L(s, e) ← L(s, e) + inc
    38: if (ℓ ∉ T) ∧ (s ≠ c) ∧ (e ≠ c) then
    39:     if (h[s, e] = true) ∧ (s cannot contribute to e) then
    40:         Del(B(s), N(e)); h[s, e] ← false
    41:     if (h[e, s] = true) ∧ (e cannot contribute to s) then
    42:         Del(B(e), N(s)); h[e, s] ← false
    43: if ℓ ∈ T then
            // assuming s is the parent of e in T
    44:     R ← D(e)
    45:     for v ∈ R do
    46:         v.visited ← false
    47:     for v ∈ R do
    48:         for each neighbor u of v do
    49:             if u.visited = true then
    50:                 newdist ← C(u) + L(u, v)
    51:                 if newdist ≤ C(v) + change then
    52:                     Enqueue(Q, ⟨v, u, newdist⟩)
    53: ComputeNextHopSets

    Algorithm 4 DMPA-i(link ℓ = ⟨s, e⟩, decrement dec)
    54: dynamic ← true
    55: L(s, e) ← L(s, e) − dec
        // assuming C(e) ≥ C(s)
    56: if C(e) < C(s) + L(s, e) then
    57:     if (h[s, e] = false) ∧ (s can contribute to e) then
    58:         Add(B(s), N(e)); h[s, e] ← true
    59:     if (h[e, s] = false) ∧ (e can contribute to s) then
    60:         Add(B(e), N(s)); h[e, s] ← true
    61: else
    62:     newdist ← C(s) + L(s, e)
    63:     Enqueue(Q, ⟨e, s, newdist⟩)
    64: ComputeNextHopSets

    3) Dynamic Version When Link Cost Decreases: Algorithm 4 handles the case when a link $\ell = \langle s, e \rangle$ decreases its cost by $dec$. Without loss of generality, we assume the cost of $e$ is no smaller than that of $s$ before the change, i.e., $C(e) \geq C(s)$, so neither $s$'s position in the SPT nor its cost will change.

    If $C(e) < C(s) + L(s, e)$, that is, $e$ cannot get a smaller cost due to this change, the tree structure, as well as the costs of the nodes, will remain unchanged. Then only $N(s)$ and $N(e)$ may change due to the new $L(s, e)$ in equation (2). If $s$ contributes (or does not contribute) to $e$ both before and after the change, $N(e)$ will not be affected. Otherwise, it is impossible that $s$ can contribute to $e$ before the change but not after, so DMPA-i updates $N(e)$ only when $s$ starts to contribute to $e$ due to the change, using the Add function (with reference counting). Similarly, $N(s)$ is updated if $e$ starts to contribute to $s$. These cases are handled by lines 56-60 in Algorithm 4, and are exactly the opposite of the case when the link cost increases (lines 38-42 in Algorithm 3).

    Otherwise, $e$ can get a new (smaller) cost, so it is pushed into the priority queue $Q$, and all nodes that are affected get their costs, parents and next-hop sets updated accordingly by ComputeNextHopSets (lines 61-64).
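The branch taken after a cost decrease depends on one comparison. The Python below is a small sketch of that decision only; the function name and return convention are hypothetical. It normalizes the endpoints so that $C(e) \geq C(s)$ and then reports either a contribution re-check or the queue entry for re-attaching $e$ under $s$.

```python
def classify_decrease(cost, s, e, old_link_cost, dec):
    """Decide how a decrease of link <s, e> by dec must be handled."""
    new_link_cost = old_link_cost - dec
    if cost[e] < cost[s]:
        s, e = e, s  # ensure C(e) >= C(s), as Algorithm 4 assumes
    if cost[e] < cost[s] + new_link_cost:
        # Tree and costs unchanged: only equation (2) needs re-checking.
        return ("recheck", s, e, new_link_cost)
    # e may improve: seed the queue with <e, s, newdist>.
    return ("reattach", s, e, cost[s] + new_link_cost)

cost = {"s": 2, "e": 5}  # hypothetical costs before the change
print(classify_decrease(cost, "s", "e", 6, 1))  # new link cost 5: 5 < 2 + 5
print(classify_decrease(cost, "s", "e", 6, 4))  # new link cost 2: 5 >= 2 + 2
```

The first call leaves the tree untouched, while the second re-attaches `e` under `s` at the lower cost 4.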

    C. Examples

    1) Static Version: Fig. 1 illustrates how a full computation (DMPA-f) is carried out, step by step from scratch, on node $a$. For each step, we show the constructed SPT and the next-hop sets that get updated in that step.

    In steps (2), (3) and (4), nodes $b$, $c$ and $d$ select the root $a$ as their parent, so $B(b) = b$, $B(c) = c$ and $B(d) = d$ according to the best next-hop rule. Each best next-hop is also added to the corresponding next-hop set.

    When node $e$ is added to the tree with $b$ as its parent in step (5), we get $B(e) = B(b) = b$. The next-hop contribution rule is employed to test whether $e$ can contribute to any of its neighbors (and vice versa). Since $C(e) - C(B(e)) + L(e, c) = 4 < C(c)$, $e$ can contribute to $c$, so $B(e) = b$ is added to $N(c)$. Similarly, $c$ can contribute to $e$, and $B(c) = c$ is added to $N(e)$. Since $d$ and $e$ can also contribute to each other, $B(d) = d$ is added to $N(e)$ and $B(e) = b$ to $N(d)$.

    2) Dynamic Version: We next show how the next-hops can be incrementally updated by DMPA-i in response to a link cost increase, using a more complex example as depicted in Fig. 2. The shortest path trees are composed of the solid arrow lines (from parent to child), and the number inside each circle (node) represents the shortest distance from the source node $a$ to that node. The dotted lines represent the other direct links, while a link that increases its cost is drawn as a thick (solid or dotted) line to distinguish it from unchanged links.

    Fig. 2(a) is the SPT constructed before any change, together with the computed next-hop sets and the contribution status (denoted by $h$ for simplicity) that will be affected later. For example, since $e$ can contribute to $j$, $B(e) = b$ is added to $N(j)$, and $h[e, j] = true$.

    Fig. 2(b) shows how DMPA-i works when the link [...] increases its weight from 9 to 11, while [...] increases from 7 to 9; the two changed links are $\langle e, j \rangle$ and $\langle l, m \rangle$. First, consider $\langle l, m \rangle$. Since $\langle l, m \rangle$ is not in the old SPT, no node changes its cost, and only $l$ and $m$ need to consider updating their next-hop sets. However, $h[m, l] = h[l, m] = false$, which means they do not contribute to each other before the change, so they will not after the increase either,⁷ and their next-hop sets remain unchanged.

    Now consider link $\langle e, j \rangle$, which is not in the old SPT either. The difference from the previous case is that $h[e, j] = h[j, e] = true$, so we have to check whether $e$ and $j$ can contribute to each other after the increase, using equation (2). Since $C(e) - C(B(e)) + L(e, j) = 15 > C(j)$, $e$ cannot contribute to $j$ anymore, so $B(e) = b$ has to be removed from $N(j)$, and $h[e, j]$ is set to false.

    Finally, consider $\langle c, g \rangle$ changing its cost from 4 to 20; Fig. 2(c) shows the new SPT after this change. Because link $\langle c, g \rangle$ lies in the old SPT in Fig. 2(a), lines 43-53 in Algorithm 3 are executed. All old descendants of $g$, i.e., $g$, $j$, $k$, $l$, $m$ and $n$ in $D(g)$, re-compute their next-hops from scratch, while their neighboring nodes not in $D(g)$, i.e., $e$, $f$ and $i$ (drawn with a thick circle in the figure), only incrementally update their next-hops in the ComputeNextHopSets function.

    ⁷ This property is guaranteed to be true, so there is no need to use equation (2) to check, as suggested by lines 38-42 in Algorithm 3.

    D. Multiple Link Cost Changes

    The algorithms presented above only handle a single link cost change, but they can be extended to handle multiple link cost changes in a batch. For clarity, we do not present the detailed algorithms, but only give a brief explanation.

    For a batch of link cost increases, lines 37-42 are executed for each varying link that is not in the old SPT. Then lines 44-52 are executed to find the nodes affected by the remaining increases and push them into the priority queue $Q$. Finally, ComputeNextHopSets is called to update their next-hop sets in a batch.

    Similarly, for a batch of link cost decreases, lines 55-60 are executed for each varying link that does not change the tree structure. Then lines 62-63 are executed for the other cost changes, and finally ComputeNextHopSets is called.

    E. Algorithm Complexity

    In this section, we discuss the time complexity of DMPA. We first analyze the ComputeNextHopSets function, since most of the work is done there.

    A priority queue is used to maintain the nodes that need to update their positions (parents) and costs, and ComputeNextHopSets keeps adding nodes to, extracting nodes from, and updating nodes in that queue. Let $D$ denote the maximum node degree, $N_n$ the number of nodes that need to change their positions or costs when a link cost changes, and $N_e$ the number of edges that may cause any node in the queue to change its cost (this is implemented by the decrease-key operation of the priority queue). Assume the time needed by Enqueue to enqueue a node is $T_e$, the time needed by ExtractMin to extract the node with the minimum cost is $T_x$, and the time needed by Enqueue (decrease-key) to update a node existing in the queue is $T_k$. Since each of the $N_n$ nodes has to be enqueued and extracted exactly once, and each of the $N_e$ edges can cause at most one decrease-key operation, the total queue operation time over all executions of ComputeNextHopSets is at most $O(N_n \cdot T_e + N_n \cdot T_x + N_e \cdot T_k)$. Besides queue manipulations, some operations are called at most two times for each of the $N_e$ edges, including modifying the next-hop sets and updating the hash tables, while others are called at most once for each of the $N_n$ nodes to set their attributes. Each of them can be completed in constant time (or constant amortized time), and they cost $O(N_n + N_e)$ in total. So the final time for all executions of ComputeNextHopSets is still $O(N_n \cdot T_e + N_n \cdot T_x + N_e \cdot T_k)$.

    Since there are at most $N_n$ nodes in the queue, $T_e = O(1)$, $T_x = O(\lg N_n)$ and $T_k = O(1)$ when the queue is implemented as a Fibonacci heap [22], and the total time is at most $O(N_n \lg N_n + N_e)$.
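Spelled out, substituting the Fibonacci-heap operation costs into the queue bound gives the following; the intermediate step is added here only for illustration.

```latex
\begin{aligned}
O(N_n \cdot T_e + N_n \cdot T_x + N_e \cdot T_k)
  &= O\bigl(N_n \cdot 1 + N_n \cdot \lg N_n + N_e \cdot 1\bigr) \\
  &= O(N_n \lg N_n + N_e)
\end{aligned}
```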

    Now consider the static algorithm (Algorithm 1). The initialization part costs at most O(|V|) time in a network of |V| nodes and |E| edges. Since Nn = |V| and Ne = |E|, Algorithm 1 costs at most O(|V|) + O(Nn lg(Nn) + Ne) = O(|V| lg(|V|) + |E|), which is similar to the complexity of a full shortest path computation.

    Fig. 1: Step by step construction of the SPT rooted at node a and the next-hop sets

    Fig. 2: Incremental Update using DMPA-i (Algorithm 3) when link cost increases. Panels: (a) SPT computed before any change; (b) <e, j>, <l, m> increase their costs; (c) <c, g> increases its cost.

    For a dynamic computation based on Algorithm 3 or 4, at most Nn nodes and Ne ≤ D·Nn edges of these nodes will be involved. The operations executed besides ComputeNextHopSets are called at most twice for each such edge, so the time needed for these parts is at most O(D·Nn). Combining it with the queue manipulation time O(Nn lg(Nn) + Ne), we can derive the time complexity of DMPA-i as O(Nn lg(Nn) + D·Nn).
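    The combination step can be written out explicitly. The following is only a restatement of the bounds given above, with N_n, N_e and D standing for Nn, Ne and D:

```latex
\begin{align*}
T_{\text{DMPA-i}}
  &= O(N_n \lg N_n + N_e) + O(D\,N_n) \\
  &\le O(N_n \lg N_n + D\,N_n) + O(D\,N_n)
     && \text{since } N_e \le D\,N_n \\
  &= O(N_n \lg N_n + D\,N_n).
\end{align*}
```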

    IV. PERFORMANCE EVALUATION

    In this section, we present our evaluation methods as well as results on both real and synthetic topologies.

    We compare the results achieved by DMPA against ECMP, the basic routing deflection scheme (denoted by Rule1) in [2], and TBFH [9]. Rule1 implements the DC rule (Theorem 1) and requires an SPT computation for each neighbor,⁸ while DMPA and TBFH use stricter rules than Rule1.

    We have implemented DMPA, ECMP and Rule1 in C, and performed experiments on Linux with an Intel i5 1.7 GHz CPU and 4 GB of memory. For TBFH, we use the numeric results provided in the original paper [9] when appropriate.⁹

    ⁸ Other schemes in [2] are even more time-consuming.

    For each comparison, we use the real topology of Abilene (a US-based research and education network with 11 nodes and 14 edges), and four ISP topologies inferred from measurement results by Rocketfuel [23], including Sprint (52 nodes, 84 edges), Exodus (79 nodes, 294 edges), Telstra (104 nodes, 302 edges), and Tiscali (161 nodes, 656 edges). Due to space limitations, in most cases we only present the results for Sprint and Exodus, while the results for Telstra and Tiscali are similar. We also use BRITE [24] to generate synthetic topologies covering a large range of topology sizes and node degree distributions, using the parameters listed in Table II.

    TABLE II: Parameters for BRITE Topology Generation

    Model    N        HS    LS
    Waxman   20-1000  1000  100

    m     NodePlacement  GrowthType   alpha
    2-40  Random         Incremental  0.15

    beta  BWDist    BwMin  BwMax
    0.2   Constant  10.0   1024.0
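    To make the roles of alpha and beta in Table II concrete, here is a minimal sketch of the Waxman link probability, alpha · exp(-d / (beta · L)), where d is the distance between two nodes and L is the maximum distance between any two nodes. This is not BRITE's code: the function names, the uniform placement on an HS × HS plane, and the default arguments are assumptions made for illustration, and the incremental growth step (m links per new node) is deliberately left out.

```python
import math
import random


def waxman_probability(d, max_dist, alpha=0.15, beta=0.2):
    """Waxman link probability alpha * exp(-d / (beta * L)).

    d: Euclidean distance between the two nodes.
    max_dist: L, the maximum distance between any two nodes.
    The defaults are the alpha and beta values listed in Table II.
    """
    return alpha * math.exp(-d / (beta * max_dist))


def place_nodes(n, hs=1000, seed=0):
    """Place n nodes uniformly at random on an hs x hs plane.

    This is one plausible reading of 'Random' node placement,
    assumed here for illustration only.
    """
    rng = random.Random(seed)
    return [(rng.uniform(0, hs), rng.uniform(0, hs)) for _ in range(n)]
```

    A larger alpha raises the link probability everywhere, while a larger beta makes long links relatively more likely compared with short ones.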

    ⁹ We are still in the process of porting the code of TBFH and testing its performance under the same setting as we use in this paper. However, we believe directly using the results in [9] is meaningful, because [9] also compares TBFH with ECMP and Rule1, and the results for ECMP and Rule1 there closely match our results.

  • 0 50 100 150 200

    100

    101

    102

    103

    Topology Size

    Com

    putin

    g Ti

    me

    (us)

    ECMPRule1DMPAfDMPAi

    (a) Average Node Degree=4

    0 10 20 30 40

    101

    102

    103

    104

    Average Degree of The Topology

    Com

    putin

    g Ti

    me

    (us)

    ECMPRule1DMPAfDMPAi

    (b) Topology Size=200

    0 10 20 30 40

    102

    104

    106

    Average Degree of The Topology

    Com

    putin

    g Ti

    me

    (us)

    ECMPRule1DMPAfDMPAi

    (c) Topology Size=1000

    Fig. 3: Computing Time vs Topology Size and Average Node Degree

    A. Computing Time

    First, we compare the computation efficiency of the different algorithms. For each network topology, we let one link change its cost at a time and execute each algorithm on each node. This procedure is repeated for a randomly selected 30% of the links, and the final result for each algorithm is its computation time averaged over all nodes and all selected link changes. The results on the real or inferred topologies are listed in Table III, while the results on the synthetic topologies are shown in Fig. 3. Note that DMPA-i dynamically updates its next-hops, while all other algorithms have to recompute from scratch, including DMPA-f, the static version of our algorithm.
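    The measurement procedure can be sketched as a small harness. This is not the authors' C code. `run_algorithm(node, link)` is a hypothetical callback that applies one link cost change and recomputes (or incrementally updates) the next-hops on one node; the function name, the `seed` argument, and the use of `time.perf_counter` are likewise assumptions.

```python
import random
import time


def average_update_time(links, nodes, run_algorithm, fraction=0.3, seed=0):
    """Average per-node time to handle one link cost change.

    links: list of link identifiers.
    nodes: list of node identifiers.
    run_algorithm: hypothetical callback run_algorithm(node, link).
    Returns (average seconds per run, list of selected links).
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(links)))
    selected = rng.sample(links, k)  # the randomly selected share of links
    total = 0.0
    runs = 0
    for link in selected:      # one link changes its cost at a time
        for node in nodes:     # every node runs the algorithm
            start = time.perf_counter()
            run_algorithm(node, link)
            total += time.perf_counter() - start
            runs += 1
    return total / runs, selected
```

    Passing a from-scratch routine or an incremental one as `run_algorithm` gives the two kinds of numbers compared in this section.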

    Our dynamic algorithm DMPA-i has a clear advantage in all cases, running nearly an order of magnitude faster than all the other algorithms. For example, on average, DMPA-i uses only 0.65 µs to handle a link cost change in the Sprint network, while the fastest among the others, ECMP, uses 7.51 µs.

    Fig. 3(a) shows how the computing time increases with the topology size, using synthetic topologies with an average node degree of four, while Fig. 3(b) and 3(c) show the computing time with respect to larger average node degrees, on synthetic topologies of 200 and 1000 nodes, respectively. We can see that the speed of DMPA-f is comparable to that of ECMP in all cases, which reflects the fact that we simply construct a single SPT and compute multiple next-hops based on this tree. On average, DMPA-f is 20% slower than ECMP, while the results in [9] show that TBFH is around 50% slower than ECMP, indicating that DMPA-f is about 20% faster than TBFH.¹⁰ Rule1 runs much slower than both, especially when the average node degree is large, since it constructs a tree for each neighbor, while our dynamic DMPA-i runs nearly an order of magnitude faster than all of them, since in most cases only a small portion of the tree needs to be adjusted. So in the face of topology changes, DMPA-i consumes far fewer computing resources, which are already scarce on today's core routers.

    ¹⁰ The time cost by ECMP in these figures is similar to that in [9], so it can be used as a reference point.
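    The comparison between DMPA-f and TBFH follows from a simple ratio, assuming the ECMP running times in the two studies are comparable and taking the two slowdowns quoted above at face value:

```latex
\[
\frac{T_{\text{DMPA-f}}}{T_{\text{TBFH}}}
  \approx \frac{1.2\,T_{\text{ECMP}}}{1.5\,T_{\text{ECMP}}}
  = 0.8 ,
\]
```

    that is, DMPA-f needs roughly 20% less time than TBFH.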

    TABLE III: Computation time for Real Topologies

                          Computing time (µs)
    Type      Network   ECMP    Rule1    DMPA-f   DMPA-i
    Real      Abilene    6.82     7.27     6.82     0.32
    Measured  Sprint     7.51     7.96     7.55     0.65
    Measured  Exodus    44.36   128.29    61.23     7.12

    B. Reliability

    One primary motivation of multipath routing is to provide redundant and diverse paths, so that when any link fails, a new path avoiding this link can quickly be found to improve the network reliability. To demonstrate this capability, we define the disconnect fraction as the ratio of the number of disconnected source-destination pairs to the number of all source-destination pairs, when each link fails independently with a probability p. Here, for a given routing algorithm, connected means that there exists a forwarding path from the source to the destination, using any next-hop computed by this algorithm. A smaller disconnect fraction means better reliability.
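    The definition above can be sketched as a small Monte Carlo estimate. This is an illustration, not the authors' evaluation code: the `next_hops` layout (a dict keyed by `(node, destination)`), the undirected treatment of links, and the averaging over `rounds` random failure scenarios are all assumptions of this sketch.

```python
import random
from collections import deque


def reachable(src, dst, next_hops, failed):
    """True if dst can be reached from src using only computed next-hops
    over links that have not failed.

    next_hops: dict (node, dst) -> iterable of next-hop neighbors.
    failed: set of frozenset({u, v}) for failed (undirected) links.
    """
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in next_hops.get((u, dst), ()):
            if v not in seen and frozenset((u, v)) not in failed:
                seen.add(v)
                queue.append(v)
    return False


def disconnect_fraction(nodes, links, next_hops, p, rounds=1000, seed=0):
    """Estimate the disconnect fraction when each link fails with probability p."""
    rng = random.Random(seed)
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    disconnected = 0
    for _ in range(rounds):
        # Each link fails independently with probability p.
        failed = {frozenset(l) for l in links if rng.random() < p}
        disconnected += sum(
            not reachable(s, d, next_hops, failed) for s, d in pairs
        )
    return disconnected / (rounds * len(pairs))
```

    With larger next-hop sets, `reachable` has more edges to explore, so the estimated fraction can only stay the same or decrease for a given failure scenario.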

    Fig. 4 shows the disconnect fraction achieved by each algorithm on Abilene, Sprint and Exodus. Since the static and dynamic versions of DMPA output exactly the same results, we do not distinguish them here. As the link failure probability p increases from 0.01 to 0.1, the reliability of ECMP decreases very fast. DMPA achieves a slightly larger (but very close) disconnect fraction than Rule1, since it uses a slightly stricter rule than Rule1. For example, when p = 0.1, the disconnect fraction of ECMP, Rule1 and DMPA in Exodus is 91.66%, 35.23% and 30.23% respectively. As shown in [9], TBFH also achieves a slightly worse (but comparable) result than Rule1, since it also uses a stricter rule.¹¹

    The disconnect fraction results on synthetic topologies of different sizes are shown in Fig. 5. The curves have a similar trend to those in Fig. 4, where ECMP's reliability decreases much faster on larger topologies, while Rule1 and DMPA can find many more redundant paths and provide better reliability.

    ¹¹ In [9], a metric called coverage is used to measure the path diversity, which is defined as the ratio of the number of s-d pairs with at least one valid alternate next-hop on the source to the number of all s-d pairs. Although that does not guarantee a valid path, we believe it is reasonable to conjecture that two algorithms that achieve similar coverage also achieve similar disconnect fractions.

    Fig. 4: Reliability on Abilene, Sprint and Exodus. Panels: (a) Abilene; (b) Sprint; (c) Exodus. x-axis: Probability of Link Failure; y-axis: Disconnect Fraction. Curves: ECMP, Rule1, DMPA.

    Fig. 5: Reliability for Different Topology Sizes. Panels: (a) p=0.01, (b) p=0.05, (c) p=0.1, each with Average Node Degree=4. x-axis: Topology Size; y-axis: Disconnect Fraction. Curves: ECMP, Rule1, DMPA.

    We also investigate how the reliability changes when the average node degree of a topology increases, as shown in Fig. 6, using synthetic topologies with 200 nodes. It is clear that when nodes have more neighbors, the disconnect fraction decreases. Rule1 and DMPA still provide comparable performance, both of which are much better than ECMP.

    These results suggest that our dynamic algorithm can provide reliability close to the best one, while running nearly an order of magnitude faster than all the other algorithms.

    C. Fast Recovery

    The disconnect fraction results in the previous section show the theoretically best reliability a multipath algorithm can achieve. However, in practice, even if a path to the destination exists when some links fail, it may not be effectively used, since each router selects its next-hop independently, not necessarily along a path that can reach the destination. So in this section, we use a simple forwarding scheme to test how well these algorithms can work in a real-world setting.

    Assume that on each node, the next-hop set has been computed for a topology. When some links fail, we let the nodes enter a recovery mode, in which each node randomly chooses one next-hop from its next-hop set. Such a scheme can be easily implemented by the end system embedding a control bit in the packet, without explicitly coordinating all routers. We call the process of forwarding packets along the randomly built path, until either the destination is reached or no next-hop is available, a trial. A source-destination pair is considered to be connected when the destination can be reached within five trials, and the corresponding disconnect fraction results are shown in Fig. 7 and Fig. 8. Due to space limitations, synthetic topologies with different average node degrees are not included, but all results are very similar to those in the previous section. This indicates that a simple forwarding scheme can effectively let DMPA achieve reasonable reliability.
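    The trial-based forwarding scheme can be sketched as follows. This is a simplified illustration rather than the authors' simulator. It assumes that a node excludes next-hops reached over a locally failed link before choosing at random, and it adds a `max_hops` guard that is not part of the description above; the function names and the `next_hops` layout are also assumptions.

```python
import random


def trial(src, dst, next_hops, failed, rng, max_hops=64):
    """One trial: follow randomly chosen usable next-hops from src.

    next_hops: dict (node, dst) -> iterable of next-hop neighbors.
    failed: set of frozenset({u, v}) for failed links.
    Returns True if dst is reached, False if no next-hop is available.
    max_hops is a safety guard added for this sketch.
    """
    u = src
    for _ in range(max_hops):
        if u == dst:
            return True
        usable = [v for v in next_hops.get((u, dst), ())
                  if frozenset((u, v)) not in failed]
        if not usable:
            return False  # no next-hop is available: the trial ends
        u = rng.choice(usable)
    return u == dst


def recovers(src, dst, next_hops, failed, trials=5, seed=0):
    """A pair counts as connected if any of the trials reaches dst."""
    rng = random.Random(seed)
    return any(trial(src, dst, next_hops, failed, rng) for _ in range(trials))
```

    Because each trial draws fresh random choices, a pair that fails on one trial may still succeed on a later one, which is why several trials are allowed.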

    D. Partial Deployment

    DMPA is compatible with today's link-state routing protocols, such as OSPF and IS-IS [25], so it can be partially deployed on only a portion of the nodes. However, the choice of nodes to start with affects the resulting network reliability. We test three simple strategies, namely, selecting the highest degree nodes, the lowest degree nodes, or random nodes. In Fig. 9, we present the disconnect fraction achieved on the Exodus topology. It can be seen that starting with the high degree nodes is the most cost-effective: a deployment on only 20% of the nodes already clearly reduces the disconnect fraction. This is reasonable, since high degree nodes may find more next-hops to the destinations. Results on other topologies give similar indications, and are omitted due to space limitations.

    Fig. 6: Reliability for Different Average Node Degrees. Panels: (a) p=0.01, (b) p=0.05, (c) p=0.1, each with Topology Size=200. x-axis: Average Degree of the Topology; y-axis: Disconnect Fraction. Curves: ECMP, Rule1, DMPA.

    Fig. 7: Recovery Results on Abilene, Sprint and Exodus. Panels: (a) Abilene; (b) Sprint; (c) Exodus. x-axis: Probability of Link Failure; y-axis: Disconnect Fraction. Curves: ECMP, Rule1, DMPA.
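    The three deployment strategies of the partial deployment experiment can be sketched as a node selection step. The function name `select_deployment`, the strategy labels, and the `degree` dictionary are illustrative assumptions, as is the tie-breaking by node name.

```python
import random


def select_deployment(degree, fraction, strategy, seed=0):
    """Pick the nodes that run DMPA under a partial deployment.

    degree: dict node -> node degree.
    fraction: share of nodes to upgrade, e.g. 0.2 for 20%.
    strategy: "high", "low" or "random" (labels chosen for this sketch).
    """
    k = round(fraction * len(degree))
    nodes = sorted(degree)  # deterministic base order for tie-breaking
    if strategy == "high":
        ranked = sorted(nodes, key=lambda n: degree[n], reverse=True)
    elif strategy == "low":
        ranked = sorted(nodes, key=lambda n: degree[n])
    elif strategy == "random":
        ranked = random.Random(seed).sample(nodes, len(nodes))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return set(ranked[:k])
```

    The selected set would then use the multiple next-hops, and the resulting disconnect fraction can be compared across strategies and deployment fractions.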

    V. CONCLUSION

    In this paper, we propose a shortest path tree based multipath routing algorithm called DMPA. We carefully define the next-hop contribution rule for computing multiple next-hops, and prove that no loop will be introduced when this distributed algorithm is executed independently on each router. DMPA not only avoids the overhead of computing multiple shortest path trees, but also dynamically handles link state changes, so that the next-hops can be incrementally updated rather than recomputed from scratch. In this way, it runs much faster than the other multipath algorithms, and consumes few computing resources, which are scarce on today's routers. DMPA effectively increases the network reliability. It can support fast recovery with a simple forwarding scheme, and can be partially deployed in the network. We believe DMPA provides a basic mechanism on which the network can be made more efficient and reliable.

    VI. ACKNOWLEDGMENT

    We are grateful to the anonymous ICNP reviewers for their insightful comments. This work is supported by the National Basic Research Program of China (973 Program) under Grant No. 2009CB320502 and the National Natural Science Foundation of China (Grant No. 61272446).

    Fig. 8: Recovery Results for Different Topology Sizes. Panels: (a) p_e=0.01, (b) p_e=0.05, (c) p_e=0.1, each with Average Node Degree=4. x-axis: Topology Size; y-axis: Disconnect Fraction. Curves: ECMP, Rule1, DMPA.

    Fig. 9: Incremental Deployment for the Exodus Topology. Panels: (a) Low Node Degree; (b) Random; (c) High Node Degree. x-axis: Probability of Link Failure; y-axis: Disconnect Fraction. Curves: ECMP, DMPA(10%), DMPA(20%), DMPA(50%), DMPA(100%).

    REFERENCES

    [1] G. Iannaccone, C.-N. Chuah, R. Mortier, S. Bhattacharyya, and C. Diot, "Analysis of link failures in an IP backbone," in Proc. of the Internet Measurement Workshop. ACM, 2002, pp. 237-242.

    [2] X. Yang and D. Wetherall, "Source selectable path diversity via routing deflections," in SIGCOMM. ACM, 2006, pp. 159-170.

    [3] M. Motiwala, M. Elmore, N. Feamster, and S. Vempala, "Path splicing," in SIGCOMM, 2008, pp. 27-38.

    [4] J. Chen, S. Chan, and V. Li, "Multipath routing for video delivery over bandwidth-limited networks," Selected Areas in Communications, IEEE Journal on, vol. 22, no. 10, pp. 1920-1932, 2004.

    [5] S. Vutukury and J. Garcia-Luna-Aceves, "MPATH: a loop-free multipath routing algorithm," Microprocessors and Microsystems, vol. 24, no. 6, pp. 319-327, 2000.

    [6] J. He and J. Rexford, "Toward Internet-wide multipath routing," Network, IEEE, vol. 22, no. 2, pp. 16-21, 2008.

    [7] S. Gjessing, "Implementation of two resilience mechanisms using multi topology routing and stub routers," in Telecommunications, 2006. AICT-ICIW '06. IEEE, 2006, pp. 29-29.

    [8] G. Apostolopoulos, "Using multiple topologies for IP-only protection against network failures: A routing performance perspective," ICS-FORTH, Greece, Tech. Rep., 2006.

    [9] P. Merindol, P. Francois, O. Bonaventure, S. Cateloin, and J. J. Pansiot, "An efficient algorithm to enable path diversity in link state routing networks," Comput. Netw., vol. 55, no. 5, pp. 1132-1149, Apr. 2011.

    [10] J. Moy, "RFC 2328: OSPF version 2," Internet Society (ISOC), 1998.

    [11] G. Lee and J. Choi, "A survey of multipath routing for traffic engineering," Information and Communications University, Korea, 2002.

    [12] D. Andersen, H. Balakrishnan, F. Kaashoek, and R. Morris, "Resilient overlay networks," ACM SIGCOMM Computer Communication Review, vol. 32, no. 1, pp. 66-66, 2002.

    [13] H. Palakurthi, "Study of multipath routing for QoS provisioning," EECS, 2001.

    [14] P. M. Spira and A. Pan, "On finding and updating spanning trees and shortest paths," SIAM Journal on Computing, vol. 4, no. 3, pp. 375-380, 1975.

    [15] P. G. Franciosa, D. Frigioni, and R. Giaccio, "Semi-dynamic shortest paths and breadth-first search in digraphs," in STACS 97. Springer, 1997, pp. 33-46.

    [16] P. Narvaez, K. Siu, and H. Tzeng, "New dynamic algorithms for shortest path tree computation," IEEE/ACM Transactions on Networking (TON), vol. 8, no. 6, pp. 734-746, 2000.

    [17] P. Narvaez, K. Siu, and H. Tzeng, "New dynamic SPT algorithm based on a ball-and-string model," IEEE/ACM Transactions on Networking (TON), vol. 9, no. 6, pp. 706-718, 2001.

    [18] P. Narvaez, K.-Y. Siu, and H.-Y. Tzeng, "Efficient algorithms for multi-path link-state routing," in ISCOM'99, 1999.

    [19] J. T. Moy, OSPF: Anatomy of an Internet Routing Protocol. Addison-Wesley Professional, 1998.

    [20] A. INCITS, "ISO/IEC 10589, Information technology - Telecommunications and information exchange between systems - Intermediate System to Intermediate System intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode network service (ISO 8473)," 2002.

    [21] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, no. 1, pp. 269-271, 1959.

    [22] M. L. Fredman and R. Tarjan, "Fibonacci heaps and their uses in improved network optimization algorithms," 25th Annual Symposium on Foundations of Computer Science (IEEE), pp. 338-346, 1984.

    [23] N. Spring, R. Mahajan, and D. Wetherall, "Measuring ISP topologies with Rocketfuel," ACM SIGCOMM Computer Communication Review, vol. 32, no. 4, pp. 133-145, 2002.

    [24] A. Medina, A. Lakhina, I. Matta, and J. Byers, "BRITE: An approach to universal topology generation," in Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2001. Proceedings. Ninth International Symposium on. IEEE, 2001, pp. 346-353.

    [25] R. Perlman, "A comparison between two routing protocols: OSPF and IS-IS," Network, IEEE, vol. 5, no. 5, pp. 18-24, 1991.