
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 2, APRIL 2009

Multiple Routing Configurations for Fast IP Network Recovery

Amund Kvalbein, Member, IEEE, Audun Fosselie Hansen, Tarik Čičić, Member, IEEE, Stein Gjessing, Member, IEEE, and Olav Lysne, Member, IEEE

Abstract—As the Internet takes an increasingly central role in our communications infrastructure, the slow convergence of routing protocols after a network failure becomes a growing problem. To assure fast recovery from link and node failures in IP networks, we present a new recovery scheme called Multiple Routing Configurations (MRC). Our proposed scheme guarantees recovery in all single failure scenarios, using a single mechanism to handle both link and node failures, and without knowing the root cause of the failure. MRC is strictly connectionless, and assumes only destination based hop-by-hop forwarding. MRC is based on keeping additional routing information in the routers, and allows packet forwarding to continue on an alternative output link immediately after the detection of a failure. It can be implemented with only minor changes to existing solutions. In this paper we present MRC, and analyze its performance with respect to scalability, backup path lengths, and load distribution after a failure. We also show how an estimate of the traffic demands in the network can be used to improve the distribution of the recovered traffic, and thus reduce the chances of congestion when MRC is used.

Index Terms—Availability, computer network reliability, communication system fault tolerance, communication system routing, protection.

I. INTRODUCTION

In recent years the Internet has been transformed from a special purpose network to a ubiquitous platform for a wide range of everyday communication services. The demands on Internet reliability and availability have increased accordingly. A disruption of a link in central parts of a network has the potential to affect hundreds of thousands of phone conversations or TCP connections, with obvious adverse effects.

The ability to recover from failures has always been a central design goal in the Internet [1]. IP networks are intrinsically robust, since IGP routing protocols like OSPF are designed to update the forwarding information based on the changed topology after a failure. This re-convergence assumes full distribution of the new link state to all routers in the network domain. When the new state information is distributed, each router individually calculates new valid routing tables.

Manuscript received December 21, 2006; revised July 20, 2007 and February 06, 2008; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor J. Yates. First published July 25, 2008; current version published April 15, 2009.

A. Kvalbein and A. F. Hansen are with Simula Research Laboratory, 1367 Lysaker, Norway (e-mail: [email protected]).

T. Čičić was with Simula Research Laboratory, 1367 Lysaker, Norway. He is now with Media Network Services, Norway, and also with the Department of Informatics, University of Oslo, 1371 Oslo, Norway.

S. Gjessing and O. Lysne are with Simula Research Laboratory, 1367 Lysaker, Norway, and also with the University of Oslo, 0371 Oslo, Norway.

Digital Object Identifier 10.1109/TNET.2008.926507

This network-wide IP re-convergence is a time consuming process, and a link or node failure is typically followed by a period of routing instability. During this period, packets may be dropped due to invalid routes. This phenomenon has been studied in both IGP [2] and BGP context [3], and has an adverse effect on real-time applications [4]. Events leading to a re-convergence have been shown to occur frequently [5].

Much effort has been devoted to optimizing the different steps of the convergence of IP routing, i.e., detection, dissemination of information and shortest path calculation, but the convergence time is still too large for applications with real time demands [6]. A key problem is that since most network failures are short lived [7], too rapid triggering of the re-convergence process can cause route flapping and increased network instability [2].

The IGP convergence process is slow because it is reactive and global. It reacts to a failure after it has happened, and it involves all the routers in the domain. In this paper we present a new scheme for handling link and node failures in IP networks. Multiple Routing Configurations (MRC) is a proactive and local protection mechanism that allows recovery in the range of milliseconds. MRC allows packet forwarding to continue over preconfigured alternative next-hops immediately after the detection of the failure. Using MRC as a first line of defense against network failures, the normal IP convergence process can be put on hold. This process is then initiated only as a consequence of non-transient failures. Since no global re-routing is performed, fast failure detection mechanisms like fast hellos or hardware alerts can be used to trigger MRC without compromising network stability [8]. MRC guarantees recovery from any single link or node failure, which constitutes a large majority of the failures experienced in a network [7]. MRC makes no assumptions with respect to the root cause of failure, e.g., whether the packet forwarding is disrupted due to a failed link or a failed router.

The main idea of MRC is to use the network graph and the associated link weights to produce a small set of backup network configurations. The link weights in these backup configurations are manipulated so that for each link and node failure, and regardless of whether it is a link or node failure, the node that detects the failure can safely forward the incoming packets towards the destination on an alternate link. MRC assumes that the network uses shortest path routing and destination based hop-by-hop forwarding.

The shifting of traffic to links bypassing the failure can lead to congestion and packet loss in parts of the network [9]. This limits the time that the proactive recovery scheme can be used to forward traffic before the global routing protocol is informed about the failure, and hence reduces the chance that a transient failure can be handled without a full global routing re-convergence. Ideally, a proactive recovery scheme should not only guarantee connectivity after a failure, but also do so in a manner that does not cause an unacceptable load distribution. This requirement has been noted as being one of the principal challenges for precalculated IP recovery schemes [10]. With MRC, the link weights are set individually in each backup configuration. This gives great flexibility with respect to how the recovered traffic is routed. The backup configuration used after a failure is selected based on the failure instance, and thus we can choose link weights in the backup configurations that are well suited for only a subset of failure instances.

The rest of this paper is organized as follows. In Section II we describe the basic concepts and functionality of MRC. We then define MRC formally and present an algorithm used to create the needed backup configurations in Section III. In Section IV, we explain how the generated configurations can be used to forward the traffic safely to its destination in case of a failure. We present performance evaluations of the proposed method in Section V. In Section VI, we discuss how we can improve the recovery traffic distribution if we have an estimate of the demands in the network. In Section VII, we discuss related work, and finally we conclude in Section VIII.

II. MRC OVERVIEW

MRC is based on building a small set of backup routing configurations that are used to route recovered traffic on alternate paths after a failure. The backup configurations differ from the normal routing configuration in that link weights are set so as to avoid routing traffic in certain parts of the network. We observe that if all links attached to a node are given sufficiently high link weights, traffic will never be routed through that node. The failure of that node will then only affect traffic that is sourced at or destined for the node itself. Similarly, to exclude a link (or a group of links) from taking part in the routing, we give it infinite weight. The link can then fail without any consequences for the traffic.

Our MRC approach is threefold. First, we create a set of backup configurations, so that every network component is excluded from packet forwarding in one configuration. Second, for each configuration, a standard routing algorithm like OSPF is used to calculate configuration specific shortest paths and create forwarding tables in each router, based on the configurations. The use of a standard routing algorithm guarantees loop-free forwarding within one configuration. Finally, we design a forwarding process that takes advantage of the backup configurations to provide fast recovery from a component failure.

In our approach, we construct the backup configurations so that for all links and nodes in the network, there is a configuration where that link or node is not used to forward traffic. Thus, for any single link or node failure, there will exist a configuration that will route the traffic to its destination on a path that avoids the failed element. Also, the backup configurations must be constructed so that all nodes are reachable in all configurations, i.e., there is a valid path with a finite cost between each node pair. Shared Risk Groups can also be protected, by regarding such a group as a single component that must be avoided in a particular configuration. In Section III, we formally describe MRC and how to generate configurations that protect every link and node in a network.

Using a standard shortest path calculation, each router creates a set of configuration-specific forwarding tables. For simplicity, we say that a packet is forwarded according to a configuration, meaning that it is forwarded using the forwarding table calculated based on that configuration. In this paper we talk about building a separate forwarding table for each configuration, but we believe that more efficient solutions can be found in a practical implementation.

When a router detects that a neighbor can no longer be reached through one of its interfaces, it does not immediately inform the rest of the network about the connectivity failure. Instead, packets that would normally be forwarded over the failed interface are marked as belonging to a backup configuration, and forwarded on an alternative interface towards their destination. The selection of the correct backup configuration, and thus also the backup next-hop, is detailed in Section IV. The packets must be marked with a configuration identifier, so the routers along the path know which configuration to use. Packet marking is most easily done by using specific values in the DSCP field in the IP header. If this is not possible, other packet marking strategies like IPv6 extension headers or using a private address space and tunneling (as proposed in [11]) could be used.
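To make this local reaction concrete, the following is a minimal Python sketch of the detection-and-marking step. All names (Packet, fib, backup_config_for) are our own illustrative assumptions about the state a router would hold, not an API from the paper; the selection of the correct backup configuration is simplified here and detailed in Section IV.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst: str
    config_id: int = 0   # 0 = normal configuration; carried, e.g., in the DSCP field

def forward(packet, fib, failed_interfaces, backup_config_for):
    """fib[config_id][dst] -> outgoing interface; backup_config_for maps a
    failed interface to the backup configuration isolating the component
    behind it."""
    out = fib[packet.config_id][packet.dst]
    if out not in failed_interfaces:
        return out                              # failure-free: forward as normal
    if packet.config_id != 0:
        return None                             # already recovered once: drop
    packet.config_id = backup_config_for[out]   # mark the packet (e.g., DSCP value)
    return fib[packet.config_id][packet.dst]    # alternative interface
```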

It is important to stress that MRC does not affect the failure-free original routing, i.e., when there is no failure, all packets are forwarded according to the original configuration, where all link weights are normal. Upon detection of a failure, only traffic reaching the failure will switch configuration. All other traffic is forwarded according to the original configuration as normal.

If a failure lasts for more than a specified time interval, a normal re-convergence will be triggered. MRC does not interfere with this convergence process, or make it longer than normal. However, MRC gives continuous packet forwarding during the convergence, and hence makes it easier to use mechanisms that prevent micro-loops during convergence, at the cost of longer convergence times [12]. If a failure is deemed permanent, new configurations must be generated based on the altered topology.

III. GENERATING BACKUP CONFIGURATIONS

In this section, we will first detail the requirements that must be put on the backup configurations used in MRC. Then, we propose an algorithm that can be used to automatically create such configurations. The algorithm will typically be run once at the initial start-up of the network, and each time a node or link is permanently added or removed. We use the notation shown in Table I.

A. Configurations Structure

MRC configurations are defined by the network topology, which is the same in all configurations, and the associated link weights, which differ among configurations. We formally represent the network topology as a graph $G = (N, A)$, with a set of nodes $N$ and a set of unidirectional links (arcs) $A$.¹ In order to guarantee single-fault tolerance, the topology graph $G$ must be bi-connected. A configuration is defined by this topology graph and the associated link weight function:

TABLE I
NOTATION

Definition: A configuration $C_i$ is an ordered pair $(G, w_i)$ of the graph $G$ and a function $w_i : A \to \{1, \dots, w_{max}\} \cup \{w_r, w_\infty\}$ that assigns an integer weight $w_i(a)$ to each link $a \in A$.

We distinguish between the normal configuration $C_0$ and the backup configurations $C_i$, $i > 0$. In the normal configuration, $C_0 = (G, w_0)$, all links have "normal" weights $w_0(a) \le w_{max}$. We assume that $C_0$ is given with finite integer weights. MRC is agnostic to the setting of these weights. In the backup configurations, selected links and nodes must not carry any transit traffic. Still, traffic must be able to depart from and reach all operative nodes. These traffic regulations are imposed by assigning high weights to some links in the backup configurations:

Definition: A link $a \in A$ is isolated in $C_i$ if $w_i(a) = w_\infty$.

Definition: A link $a \in A$ is restricted in $C_i$ if $w_i(a) = w_r$.

Isolated links do not carry any traffic. Restricted links are used to isolate nodes from traffic forwarding. The restricted link weight $w_r$ must be set to a sufficiently high, finite value to achieve that. Nodes are isolated by assigning at least the restricted link weight to all their attached links. For a node to be reachable, we cannot isolate all links attached to the node in the same configuration. More than one node may be isolated in a configuration. The set of isolated nodes in $C_i$ is denoted $S_i$, and the set of normal (non-isolated) nodes $N \setminus S_i$.

Definition: A node $u \in N$ is isolated in $C_i$ if

$$\forall (u,v) \in A : w_i(u,v) \in \{w_r, w_\infty\} \;\wedge\; \exists (u,v) \in A : w_i(u,v) = w_r \tag{1}$$

With MRC, restricted and isolated links are always attached to isolated nodes, as given by the following rules. For all links $(u,v) \in A$:

$$w_i(u,v) = w_r \Rightarrow (u \in S_i \wedge v \notin S_i) \vee (u \notin S_i \wedge v \in S_i) \tag{2}$$

$$w_i(u,v) = w_\infty \Rightarrow u \in S_i \vee v \in S_i \tag{3}$$

¹We interchangeably use the notations $a$ or $(u,v)$ to denote a link, depending on whether the endpoints of the link are important.

Fig. 1. Left: node 5 is isolated (shaded color) by setting a high weight on all its connected links (dashed). Only traffic to and from the isolated node will use these restricted links. Right: a configuration where nodes 1, 4 and 5, and the links 1-2, 3-5 and 4-5 are isolated (dotted).

This means that a restricted link always connects an isolated node to a non-isolated node. An isolated link either connects an isolated node to a non-isolated node, or it connects two isolated nodes. Importantly, this means that a link is always isolated in the same configuration as at least one of its attached nodes. These two rules are required by the MRC forwarding process described in Section IV in order to give correct forwarding without knowing the root cause of failure. When we talk of a backup configuration, we refer to a configuration that adheres to (2) and (3).
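As an illustration, these two rules can be checked mechanically for a candidate configuration. The sketch below is our own helper, not from the paper, with weights as a dict over directed links and W_R, W_INF standing for $w_r$ and $w_\infty$:

```python
def adheres_to_rules(w, S, W_R, W_INF):
    """Check rules (2) and (3) for one backup configuration.
    w maps each directed link (u, v) to its weight; S is the set S_i of
    isolated nodes."""
    for (u, v), weight in w.items():
        if weight == W_R and not ((u in S) ^ (v in S)):
            return False  # (2): a restricted link joins isolated and non-isolated
        if weight == W_INF and not (u in S or v in S):
            return False  # (3): an isolated link touches an isolated node
    return True
```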

The purpose of the restricted links is to isolate a node from routing in a specific backup configuration $C_i$, such as node 5 to the left in Fig. 1. In many topologies, more than a single node can be isolated simultaneously. In the example to the right in Fig. 1, three nodes and three links are isolated.

Restricted and isolated links are always given the same weight in both directions. However, MRC treats links as unidirectional, and makes no assumptions with respect to symmetric link weights for the links that are not restricted or isolated. Hence, MRC can co-exist with traffic engineering schemes that rely on asymmetric link weights for load balancing purposes.

MRC guarantees single-fault tolerance by isolating each link and node in exactly one backup configuration. In each configuration, all node pairs must be connected by a finite cost path that does not pass through an isolated node or an isolated link. A configuration that satisfies this requirement is called valid:

Definition: A configuration $C_i$ is valid if and only if

$$\forall u, v \in N : \exists \text{ a path } p_i(u,v) \text{ with } w_i(p_i(u,v)) < w_\infty \text{ containing no isolated node other than possibly } u \text{ and } v \tag{4}$$

We observe that all backup configurations retain a characteristic internal structure, in that all isolated nodes are directly connected to a core of nodes connected by links with normal weights:

Definition: A configuration backbone $B_i = (N \setminus S_i, A_i^B)$ consists of all non-isolated nodes in $C_i$ and all links that are neither isolated nor restricted:

$$A_i^B = \{a \in A : w_i(a) = w_0(a)\} \tag{5}$$

A backbone is connected if all nodes in $B_i$ are connected by paths containing links with normal weights only:

Definition: A backbone $B_i$ is connected if and only if

$$\forall u, v \in N \setminus S_i : \exists \text{ a path } p(u,v) \text{ consisting of links in } A_i^B \text{ only} \tag{6}$$
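Definition (6) translates directly into a reachability test over normal-weight links. A small sketch of such a test (our own helper, not from the paper), using breadth-first search:

```python
from collections import deque

def backbone_connected(nodes, w, w0, S):
    """True if all non-isolated nodes are mutually reachable using only
    links whose weight is unchanged from C_0, i.e., links in the backbone."""
    core = set(nodes) - set(S)
    if len(core) <= 1:
        return True
    adj = {u: [] for u in core}
    for (u, v), weight in w.items():
        if u in core and v in core and weight == w0[(u, v)]:
            adj[u].append(v)
    start = next(iter(core))
    seen, queue = {start}, deque([start])
    while queue:
        for v in adj[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == core
```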

An important invariant in our algorithm for creating backup configurations is that the backbone remains connected. Since all backup configurations must adhere to (2) and (3), we can show that a backup configuration with a connected backbone is equivalent to a valid backup configuration:

Lemma 3.1: A backup configuration $C_i$ is valid if and only if it contains a connected backbone.

Proof: We first show that a connected backbone implies that $C_i$ is valid. For each node pair $u$ and $v$, zero, one or both of $u$ and $v$ are in $S_i$. Assume $u, v \in S_i$. From the definition of an isolated node and (2), $u$ and $v$ are each attached to a restricted link leading to a node in the backbone, say $u'$ and $v'$. From (6), there is a path $p(u',v')$ between $u'$ and $v'$ consisting of backbone links only. Thus, there exists a path

$$p_i(u,v) = (u,u') \cdot p(u',v') \cdot (v',v) \tag{7}$$

with weight

$$w_i(p_i(u,v)) \le 2 w_r + \sum_{a \in p(u',v')} w_0(a) < w_\infty \tag{8}$$

and (4) follows. A subset of the above is sufficient to show the same if only one, or none, of $u, v$ is in $S_i$.

For the converse implication, assume $C_i$ is valid and let $u, v \in N \setminus S_i$. From (4), $w_i(p_i(u,v)) < w_\infty$ and no isolated node is interior to $p_i(u,v)$. Since by (2) restricted links are always connected to at least one isolated node, such links can not be part of $p_i(u,v)$, and all links in $p_i(u,v)$ must have normal weights. Hence all nodes in $N \setminus S_i$ are connected by backbone links only, and the backbone is connected.

In backup configurations, transit traffic is constrained to the configuration backbone. A restricted link weight that is sufficiently high to achieve this can be determined from the number of links in the network and the maximal normal link weight:

Proposition 3.2: Let $u$ be a node isolated in the valid backup configuration $C_i$. Then, the restricted link weight value

$$w_r = |A| \cdot \max_{a \in A} w_0(a) \tag{9}$$

is sufficiently high to exclude $u$ from any shortest path in $C_i$ which does not start or end in $u$.

Proof: Since all links attached to the isolated node $u$ have a weight of at least $w_r$, the weight of a path through $u$ will be at least $2 w_r$. From the definition of an isolated node and (2), all isolated nodes are directly connected to the configuration backbone. From (4), any shortest path in $C_i$ will be entirely contained in $B_i$, except possibly the first or the last hop. A valid configuration contains a connected backbone, and the total weight of the sub-path that is within $B_i$ will be at most $|A| \cdot \max_{a \in A} w_0(a) = w_r$. Since $w_r < 2 w_r$, no shortest path will include $u$ as a transit node.
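As a worked example under assumed numbers (not from the paper): in a network with $|A| = 64$ unidirectional links and maximal normal weight $\max_a w_0(a) = 10$, (9) gives $w_r = 64 \cdot 10 = 640$. Any path transiting an isolated node then costs at least $2 w_r = 1280$ for the two restricted (or isolated) links at that node, while the detour through the backbone contributes at most $|A| \cdot \max_a w_0(a) = 640$ for the same endpoints, so the backbone path is always preferred.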

To guarantee recovery after any component failure, every node and every link must be isolated in one backup configuration. Let $\mathcal{C} = \{C_1, \dots, C_n\}$ be a set of backup configurations. We say that

Definition: A set, $\mathcal{C}$, of backup configurations is complete if

$$\forall u \in N\ \exists C_i \in \mathcal{C} : u \in S_i \;\wedge\; \forall a \in A\ \exists C_i \in \mathcal{C} : w_i(a) = w_\infty \tag{10}$$

A complete set of valid backup configurations for a given topology can be constructed in different ways. In the next subsection we present an efficient algorithm for this purpose.

B. Algorithm

The number and internal structure of backup configurations in a complete set for a given topology may vary depending on the construction model. If more configurations are created, fewer links and nodes need to be isolated per configuration, giving a richer (more connected) backbone in each configuration. On the other hand, if fewer configurations are constructed, the state requirement for the backup routing information storage is reduced. However, calculating the minimum number of configurations for a given topology graph is computationally demanding. One solution would be to find all valid configurations for the input consisting of the topology graph and its associated normal link weights $w_0$, and then find the complete set of configurations with lowest cardinality. Finding this set would involve solving the Set Cover problem, which is known to be NP-complete [13].

Instead we present a heuristic algorithm that attempts to make all nodes and links in an arbitrary bi-connected topology isolated. Our algorithm takes as input the directed graph $G$ and the number $n$ of backup configurations that is intended to be created. If the algorithm terminates successfully, its output is a complete set of valid backup configurations. The algorithm is agnostic to the original link weights $w_0$, and assigns new link weights only to restricted and isolated links in the backup configurations. For a sufficiently high $n$, the algorithm will always terminate successfully, as will be further discussed in Section III-B-3. This algorithm isolates all nodes in the network, and hence requires a bi-connected graph as input. Topologies where the failure of a single node disconnects the network can be processed by simply ignoring such nodes, which are then left unprotected.

The algorithm can be implemented either in a network management system, or in the routers. As long as all routers have the same view of the network topology, they will compute the same set of backup configurations.

1) Description: Algorithm 1 loops through all nodes in the topology, and tries to isolate them one at a time. A link is isolated in the same iteration as one of its attached nodes. The algorithm terminates when either all nodes and links in the network are isolated in exactly one configuration, or a node that cannot be isolated is encountered. We now specify the algorithm in detail, using the notation shown in Table I.

a) Main loop: Initially, $n$ backup configurations are created as copies of the normal configuration. A queue of nodes $Q_n$ and a queue of links $Q_a$ are initiated. The node queue contains all nodes in an arbitrary sequence. The link queue is initially empty, but all links in the network will have to pass through it. Method first returns the first item in the queue, removing it from the queue.

When a node $u$ is attempted isolated in a backup configuration $C_i$, it is first tested that doing so will not disconnect $B_i$ according to definition (6). The connectivity test method at line 13 decides this by testing that each of $u$'s neighbors can reach each other without passing through $u$, an isolated node, or an isolated link in configuration $C_i$.

If the connectivity test is positive, the function isolate is called, which attempts to find a valid assignment of isolated and restricted links for node $u$ as detailed below. If successful, isolate returns the modified configuration and the changes are committed (line 16). Otherwise no changes are made in $C_i$.

If $u$ was successfully isolated, we move on to the next node. Otherwise, we keep trying to isolate $u$ in every configuration, until all $n$ configurations are tried (line 20). If $u$ could not be isolated in any configuration, a complete set of valid configurations with cardinality $n$ could not be built using our algorithm. The algorithm will then terminate with an unsuccessful result (line 22).

b) Isolating links: Along with $u$, as many as possible of its attached links are isolated. The algorithm runs through the links attached to $u$ (lines 2-3 in function isolate). It can be shown that it is an invariant in our algorithm that in line 1, all links in $Q_a$ are attached to node $u$. The node $v$ in the other end of the link may or may not be isolated in some configuration already (line 4). If it is, we must decide whether the link should be isolated along with $u$ (line 7), or if it is already isolated in the configuration where $v$ is isolated (line 11). A link must always be isolated in the same configuration as one of its end nodes. Hence, if the link was not isolated in the same configuration as $v$, it must be isolated along with node $u$.

Before we can isolate the link along with $u$, we must test (line 6) that $v$ will still have an attached non-isolated link, in accordance with the definition of isolated nodes. If this is not the case, $u$ can not be isolated in the present configuration (line 9).

In the case that the neighbor node $v$ was not isolated in any configuration (line 12), we isolate the link along with $u$ if there exists another link not isolated with $u$ (line 14). If the link can not be isolated together with node $u$, we leave it for node $v$ to isolate it later. To make sure that this link can be isolated along with $v$, we must process $v$ next (line 17, selected at line 10 in Algorithm 1), and link $(v,u)$ must be the first among the links originating from node $v$ to be processed (line 18, selected at line 2).
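The overall shape of Algorithm 1 can be sketched as follows. This is a simplified illustration of the idea, not the paper's exact pseudocode: it omits the queue re-ordering refinements (lines 17-18 of isolate) and some feasibility checks (e.g., that an already-isolated neighbor keeps a restricted link, and that every link eventually gets isolated somewhere), so it may need a higher $n$ than the full algorithm; all identifiers are our own.

```python
from collections import deque

def create_backup_configurations(nodes, links, w0, n, w_r):
    """links: directed pairs (u, v), both directions present; w0: normal
    weights. Returns per-configuration (weights, isolated node sets), or
    None if some node could not be isolated."""
    W_INF = float("inf")
    weights = [dict(w0) for _ in range(n)]   # backup configs start as copies of C_0
    S = [set() for _ in range(n)]            # isolated nodes per configuration

    def backbone_connected(i, candidate):
        """Backbone of configuration i must stay connected without candidate."""
        core = set(nodes) - S[i] - {candidate}
        if len(core) <= 1:
            return len(core) == 1
        adj = {x: [] for x in core}
        for (a, b) in links:
            if a in core and b in core and weights[i][(a, b)] == w0[(a, b)]:
                adj[a].append(b)
        start = next(iter(core))
        seen, q = {start}, deque([start])
        while q:
            for b in adj[q.popleft()]:
                if b not in seen:
                    seen.add(b)
                    q.append(b)
        return seen == core

    for u in nodes:
        neighbors = [b for (a, b) in links if a == u]
        placed = False
        for i in range(n):
            if not backbone_connected(i, u):
                continue                     # isolating u here would disconnect B_i
            trial, restricted = dict(weights[i]), None
            for v in neighbors:
                # Keep exactly one restricted link to a non-isolated neighbor
                # so that u stays reachable; isolate the remaining links.
                if v not in S[i] and restricted is None:
                    trial[(u, v)] = trial[(v, u)] = w_r      # rule (2)
                    restricted = v
                else:
                    trial[(u, v)] = trial[(v, u)] = W_INF    # rule (3)
            if restricted is None:
                continue                     # all neighbors isolated here: try next C_i
            weights[i] = trial
            S[i].add(u)
            placed = True
            break
        if not placed:
            return None                      # retry with a larger n
    return weights, S
```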

2) Output: We show that successful execution of Algorithm 1 results in a complete set of valid backup configurations.

Proposition 3.3: If Algorithm 1 terminates successfully, the produced backup configurations adhere to (2) and (3).

Proof: Links are only given weights $w_r$ or $w_\infty$ in the process of isolating one of their attached nodes, and (3) follows. For restricted links, (2) requires that only one of the attached nodes is isolated. This invariant is maintained in line 7 in function isolate by demanding that if a node attached to a restricted link is attempted isolated, the link must also be isolated. Hence it is impossible to isolate two neighbor nodes without also isolating their connecting link, and (2) follows.

Proposition 3.4: If Algorithm 1 terminates successfully, the backup configuration set $\mathcal{C} = \{C_1, \dots, C_n\}$ is complete, and all configurations are valid.

Proof: Initially, all links in all configurations have original link weights. Each time a new node and its connected links are isolated in a configuration $C_i$, we verify that the backbone in that configuration remains connected. When the links are isolated, it is checked that the node has at least one neighbor not isolated in $C_i$ (line 14 in function isolate). When isolating a node, we also isolate as many as possible of the connected links. A link is always isolated in the same configuration as one of its attached nodes. If this is not possible, the node is not isolated (isolate, line 9). From Lemma 3.1, the altered configuration remains valid.

The algorithm runs through all nodes. If one node cannot be isolated, the algorithm aborts (line 22 in Algorithm 1). If it does terminate with success, all nodes and links are isolated in one configuration, thus the configuration set is complete.

3) Termination: The algorithm runs through all nodes trying to make them isolated in one of the backup configurations and will always terminate with or without success. If a node cannot be isolated in any of the configurations, the algorithm terminates without success. However, the algorithm is designed so that any bi-connected topology will result in a successful termination, if the number of configurations allowed is sufficiently high.

Proposition 3.5: Given a bi-connected graph $G = (N, A)$, there will exist $n \le |N|$, so that Algorithm 1 will terminate successfully.

Proof: Assume $n = |N|$. The algorithm will create $|N|$ backup configurations, isolating one node in each backup configuration. In bi-connected topologies this can always be done. Along with a node $u$, all attached links except one, say $(u,v)$, can be isolated. By forcing node $v$ to be the next node processed (isolate, line 17), and the link $(v,u)$ to be first among $Q_a$ (line 18), node $v$ and link $(v,u)$ will be isolated in the next configuration. This can be repeated until we have configurations so that every node and link is isolated. This holds also for the last node processed, since its last link will always lead to a node that is already isolated in another configuration. Since all links and nodes can be isolated, the algorithm will terminate successfully.

Ring topologies represent the worst-case input for our algorithm since all nodes have two links each and would have to be isolated in different configurations in order to close the loop described in Prop. 3.5. In bi-connected networks with higher connectivity it is often possible to reuse the configurations and terminate the algorithm with a lower $n$. In Section V we analyze the number of backup configurations created by Algorithm 1 for different input network topologies.

4) Complexity: The complexity of the proposed algorithm is determined by the loops and the complexity of the connectivity test method. This method performs a procedure similar to determining whether a node is an articulation point in a graph, bound to worst case $O(|N| + |A|)$. Additionally, for each node, we run through all adjacent links, whose number has an upper bound in the maximum node degree $\Delta$. In the worst case, we must run through all $n$ configurations to find a configuration where a node can be isolated. The worst case running time for the complete algorithm is then bound by $O(n \cdot |N| \cdot \Delta \cdot (|N| + |A|))$. The running time on a standard desktop computer varies from sub-second for small topologies to a few seconds for topologies with several hundred nodes.

IV. LOCAL FORWARDING PROCESS

Given a sufficiently high $n$, the algorithm presented in Section III will create a complete set of valid backup configurations. Based on these, a standard shortest path algorithm is used in each configuration to calculate configuration specific forwarding tables. In this section, we describe how these forwarding tables are used to avoid a failed component.

When a packet reaches a point of failure, the node adjacent to the failure, called the detecting node, is responsible for finding a backup configuration where the failed component is isolated. The detecting node marks the packet as belonging to this configuration, and forwards the packet. From the packet marking, all transit routers identify the packet with the selected backup configuration, and forward it to the egress node avoiding the failed component.

Consider a situation where a packet arrives at node $u$, and cannot be forwarded to its normal next-hop $v$ because of a component failure. The detecting node must find the correct backup configuration without knowing the root cause of failure, i.e., whether the next-hop node $v$ or link $(u,v)$ has failed, since this information is generally unavailable.

Let $C(v)$ denote the backup configuration where node $v$ is isolated, i.e., $C(v) = C_i \Leftrightarrow v \in S_i$. Similarly, let $C(u,v)$ denote the backup configuration where the link $(u,v)$ is isolated, i.e., $C(u,v) = C_i \Leftrightarrow w_i(u,v) = w_\infty$. Assuming that node $d$ is the egress (or the destination) in the local network domain, we can distinguish between two cases. If $v \neq d$, forwarding can be done in configuration $C(v)$, where both $v$ and $(u,v)$ will be avoided. In the other case, $v = d$, the challenge is to provide recovery for the failure of link $(u,v)$ when node $v$ is operative. Our strategy is to forward the packet using a path to $v$ that does not contain $(u,v)$. Furthermore, packets that have changed configuration before (their configuration ID is different from the one used in $C_0$), and still meet a failed component on their forwarding path, must be discarded. This way packet loops are avoided, also in the case that node $v$ indeed has failed. The steps that are taken in the forwarding process by the detecting node are summarized in Fig. 2.

Fig. 2. Packet forwarding state diagram.
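A compact sketch of these steps, in our own rendering of the state diagram: next_hop[i][(u, d)] stands for the next hop on $p_i(u,d)$, C_node and C_link for $C(v)$ and $C(u,v)$, and pkt is the Packet object from the earlier sketch in Section II.

```python
def select_backup(u, d, v, pkt, C_node, C_link, next_hop):
    """Forwarding decision at detecting node u whose normal next-hop v
    toward destination d is unreachable. Returns (configuration, next hop)
    or None if the packet must be dropped."""
    if pkt.config_id != 0:
        return None                      # step 1: packet was already recovered once
    cfg = C_node[v]                      # step 2: try the configuration isolating v
    if v == d and next_hop[cfg][(u, d)] == v:
        # step 3: v is the egress and link (u, v) is not isolated in C(v),
        # so the lookup still returns the failed link ...
        cfg = C_link[(u, v)]             # step 4: ... switch to the configuration
                                         # isolating the link instead
    pkt.config_id = cfg
    return cfg, next_hop[cfg][(u, d)]
```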

Assume there is only a single component failure in the network, detected by node $u$ on the path to the network-local destination $d$ via node $v$.

Proposition 4.1: Node $u$ selects configuration $C_i$ so that $v \notin p_i(u,d)$, if $v \neq d$.

Proof: Node $u$ selects $C(v)$ in step 2. Node $v$ is isolated in $C(v)$ and will not be in the shortest path $p_{C(v)}(u,d)$, according to Proposition 3.2.

Proposition 4.2: Node $u$ selects configuration $C_i$ so that $(u,v) \notin p_i(u,d)$.

Proof: If $v \neq d$, node $u$ selects $C(v)$ in step 2, and neither node $v$ nor link $(u,v)$ will be in the shortest path $p_{C(v)}(u,d)$. Assume that $v$ is the egress node for destination $d$. Remember that according to (3), $C(u,v) = C(u) \vee C(u,v) = C(v)$. We distinguish between three possible cases, illustrated in Fig. 3.

If $C(u) = C(v)$ and $w_{C(v)}(u,v) = w_\infty$ as in Fig. 3(a), then $C(u,v) = C(v)$ according to the definition of an isolated node and (2). Forwarding step 2 will select $C(v) = C(u,v)$, and $p_{C(u,v)}(u,d)$ does not contain $(u,v)$.

If $C(u) \neq C(v)$, $u \notin S_{C(v)}$, and $w_{C(v)}(u,v) = w_\infty$ as in Fig. 3(b), forwarding step 2 will select $C(v) = C(u,v)$, and $p_{C(u,v)}(u,d)$ does not contain $(u,v)$.


Fig. 3. When there is an error in the last hop $(u,v)$, a packet must be forwarded in the configuration where the connecting link is isolated. The figure shows isolated nodes (shaded color), restricted links (dashed), and isolated links (dotted). In cases (a) and (b), $C(u,v) = C(v)$, and the forwarding will be done in $C(v)$. In case (c), $C(u,v) \neq C(v)$, and the forwarding will be done in $C(u,v)$.

Finally, if $C(u) \neq C(v)$ and $w_{C(v)}(u,v) \neq w_\infty$ as in Fig. 3(c), then $C(u,v) = C(u)$, and forwarding step 2 will select $C(v)$. Link $(u,v)$ is not isolated in $C(v)$, and $v$ will be returned as the next hop. Step 3 will detect this, and step 4 will select $C(u,v)$, and $p_{C(u,v)}(u,d)$ does not contain $(u,v)$.

A. Implementation Issues

The forwarding process can be implemented in the routing equipment as presented above, requiring the detecting node to know the backup configuration $C(v)$ for each of its neighbors $v$. Node $u$ would then perform at most two additional next-hop look-ups in the case of a failure. However, all nodes in the network have full knowledge of the structure of all backup configurations. Hence, node $u$ can determine in advance the correct backup configuration to use if the normal next hop for a destination $d$ has failed. This way the forwarding decision at the point of failure can be simplified at the cost of storing the identifier of the correct backup configuration to use for each destination and failing neighbor.
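For illustration, such a precomputed table could be built as below, a sketch under the same hypothetical naming as the earlier snippets (next_hop[0] is the normal configuration):

```python
def precompute_recovery(u, neighbors, destinations, C_node, C_link, next_hop):
    """For each (failed neighbor v, destination d) pair where v is the
    normal next hop from u, store the backup configuration identifier to
    mark recovered packets with."""
    table = {}
    for v in neighbors:
        for d in destinations:
            if next_hop[0][(u, d)] != v:
                continue                         # v is not on the normal path to d
            if v == d and next_hop[C_node[v]][(u, d)] == v:
                table[(v, d)] = C_link[(u, v)]   # last-hop link failure case
            else:
                table[(v, d)] = C_node[v]
    return table
```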

For the routers to make a correct forwarding decision, each packet must carry information about which configuration it belongs to. This information can be either explicit or implicit. An explicit approach could be to use a distinct value in the DSCP field of the IP header to identify the configuration. As we will see shortly, a very limited number of backup configurations are needed to guarantee recovery from all single link or node failures, and hence the number of needed values would be small. A more implicit approach would be to assign a distinct local IP address space for each backup configuration. Each node in the IGP cloud would get a separate address in each configuration. The detecting node could then encapsulate recovered packets and tunnel them shortest path in the selected backup configuration to the egress node. The packets would then be decapsulated at the egress and forwarded from there as normal towards the final destination. The drawback with this method is the additional processing and bandwidth resource usage associated with tunneling.

Recent IETF standardization work on Multi Topology routing mechanisms [14], [15] provides a useful framework for MRC implementation. These IETF drafts specify how routers can exchange information about the link weights used in several logical topologies, and build topology specific forwarding tables. Use of these drafts for providing proactive recovery is sketched in [16].

V. PERFORMANCE EVALUATION

MRC requires the routers to store additional routing configurations. The amount of state required in the routers is related to the number of such backup configurations. Since routing in a backup configuration is restricted, MRC will potentially give backup paths that are longer than the optimal paths. Longer backup paths will affect the total network load and also the end-to-end delay.

Full, global IGP re-convergence determines shortest paths in the network without the failed component. We use its performance as a reference point and evaluate how closely MRC can approach it. It must be noted that MRC yields the shown performance immediately after a failure, while IP re-convergence can take seconds to complete.

A. Evaluation Setup

We have implemented the algorithm described in Section III-B and created configurations for a wide range of bi-connected synthetic and real topologies.² The synthetic topologies are obtained from the BRITE topology generation tool [17] using the Waxman [18] and the Generalized Linear Preference (GLP) [19] models. The number of nodes is varied between 16 and 512 to demonstrate the scalability. To explore the effect of network density, the average node degree is 4 or 6 for Waxman topologies and 3.6 for GLP topologies. For all synthetic topologies, the links are given unit weight. The real topologies are taken from the Rocketfuel topology database [20].

For each topology, we measure the minimum number of backup configurations needed by our algorithm to isolate every node and link in the network. Recall from Section III-B that our algorithm for creating backup configurations only takes the network topology as input, and is not influenced by the link weights. Hence, the number of configurations needed is valid irrespective of the link weight settings used. For the Rocketfuel topologies, we also measure the number of configurations needed if we exclude the nodes that can be covered by Loop-Free Alternates (LFA) [21]. LFA is a cheaper fast reroute technique that exploits the fact that for many destinations, there exists an alternate next-hop that will not lead to a forwarding loop. If such alternate paths exist for all traffic that is routed through a node, we can rely on LFA instead of protecting the node using MRC.

Based on the created configurations, we measure the backup path lengths (hop count) achieved by our scheme after a node failure. For a selected class of topologies, we evaluate how the backup path lengths depend on the number of backup configurations.

²Our simulation software is available at http://www.simula.no/research/networks/software/.

Fig. 4. The COST239 network.

The shifting of traffic from the normal path to a recovery path changes the load distribution in the network, and can in some cases lead to congestion and packet loss. We therefore test the effect our scheme has on the load distribution after a failure. To do this, we have performed simulations of the European COST239 network [22] shown in Fig. 4, designed to connect major cities across Europe. All links in the network have equal capacity. To achieve a good load distribution and minimize the chances of congestion in the failure-free case, we adopt the link weight optimization heuristic introduced in [23]. They define a piecewise linear cost function $\Phi$ that is dependent on the load on each of the links in the network. $\Phi$ is convex and resembles an exponentially growing function. They then introduce a local search heuristic that tries to minimize the value of $\Phi$ by randomly perturbing the link weights. This local search heuristic has been shown to give performance that is close to the optimal solution that can be achieved by a connection oriented technology like MPLS.
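A sketch of such a piecewise linear penalty follows. The breakpoints and slopes below are the values commonly cited for the cost function of [23]; treat them as an assumption rather than as taken from this paper.

```python
def link_cost(load, capacity):
    """Convex piecewise-linear cost of one link; the slope grows steeply
    as utilization approaches and exceeds 100%."""
    u = load / capacity
    breakpoints = [0.0, 1/3, 2/3, 9/10, 1.0, 11/10]
    slopes = [1, 3, 10, 70, 500, 5000]
    cost = 0.0
    for i, slope in enumerate(slopes):
        lo = breakpoints[i]
        hi = breakpoints[i + 1] if i + 1 < len(breakpoints) else float("inf")
        if u <= lo:
            break
        cost += slope * (min(u, hi) - lo) * capacity
    return cost

# Phi is the sum of link_cost over all links; the local search of [23]
# randomly perturbs link weights and keeps changes that reduce Phi.
```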

The COST239 network is selected for this evaluation because of its resilient network topology. By using this network, we avoid a situation where there exists only one possible backup path to a node. The differences with respect to link loads between different recovery strategies will only be visible when there exists more than one possible backup path. In the COST239 network each node has a node degree of at least four, providing the necessary maneuvering space.

For our load evaluations, we use a gravity-style traffic matrix where the traffic between two destinations is based on the population of the countries they represent [22]. For simplicity, we look at constant packet streams between each node pair. The traffic matrix has been scaled so that the load on the most utilized link in the network is about 2/3 of the capacity. We use shortest path routing with equal splitting of traffic if there exist several equal-cost paths towards a destination.

B. Number of Backup Configurations

Fig. 5 shows the minimum number of backup configurations that Algorithm 1 could produce in a wide range of synthetic topologies. Each bar in the figure represents 100 different topologies given by the type of generation model used, the links-to-node ratio, and the number of nodes in the topology. Table II shows the minimum number of configurations Algorithm 1 could produce for selected real-world topologies of varying size. For the Sprint US network, we show results for both the POP-level and router-level topologies. The table also shows how many nodes are covered by LFAs, and the number of configurations needed when MRC is used in combination with LFAs. Since some nodes and links are completely covered by LFAs, MRC needs to isolate fewer components, and hence the number of configurations decreases for some topologies. We see that for the COST239 network, all nodes except one are covered by LFAs. However, we still need two backup configurations to cover this single node, because isolating all the attached links in a single configuration would leave the node unreachable.

Fig. 5. The number of backup configurations required for a wide range of BRITE generated topologies. As an example, the bar name wax-2-16 denotes that the Waxman model is used with a links-to-node ratio of 2, and with 16 nodes.

TABLE II
NUMBER OF BACKUP CONFIGURATIONS FOR SELECTED REAL-WORLD NETWORKS

The results show that the number of backup configurations needed is usually modest; 3 or 4 is typically enough to isolate every element in a topology. No topology required more than six configurations. In other words, Algorithm 1 performs very well even in large topologies. The algorithm fails only if it meets a node that, if isolated, disconnects the backbone in each of the $n$ backup configurations. The algorithm often goes through all network nodes without meeting this situation even if $n$ is low, and is more successful in topologies with a higher average node degree. The running time of our algorithm is modest; about 5 seconds for the router-level Sprint US network.

In Section III-B we stated that the problem of finding a minimal complete set of valid configurations can be transformed to the Set Covering problem. It has long been known that heuristic algorithms can efficiently approximate an optimal solution to this problem [24], which makes the good performance of Algorithm 1 less surprising.

This modest number of backup configurations shows that our method is implementable without requiring a prohibitively high amount of state information.

C. Backup Path Lengths

Fig. 6 shows the path length distribution of the recovery paths after a node failure. The numbers are based on 100 different synthetic Waxman topologies with 32 nodes and 64 links. All the topologies have unit weight links, in order to focus more on the topological characteristics than on a specific link weight configuration. Results for link failures show the same tendency and are not presented.

Fig. 6. Backup path lengths in the case of a node failure.

Fig. 7. Average backup path lengths in the case of a node failure as a function of the number of backup configurations.

For reference, we show the path length distribution in the failure-free case ("IGP normal"), for all paths with at least two hops. For each of these paths, we let every intermediate node fail, and measure the resulting recovery path lengths using global IGP rerouting, local rerouting based on the full topology except the failed component ("Optimal local"), as well as MRC with 5 backup configurations.

We see that MRC gives backup path lengths close to those achieved after a full IGP re-convergence. This means that the affected traffic will not suffer from unacceptably long backup paths in the period when it is forwarded according to an MRC backup configuration.

Algorithm 1 yields richer backup configurations as their number increases. In Fig. 7 we have plotted the average backup path lengths for the 75 of the 100 input topologies that could be covered using 3 backup configurations. The figure shows that the average recovery path length decreases as the number of backup configurations increases.

D. Load on Individual Links

In order to evaluate the routing performance while MRC is used to recover traffic, we measure the throughput on each unidirectional link for every possible link failure. We then find the maximum link utilization over all failures for each link. Five backup configurations were used.

Fig. 8. Load on all unidirectional links in the failure-free case, after IGP re-convergence, and when MRC is used to recover traffic. The figure shows each individual link's worst-case scenario.

Fig. 8 shows the maximum load on all links, which are indexed from the least loaded to the most loaded in the failure-free case. The results indicate that the restricted routing in the backup topologies results in a worst-case load distribution that is comparable to what is achieved after a complete IGP rerouting process.

However, we see that for some link failures, MRC gives a somewhat higher maximum link utilization in this network. The maximum link load after the worst-case link failure is 118% with MRC, compared to 103% after a full IGP re-convergence. In the next section, we discuss a method for improving the post-failure load balancing with MRC.

VI. RECOVERY LOAD DISTRIBUTION

MRC recovery is local, and the recovered traffic is routed in a backup configuration from the point of failure to the egress node. This shifting of traffic from the original path to a backup path affects the load distribution in the network, and might lead to congestion. In our experience, the effect a failure has on the load distribution when MRC is used is highly variable. Occasionally the load added on a link can be significant, as we saw in Fig. 8. In this section, we describe an approach for minimizing the impact of the MRC recovery process on the post-failure load distribution.

If MRC is used for fast recovery, the load distribution in the network during the failure depends on three factors:

(a) The link weight assignment used in the normal configuration $C_0$,

(b) The structure of the backup configurations, i.e., which links and nodes are isolated in each $C_i$,

(c) The link weight assignments used in the backbones $B_i$ of the backup configurations.

The link weights in the normal configuration (a) are important since MRC uses backup configurations only for the traffic affected by the failure, and all non-affected traffic is distributed according to them. The backup configuration structure (b) dictates which links can be used in the recovery paths for each failure. The backup configuration link weight assignments (c) determine which among the available backup paths are actually used.


Network operators often plan and configure their network based on an estimate of the traffic demands from each ingress node to each egress node. Clearly, the knowledge of such a demand matrix provides the opportunity to construct the backup configurations in a way that gives better load balancing and avoids congestion after a failure. We propose a procedure to do this by constructing a complete set of valid configurations in three phases. First, the link weights in the normal configuration $C_0$ are optimized for the given demand matrix while only taking the failure free situation into account. Second, we take advantage of the load distribution in the failure free case to construct the MRC backup configurations in an intelligent manner. Finally, we optimize the link weights in the backbones of the backup configurations to get a good load distribution after any link failure.

Since the optimization procedure described in this section uses random search and relies on an estimate of the traffic matrix, it can only be implemented by a central network management unit. The link weight optimization is a computing intensive process, and should be conducted as a background refinement process.

The improvement of this proposal compared to the MRC backup configuration generation described in Section III is the congestion avoidance during MRC recovery. The functionality with respect to single link or node failure recovery guarantees is unchanged. Since the backup configurations are only used to route recovered traffic, we can optimize the structure and link weights used in these configurations for the different failure scenarios without sacrificing performance in the failure free case. We restrict ourselves to link failures in the optimization phase in order to limit the calculation complexity. A more detailed description of this procedure can be found in [25].

A. Algorithm

We first optimize the link weights in the normal configuration $C_0$, so that the cost of routing the demands is minimized. For this purpose we use the link weight optimization heuristic and the cost function $\Phi$ introduced in [23], as described in the evaluation method in Section V.

Using the optimized link weights in $C_0$, we calculate the load $l(a)$ on each unidirectional link $a$ in the network in the failure-free case. In Algorithm 3 this information is used to construct backup configurations $C_1, \dots, C_n$. The intuition behind our algorithm is that we want the amount of traffic that is potentially recovered in each backup configuration to be approximately equal. We want to avoid that the failure of heavily loaded links results in large amounts of traffic being recovered in backup configurations with a sparse backbone. Instead, this traffic should be routed in a rich (well connected) backbone, where we have a better chance of distributing it over less loaded links by setting appropriate link weights.

To implement this load-aware algorithm we calculate the potential of each node in the network and the potential of each backup configuration:

Definition: The potential $\gamma(u)$ of a node $u$ is the sum of the load on all its incoming and outgoing links:

$$\gamma(u) = \sum_{(u,v) \in A} \big( l(u,v) + l(v,u) \big) \tag{11}$$

Definition: The potential $\gamma(C_i)$ of a backup configuration $C_i$ is the sum of the potential of all nodes that are isolated in $C_i$:

$$\gamma(C_i) = \sum_{u \in S_i} \gamma(u) \tag{12}$$

Our modified backup configuration construction method isdefined in Algorithm 3. As in Algorithm 1, the input to ouralgorithm for generating backup configurations is the normalconfiguration , and the number of backup configurationswe want to create. We start our configuration generation algo-rithm by ordering all nodes with respect to their potential andassigning each node to a tentative backup configuration(line 6 in Algorithm 3), so that the potential of each backupconfiguration is approximately equal:

$$\gamma(C_1) \approx \gamma(C_2) \approx \cdots \approx \gamma(C_n) \qquad (13)$$

The nodes with the smallest potential are assigned to $C_1$, those with somewhat higher potential to $C_2$, and so on, with the nodes with the highest potential in $C_n$.
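The text does not spell out the partitioning rule beyond (13) and the ordering above, so the following greedy sweep is one plausible reading, not the exact rule of Algorithm 3: walk through the nodes in order of increasing potential, and open the next configuration each time the accumulated potential reaches an equal share of the total.

```python
def tentative_assignment(potentials: dict, n: int) -> dict:
    """Assign each node to a tentative configuration 1..n so that every
    configuration receives roughly 1/n of the total potential, with the
    lowest-potential nodes in C_1 and the highest-potential nodes in C_n."""
    ordered = sorted(potentials, key=potentials.get)  # nodes by increasing potential
    target = sum(potentials.values()) / n             # equal share per configuration
    assignment, conf, acc = {}, 1, 0.0
    for u in ordered:
        assignment[u] = conf
        acc += potentials[u]
        if acc >= target and conf < n:  # this share is full; open the next one
            conf, acc = conf + 1, 0.0
    return assignment
```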

We then go through all nodes in the network, and attempt to isolate each node in its tentative backup configuration (line 10). For some nodes, this might not be possible without breaking the definition of a connected backbone given in (5). Such a node is then attempted isolated in the next backup configuration, and so on (line 20), until all backup configurations have been tried. If a node cannot be isolated in any of the backup configurations, we give up and abort. Note that when a node cannot be isolated in the backup configuration it was assigned to, this disturbs the desired property of equalizing the potential among the backup configurations. However, in our experience this typically happens only for a very limited number of nodes, and the consequences are not severe.
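The isolation pass can then be sketched as below. Here `try_isolate(u, i)` stands in for the isolation test of Algorithm 1 (checking that isolating node u in configuration C_i leaves its backbone connected, per (5)), and we assume the fallback search wraps around from C_n to C_1; both details are our assumptions about parts of Algorithm 3 not reproduced here.

```python
def isolate_all(order, assignment, n, try_isolate):
    """Try each node's tentative configuration first; on failure, fall back
    to the next configuration, wrapping around, until all n have been tried
    (cf. lines 10-20 of Algorithm 3)."""
    for u in order:
        for offset in range(n):
            i = (assignment[u] - 1 + offset) % n + 1  # configurations are 1-indexed
            if try_isolate(u, i):
                assignment[u] = i
                break
        else:  # no configuration can isolate u: give up and abort
            raise RuntimeError(f"cannot isolate node {u!r} in any configuration")
    return assignment
```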


The outcome of this algorithm depends on the network topology and the traffic demand matrix $D$. If the load is close to equally distributed on the links before a failure, we end up with approximately the same number of nodes isolated in each backup configuration. If the traffic distribution is more skewed, the algorithm typically ends up isolating many nodes with a small potential in $C_1$, while only very few nodes, with a high potential, are isolated in backup configuration $C_n$. This is in accordance with the goal of having a rich backbone in which to reroute traffic after the failure of heavily loaded links.

B. Weight Optimization

When the backup configurations $C_1, \ldots, C_n$ are created, our third phase consists of optimizing the link weights used in these configurations so that the recovered traffic is distributed over less utilized links. We use a weight search heuristic similar to the one used in the failure-free case, and we adopt the cost function $\Phi$ introduced in [23]. Our goal is to find a weight function $w_i$ for each configuration, so that the cost of routing the demands through the network is as small as possible after any link failure. However, evaluating this cost for a given weight setting is a complex task, because it involves recalculating the load on all links in the network. In order to get a method that scales to large networks, we identify a limited set of critical link failures. We then optimize the link weights taking only the critical link failures into account.

Let $\Phi_e$ denote the cost of routing the demands through the network when link $e$ has failed. We define the critical link set $L_C$ as the $k$ links that give the highest value of $\Phi_e$ upon failure, i.e., $L_C$ is the set of links with cardinality $k$ so that $\forall e_i \in L_C, e_j \notin L_C: \Phi_{e_i} \geq \Phi_{e_j}$. The setting of $k$ can be used to balance the result precision against the computation complexity. Note that the initial calculation of $L_C$ is performed after we have optimized $w_0$, but before we have optimized $w_1, \ldots, w_n$. To capture possible changes caused by the altered link weights in the backup configurations, the set of critical link failures is periodically updated during the link weight search.
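Selecting $L_C$ is thus a top-$k$ computation over per-failure costs. In the sketch below, `cost_after_failure(e)` is a placeholder for the expensive evaluation of $\Phi_e$ (rerouting the demands with link $e$ removed and summing the link costs), which is not shown:

```python
import heapq

def critical_link_set(links, k, cost_after_failure):
    """L_C: the k links whose individual failure yields the largest cost Phi_e."""
    return set(heapq.nlargest(k, links, key=cost_after_failure))
```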

With MRC, there is a dependency between a particular link failure and the backup configurations that are used to recover the traffic, as explained in Section IV. This means that for the failure of a particular link $(u,v)$, the distribution of recovered traffic is determined only by the link weights in the backup configurations where nodes $u$ and $v$ are isolated. We take advantage of this to further reduce the number of cost evaluations performed in our local search heuristic. For each backup configuration $C_i$, we define $L_{C_i}$ as the set of critical links whose failure results in recovered traffic being routed according to $C_i$:

Definition: The set of critical links $L_{C_i}$ of a configuration $C_i$ is

$$L_{C_i} = \{(u,v) \in L_C \mid u \in S_i \vee v \in S_i\} \qquad (14)$$

Given the set of critical link failures for each backup configuration, we run the local search heuristic with the objective of minimizing the sum $\sum_{e \in L_C} \Phi_e$. Each time we change a link weight in backup configuration $C_i$, we only need to evaluate the resulting cost after the failure of each link in $L_{C_i}$. For all other link failures, this cost will be the same.
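A sketch of definition (14) and of the resulting pruning during the weight search; the per-configuration sets of isolated nodes and the cache of previously computed failure costs are assumed to be maintained elsewhere:

```python
def critical_links_of(l_c: set, isolated_i: set) -> set:
    """L_{C_i}: critical links with an endpoint isolated in C_i, cf. (14)."""
    return {(u, v) for (u, v) in l_c if u in isolated_i or v in isolated_i}

def objective(l_c: set, l_ci: set, cost_after_failure, cached: dict) -> float:
    """Objective after a weight change in C_i: re-evaluate Phi_e only for the
    failures in L_{C_i}, and reuse cached values for all other critical links."""
    return sum(cost_after_failure(e) if e in l_ci else cached[e] for e in l_c)
```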

Fig. 9. Load on all unidirectional links in the COST239 network after the worst-case link failure. Top: Optimized MRC versus complete IGP rerouting. Bottom: Standard versus optimized MRC.

The local search heuristic terminates and returns the weight setting that gives the lowest value of the objective function after a given number of iterations.

C. Evaluation

To evaluate our load-aware construction algorithm, we compute the worst-case load on each link after a link failure, and compare it to the results achieved by the original algorithm. We reuse the evaluation framework from Section V, and set the critical link set size $k$ to 20.

In the top panel of Fig. 9, we show the worst-case link loads for the load-aware MRC (“Optimized MRC”) and after a full IGP re-convergence on the new topology. The links are sorted by their load in the failure-free case. The top panel of Fig. 9 is directly comparable to Fig. 8. We see that the worst-case load peaks for the optimized MRC are somewhat reduced compared to the standard MRC. The maximum link load after the worst-case link failure has been reduced from 118% to 91%, which is better than what is achieved after a full IGP re-convergence. This is possible since the re-converged network will choose the shortest available path, while MRC in this case manages to route the recovered traffic over less utilized links.

The effect of the proposed recovery load balancing is highlighted in the bottom panel of Fig. 9, where the optimized and standard MRC are directly compared. Here, the links are sorted by their load after a worst-case failure using standard MRC. We see how the optimized MRC often manages to route traffic over less utilized links after the failure of a heavily loaded link.

TABLE III
CONCEPTUAL COMPARISON OF DIFFERENT APPROACHES FOR FAST IP RECOVERY

Note that the optimizations described here will only have an effect if the network topology allows more than one possible backup path after a failure. We have also run our optimizations on less connected networks than COST239, without achieving any significant improvements over the method described in Section III.

For small networks like COST239, our link weight optimization is performed in seconds or minutes. For larger networks the optimizations can take several hours, and should be conducted as a background refinement process. Note that updating the link weights in the backup configurations can be done without consequences for the traffic, since no traffic is routed there during normal operation.

VII. RELATED WORK

Much work has lately been done to improve robustness against component failures in IP networks [10]. In this section, we focus on the most important contributions aimed at restoring connectivity without a global re-convergence. Table III summarizes important features of the different approaches. We indicate whether each mechanism guarantees one-fault tolerance in an arbitrary bi-connected network, for both link and node failures, independent of the root cause of failure (failure agnostic). We also indicate whether they solve the “last hop problem”.

Network layer recovery in the timescale of milliseconds has traditionally only been available for networks using MPLS with its fast reroute extensions [26]. In the discussion below, we focus mainly on solutions for connectionless destination-based IP routing. A related survey can be found in [27].

IETF has recently drafted a framework called IP fast reroute [30], where they point at Loop-Free Alternates (LFAs) [21] as a technique to partly solve IP fast reroute. From a node detecting a failure, a next hop is defined as an LFA if this next hop will not loop the packets back to the detecting node or to the failure. Since LFAs do not provide full coverage, IETF is also drafting a tunneling approach based on so-called “Not-via” addresses to guarantee recovery from all single link and node failures [11]. Not-via is the connectionless version of MPLS fast reroute [26], where packets are detoured around the failure to the next-next hop. To protect against the failure of a component P, a special Not-via address is created for this component at each of P's neighbors. Forwarding tables are then calculated for these addresses without using the protected component. This way, all nodes get a path to each of P's neighbors, without passing through (“Not-via”) P. The Not-via approach is similar to MRC in that loop-free backup next-hops are found by doing shortest path calculations on a subset of the network. It also covers link and node failures using the same mechanism, and is strictly pre-configured. However, the tunneling approach may give less optimal backup paths, and less flexibility with regard to post-failure load balancing.

Narvaez et al. [28] propose a method relying on multi-hop repair paths. They propose to do a local re-convergence upon detection of a failure, i.e., to notify and send updates only to the nodes necessary to avoid loops. A similar approach, also considering dynamic traffic engineering, is presented in [31]. We call these approaches local rerouting. They are designed only for link failures, and therefore avoid the problems of the root cause of failure and the last hop. Their method does not guarantee one-fault tolerance in arbitrary bi-connected networks. It is obviously connectionless. However, it is not strictly pre-configured, and can hence not recover traffic in the same short time-scale as a strictly pre-configured scheme.

Nelakuditi et al. [8] propose using interface-specific forwarding to provide loop-free backup next hops to recover from link failures. Their approach is called failure insensitive routing (FIR). The idea behind FIR is to let a router infer link failures based on the interface packets are coming from. When a link fails, the attached nodes locally reroute packets to the affected destinations, while all other nodes forward packets according to their pre-computed interface-specific forwarding tables without being explicitly aware of the failure. In another paper, they have also proposed a similar method, named Failure Inferencing based Fast Rerouting (FIFR), for handling node failures [29]. This method will also cover link failures, and hence it operates independent of the root cause of failure. However, their method will not guarantee this for the last hop, i.e., they do not solve the “last hop problem”. FIFR guarantees one-fault tolerance in any bi-connected network; it is connectionless, pre-configured, and does not affect the original failure-free routing.

Our main inspiration for using multiple routing functions to achieve failure recovery has been a layer-based approach used to obtain deadlock-free and fault-tolerant routing in irregular cluster networks, based on a routing strategy called FRoots [32]. General packet networks are not hampered by the deadlock considerations necessary in interconnection networks, and hence we generalized the concept in a technology-independent manner and named it Resilient Routing Layers (RRL) [33], [34]. In the graph-theoretical context, RRL is based on calculating spanning sub-topologies of the network, called layers. Each layer contains all nodes but only a subset of the links in the network. In this paper we refine these ideas and adapt them to an IP setting.

None of the proactive recovery mechanisms discussed above take any measures towards a good load distribution in the network in the period when traffic is routed on the recovery paths. Existing work on load distribution in connectionless IGP networks has either focused on the failure-free case [23], [35], [36], or on finding link weights that work well both in the normal case and when the routing protocol has converged after a single link failure [37]–[39].

Many of the approaches listed provide elegant and efficient solutions to fast network recovery; however, MRC and Not-via tunneling seem to be the only two covering all the evaluated requirements. We argue, though, that MRC offers the same functionality with a simpler and more intuitive approach, and leaves more room for optimization with respect to load balancing.

VIII. CONCLUSION

We have presented Multiple Routing Configurations as an approach to achieve fast recovery in IP networks. MRC is based on providing the routers with additional routing configurations, allowing them to forward packets along routes that avoid a failed component. MRC guarantees recovery from any single node or link failure in an arbitrary bi-connected network. By calculating backup configurations in advance, and operating based on locally available information only, MRC can act promptly after failure discovery.

MRC operates without knowing the root cause of failure, i.e., whether the forwarding disruption is caused by a node or link failure. This is achieved by careful link weight assignment according to the rules we have described. The link weight assignment rules also provide the basis for the specification of a forwarding procedure that successfully solves the last hop problem.

The performance of the algorithm and the forwarding mechanism has been evaluated using simulations. We have shown that MRC scales well: 3 or 4 backup configurations are typically enough to isolate all links and nodes in our test topologies. MRC backup path lengths are comparable to the optimal backup path lengths: MRC backup paths are typically zero to two hops longer. We have evaluated the effect MRC has on the load distribution in the network while traffic is routed in the backup configurations, and we have proposed a method that minimizes the risk of congestion after a link failure if we have an estimate of the demand matrix. In the COST239 network, this approach gave a maximum link load after the worst-case link failure that was even lower than after a full IGP re-convergence on the altered topology. MRC thus achieves fast recovery with a very limited performance penalty.

REFERENCES

[1] D. D. Clark, “The design philosophy of the DARPA internet protocols,” ACM SIGCOMM Comput. Commun. Rev., vol. 18, no. 4, pp. 106–114, Aug. 1988.

[2] A. Basu and J. G. Riecke, “Stability issues in OSPF routing,” in Proc. ACM SIGCOMM, San Diego, CA, Aug. 2001, pp. 225–236.

[3] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian, “Delayed internet routing convergence,” IEEE/ACM Trans. Networking, vol. 9, no. 3, pp. 293–306, Jun. 2001.

[4] C. Boutremans, G. Iannaccone, and C. Diot, “Impact of link failures on VoIP performance,” in Proc. Int. Workshop on Network and Operating System Support for Digital Audio and Video, 2002, pp. 63–71.

[5] D. Watson, F. Jahanian, and C. Labovitz, “Experiences with monitoring OSPF on a regional service provider network,” in Proc. 23rd Int. Conf. Distributed Computing Systems (ICDCS’03), Washington, DC, 2003, pp. 204–213.

[6] P. Francois, C. Filsfils, J. Evans, and O. Bonaventure, “Achieving sub-second IGP convergence in large IP networks,” ACM SIGCOMM Comput. Commun. Rev., vol. 35, no. 2, pp. 35–44, Jul. 2005.

[7] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, and C. Diot, “Characterization of failures in an IP backbone network,” in Proc. IEEE INFOCOM, Mar. 2004, vol. 4, pp. 2307–2317.

[8] S. Nelakuditi, S. Lee, Y. Yu, Z.-L. Zhang, and C.-N. Chuah, “Fast local rerouting for handling transient link failures,” IEEE/ACM Trans. Networking, vol. 15, no. 2, pp. 359–372, Apr. 2007.

[9] S. Iyer, S. Bhattacharyya, N. Taft, and C. Diot, “An approach to alleviate link overload as observed on an IP backbone,” in Proc. IEEE INFOCOM, Mar. 2003, pp. 406–416.

[10] S. Rai, B. Mukherjee, and O. Deshpande, “IP resilience within an autonomous system: Current approaches, challenges, and future directions,” IEEE Commun. Mag., vol. 43, no. 10, pp. 142–149, Oct. 2005.

[11] S. Bryant, M. Shand, and S. Previdi, “IP fast reroute using not-via addresses,” Internet Draft (work in progress), draft-ietf-rtgwg-ipfrr-notvia-addresses-01, Jun. 2007.

[12] P. Francois, M. Shand, and O. Bonaventure, “Disruption free topology reconfiguration in OSPF networks,” in Proc. IEEE INFOCOM, Anchorage, AK, May 2007, pp. 89–97.

[13] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco, CA: W. H. Freeman & Co., 1979.

[14] P. Psenak, S. Mirtorabi, A. Roy, L. Nguen, and P. Pillay-Esnault, “MT-OSPF: Multi topology (MT) routing in OSPF,” IETF Internet Draft (work in progress), draft-ietf-ospf-mt-07.txt, Nov. 2006.

[15] T. Przygienda, N. Shen, and N. Sheth, “M-ISIS: Multi topology (MT) routing in IS-IS,” Internet Draft (work in progress), draft-ietf-isis-wg-multi-topology-11.txt, Oct. 2005.

[16] M. Menth and R. Martin, “Network resilience through multi-topology routing,” in Proc. 5th Int. Workshop on Design of Reliable Communication Networks (DRCN), Oct. 2005, pp. 271–277.

[17] A. Medina, A. Lakhina, I. Matta, and J. Byers, “BRITE: An approach to universal topology generation,” in Proc. IEEE MASCOTS, Aug. 2001, pp. 346–353.

[18] B. M. Waxman, “Routing of multipoint connections,” IEEE J. Sel. Areas Commun., vol. 6, no. 9, pp. 1617–1622, Dec. 1988.

[19] T. Bu and D. Towsley, “On distinguishing between internet power law topology generators,” in Proc. IEEE INFOCOM, New York, Jun. 2002, pp. 638–647.

[20] Rocketfuel Topology Mapping. [Online]. Available: http://www.cs.washington.edu

[21] A. Atlas and A. Zinin, “Basic specification for IP fast-reroute: Loop-free alternates,” IETF Internet Draft (work in progress), draft-ietf-rtgwg-ipfrr-spec-base-06, Mar. 2007.

[22] M. J. O’Mahony, “Results from the COST 239 project: Ultra-high capacity optical transmission networks,” in Proc. 22nd European Conf. Optical Communication (ECOC’96), Sep. 1996, pp. 11–14.

[23] B. Fortz and M. Thorup, “Internet traffic engineering by optimizing OSPF weights,” in Proc. IEEE INFOCOM, 2000, pp. 519–528.

[24] D. S. Johnson, “Approximation algorithms for combinatorial problems,” in Proc. 5th Annu. ACM Symp. Theory of Computing, 1973, pp. 38–49.

[25] A. Kvalbein, T. Cicic, and S. Gjessing, “Post-failure routing performance with multiple routing configurations,” in Proc. IEEE INFOCOM, May 2007, pp. 98–106.

[26] P. Pan, G. Swallow, and A. Atlas, “Fast reroute extensions to RSVP-TE for LSP tunnels,” RFC 4090, May 2005.

[27] A. Raja and O. C. Ibe, “A survey of IP and multiprotocol label switching fast reroute schemes,” Comput. Netw., vol. 51, no. 8, pp. 1882–1907, Jun. 2007.

[28] P. Narvaez, K.-Y. Siu, and H.-Y. Tzeng, “Local restoration algorithms for link-state routing protocols,” in Proc. IEEE Int. Conf. Computer Communications and Networks (ICCCN’99), Oct. 1999, pp. 352–357.

[29] Z. Zhong, S. Nelakuditi, Y. Yu, S. Lee, J. Wang, and C.-N. Chuah, “Failure inferencing based fast rerouting for handling transient link and node failures,” in Proc. IEEE INFOCOM, Mar. 2005, vol. 4, pp. 2859–2863.

[30] M. Shand and S. Bryant, “IP fast reroute framework,” IETF Internet Draft (work in progress), draft-ietf-rtgwg-ipfrr-framework-07, Jun. 2007.

[31] R. Rabbat and K.-Y. Siu, “Restoration methods for traffic engineered networks for loop-free routing guarantees,” in Proc. IEEE Int. Conf. Communications (ICC’01), Helsinki, Finland, Jun. 2001, vol. 5, pp. 1566–1570.

[32] I. Theiss and O. Lysne, “FRoots, a fault tolerant and topology agnostic routing technique,” IEEE Trans. Parallel Distrib. Syst., vol. 17, no. 10, pp. 1136–1150, Oct. 2006.

[33] A. F. Hansen, T. Cicic, S. Gjessing, A. Kvalbein, and O. Lysne, “Resilient routing layers for recovery in packet networks,” in Proc. Int. Conf. Dependable Systems and Networks (DSN 2005), Jun. 2005, pp. 238–247.

[34] A. Kvalbein, A. F. Hansen, T. Cicic, S. Gjessing, and O. Lysne, “Fast recovery from link failures using resilient routing layers,” in Proc. 10th IEEE Symp. Computers and Communications (ISCC 2005), Jun. 2005, pp. 554–560.

[35] Y. Wang, Z. Wang, and L. Zhang, “Internet traffic engineering without full mesh overlaying,” in Proc. IEEE INFOCOM, Apr. 2001, pp. 565–571.

[36] A. Sridharan, R. Guerin, and C. Diot, “Achieving near-optimal traffic engineering solutions for current OSPF/IS-IS networks,” IEEE/ACM Trans. Networking, vol. 13, no. 2, pp. 234–247, Apr. 2005.

[37] A. Nucci, B. Schroeder, S. Bhattacharyya, N. Taft, and C. Diot, “IGP link weight assignment for transient link failures,” in Proc. 18th Int. Teletraffic Congress, Berlin, Germany, Aug. 2003.

[38] B. Fortz and M. Thorup, “Robust optimization of OSPF/IS-IS weights,” in Proc. INOC, Oct. 2003, pp. 225–230.

[39] A. Sridharan and R. Guerin, “Making IGP routing robust to link failures,” in Proc. Networking, Waterloo, Canada, 2005.

Amund Kvalbein (M’05) received the M.Sc. and Ph.D. degrees from the University of Oslo, Oslo, Norway, in 2003 and 2007, respectively. The main focus of his Ph.D. thesis was fast recovery mechanisms.

He is now a Postdoctoral Fellow at Simula Research Laboratory, Lysaker, Norway, where he is leading a project on network resilience. His main research interests are network layer protocols, in particular issues related to fault tolerance, scalability, and robustness under unexpected operational environments.

Audun Fosselie Hansen received the M.Sc. and Ph.D. degrees from the University of Oslo, Oslo, Norway, in 2001 and 2007, respectively. His thesis focused on fast reroute in IP networks.

He is currently working for Telenor R&I and Simula Research Laboratory as a Project Manager for the joint SimTel project. The focus of this project is increased service availability and quality to heterogeneous mobile terminals in heterogeneous environments.

Tarik Cicic (M’00) received the M.S. and Ph.D. degrees from the University of Oslo, Oslo, Norway, in 1997 and 2002, respectively.

He was a Postdoctoral Fellow at Simula Research Laboratory, Norway, from 2003 to 2007. Currently he is the CTO of Media Network Services, Norway, and an Associate Professor at the Department of Informatics, University of Oslo. His research interests include network architectures, communication protocol modeling, network measurements, and network resilience.

Stein Gjessing (M’91) received the Dr. Philos. degree in 1985.

He is currently a Professor of computer science in the Department of Informatics, University of Oslo, Oslo, Norway, and an adjunct researcher at Simula Research Laboratory, Norway. His original work was in the field of object-oriented concurrent programming. He has worked with computer interconnects [Scalable Coherent Interface (IEEE Std. 1596) and LAN/MANs (Resilient Packet Ring, IEEE Std. 802.17)]. His main research interests are currently within network resilience, including sensor networks, Internet-like networks, and optical networks.

Olav Lysne (M’92) received the Master’s and the Dr. Scient. degrees from the University of Oslo, Oslo, Norway, in 1988 and 1992, respectively.

He is now Research Director at Simula Research Laboratory, and a Professor in computer science at the University of Oslo. His early research contributions were in the field of algebraic specification and term rewriting. However, his most important scientific contributions are in interconnection networks, focusing on problems like effective routing, fault tolerance, and quality of service. In this area, he has participated in numerous program committees, and he served as general chair of ICPP 2005.