corrs: consistent and responsive routing with …xwy/publications/929-nsdi09-draft...corrs:...


CoRRS: COnsistent and Responsive Routing with Safeguard

Ang Li, Dept. of Computer Science, Duke University, [email protected]

Xiaowei Yang, Dept. of Computer Science, Duke University, [email protected]

David Wetherall, Intel Research Seattle & University of Washington, [email protected]

ABSTRACT

The Internet uses a process called routing convergence to quickly find suitable forwarding paths after dynamic network changes such as failures. During routing convergence, routers may have inconsistent forwarding state that leads to massive packet loss or forwarding loops. Real-time applications are often sensitive to loss and delay, and may suffer noticeable performance degradation. Recent work on reducing the adverse effects of routing convergence either slows down or eliminates routing convergence. This paper presents the design and evaluation of CoRRS, an intra-domain routing system that reduces forwarding disruptions during routing convergence without sacrificing the network’s responsiveness to dynamic changes. CoRRS uses the remaining path cost information carried by packets to detect forwarding anomalies and to discover valid paths. Our evaluation based on a Linux implementation suggests that CoRRS is suitable for implementation on high-speed routers. It has a small and fixed header overhead, and achieves 95% of the vanilla IP packet forwarding throughput. Our simulation comparison shows that CoRRS is more effective and converges 2∼5 times faster than other designs.

1. INTRODUCTION

Real-time applications such as VoIP, online gaming, video conferencing, and IPTV require uninterrupted network services [3, 7]. Even sub-second periods of packet loss or delay may adversely impact the users of these applications [7, 12]. Unfortunately, the present Internet routing system can easily produce noticeable periods of disruption following a change in the network, e.g., the failure or restoration of equipment, an ISP policy change, or a traffic engineering route adaptation.

When a network change occurs (e.g., a link goes down), routers must be informed and recompute their forwarding tables to adapt to the change, a process known as routing convergence. During convergence, routers adjacent to failures may not have valid routes, or routers may have inconsistent next-hop sequences that form temporary loops called micro-loops. As a result, packets may be discarded or trapped in a loop. Recent measurements show that BGP convergence may cause 30% packet loss for two minutes or longer [24], and that micro-loops formed during IGP convergence in a tier-1 network account for 90% of the total packet loss [18].

Recent work on reducing forwarding disruption during routing convergence [6, 11, 14, 15, 16, 20, 23, 25, 32, 37] has mainly focused on two general approaches. One approach is to speed up the routing convergence process by improving the underlying routing protocols [6, 15]. The other is to design new convergence techniques to mitigate the adverse effects of routing convergence [14, 16, 17, 20, 25]. Unfortunately, fundamental limits such as the speed of light and memory update latency [19, 36, 39] suggest that the convergence period cannot be arbitrarily shortened. Even in an intra-domain environment, recent work suggests that the routing transition time could be shortened to sub-second periods, but no less. This result still falls short of meeting the requirements of real-time applications [7].

The second approach aims to ensure a consistent forwarding state among routers to avoid loops and blackholes, but at the cost of reducing the responsiveness of the routing system. One example is consensus routing [20], in which routers first agree upon the set of routing updates that are safe to apply by taking periodic global snapshots and running a consensus algorithm before they update their forwarding tables. The routing system does not respond to a network change immediately. Another example is convergence-free routing with Failure-Carrying Packets (FCP) [25]. FCP eliminates the routing convergence process, and uses a logical map distributor to periodically distribute the static network map to routers. Transient failures do not trigger routing updates.

This work explores a complementary approach: can we reduce forwarding disruption without slowing down or constraining routing convergence? Our insight is that routing convergence, although costly, is advantageous. It enables the network to promptly respond to dynamic changes such as failures or traffic load variation and recompute optimal routes. For instance, with routing convergence, packets need not hit failures first and then take a detour to reach their destinations. In addition, routing convergence enables a simple form of intra-domain traffic engineering, in which congested links are avoided by adjusting link weights and announcing them in routing updates [13]. Therefore, it is desirable for the routing system to converge quickly after dynamic changes.

We propose a detect-then-rescue paradigm to simultaneously achieve consistency and responsiveness. That is, routers detect packets that are potentially “in danger” (i.e., trapped in a loop or destined to a blackhole route) during routing convergence, and then use a transient-mode forwarding mechanism to rescue those packets. This approach is similar in spirit to concurrent work on Anomaly-Cognizant Forwarding [11], but as we will soon describe, it differs in both mechanism and focus.¹

A key challenge faced by the design is to enable routers to detect abnormal forwarding state with low overhead and high accuracy. Ideally, to be practical we wish to detect all anomalies without adding extra information to a packet header or increasing a router’s memory, processing, or signaling overhead. Unfortunately, we show in § 2.2 that even in a basic shortest-path based routing system, it is impossible under generic network conditions to detect forwarding loops accurately without adding packet header overhead or resorting to TTL expirations.

Motivated by this negative result, we consider designs that let packets carry limited safeguard information for routers to detect abnormal forwarding state. As a first step, we present the novel design (§ 3-§ 4) of an intra-domain routing system CoRRS that uses the remaining path cost as a safeguard. We show that by letting packets carry a fixed-length label that encodes their remaining path cost, we can design a routing system that enables routers to detect all inconsistent forwarding paths that may lead to micro-loops or blackholes. Furthermore, the cost label can be used to discover valid repair paths, thereby achieving two goals, detection and rescue, with one piece of packet state.

We strive to make the design amenable to implementation in high-speed routers. It neither modifies existing routing protocols nor introduces new distributed convergence protocols, unlike alternatives [14, 16, 17, 20, 25]. It adds packet header overhead, but this overhead is fixed and small, unlike [11]. Packet forwarding is modified but involves at most two table lookups. As our implementation shows (§ 6), a software implementation using Click [22] can achieve XX% of the vanilla IP forwarding throughput. The additional memory and computation overhead added by our design is comparable to that of other solutions [9] that are less effective in reducing forwarding disruption.

We also compare our design with other alternatives using simulations on real, inferred, and randomly generated topologies. We simulate topology changes and observe the packet forwarding behavior during routing transitions. Without CoRRS, we observe that on average 18% of the routing transitions cause micro-loops and some loops can last up to half a second. Packets can be trapped in a loop, traversing a uni-directional link tens of times until their TTLs expire. In contrast, by carrying cost as the safeguard information, we observe that no packets are trapped in loops, and routing convergence is not delayed. The routing transition period in CoRRS is 2∼5 times shorter than that of a state-of-the-art loop prevention convergence protocol [14].

¹ An earlier version of this work has appeared in [27].

We conclude and discuss future work on extending the CoRRS approach to a policy-based inter-domain routing environment in § 9. The rest of the paper describes the design and evaluation of CoRRS in detail.

2. OVERVIEW

In this section, we motivate the design choice of using the remaining path cost in a packet as a safeguard. We first define the design goals, and then present a negative result that shows routers are unable to detect forwarding loops during routing transitions without altering packet headers or adding excessive router state. We then explain why carrying the remaining path cost may achieve our goals.

2.1 Goals

We aim to explore a detect-then-rescue approach to achieve consistent and responsive routing. By consistent, we mean that each hop makes forward progress. When a routing system is in a stable and converged state, a packet should be forwarded “closer” to its destination by some metric at each hop. By responsive, we mean that when a network element changes its status (e.g., a node or link failure or restoration), the routing system can quickly learn of the event and re-compute its routes to adapt to the change. This adaptation is desirable because after convergence, packets can be forwarded along the best routes the network can provide. On the other hand, the convergence process involves significant overhead. When the rate of changes exceeds a threshold, the routing system may not keep up with the changes. Consequently, practical routing systems [10, 31, 34] all have a damping mechanism to reduce the responsiveness of the routing system in case of excessive network changes. Our design aims not to further slow down or constrain routing convergence beyond the built-in damping mechanisms.

This work focuses on achieving consistent and responsive routing in a typical intra-domain routing environment in which routers run a link-state shortest-path based routing protocol such as OSPF or IS-IS. We consider it a first step towards a globally consistent and highly responsive routing system.

Our tasks are two-fold: detecting and repairing forwarding anomalies. An anomaly generally refers to a violation of the consistency requirement when a router forwards a packet to the next hop. Our design aims to detect all forwarding anomalies, because undetected anomalies may lead to undesirable effects such as blackholes and forwarding loops. Loops are especially harmful, because they not only affect packets trapped in the loop, but also inflict collateral damage on packets that share the links on the loop. For instance, if a 1 Mbps video stream is trapped in a loop and traverses a link 10 times, its peak load on that link would become 10 Mbps. Packets that would otherwise have reached their destinations may be discarded due to congestion.

Figure 1: This counterexample shows that routers cannot detect forwarding loops without extra information. All links are asymmetric, with one direction (the one shown in the figure) having a much lower cost than the opposite direction.

After anomalies are detected, it is also desirable to forward packets to their destinations with guaranteed success (as long as the network is not partitioned) under all convergence scenarios. However, we find this goal difficult to achieve, because the patterns of network changes are unpredictable. It would require on-demand computation [25] for routers to find a valid path under all convergence scenarios. To be practical and efficient, our design only aims to provide guaranteed recovery for the most common type of event [29]: a single non-overlapping status change (e.g., failure, restoration, or weight adjustment) of a link, node, or shared risk link group (SRLG). In the less common case in which independent network elements change their status concurrently, we consider it practically acceptable that the design finds a valid path with high likelihood. This is also the design requirement advertised by IETF’s IP fast reroute framework [37]. We note that detecting all anomalies under generic scenarios is still useful, because it enables a rescue mechanism that is likely to succeed to kick in.

2.2 The Impossibility Result

Our first attempt to enable anomaly detection is to consider designs that do not alter the IP packet header, add little or no router state and processing overhead, and do not introduce new control messages. These constraints are desirable because they make a design amenable to practical deployment. Unfortunately, we discover that although it is possible to detect anomalies while satisfying the above constraints under specific conditions, including symmetric networks and a single topology change, it is impossible for routers to detect forwarding loops without relying on TTL expirations under generic settings.

We prove this negative result using a counterexample shown in Figure 1. The high-level idea underlying the proof is to show that a forwarding loop may form during a routing transition period while each router on the loop observes normal forwarding behavior according to its local view of the network. With the constraint of not adding extra overhead, we assume that the only available information for a router to discover an anomaly is a packet p’s header and content, the packet’s incoming and outgoing interfaces, and a router’s local forwarding tables. We do not assume that a router can “remember” a packet that it has seen before, because we do not assume that a router keeps additional per-packet state.

The main forwarding anomaly that we aim to detect is the micro-loop. To detect this anomaly, some router must be able to distinguish a packet that is trapped in a loop from a packet that is not. Without keeping additional per-packet state, a router can only tell the difference from information associated with a packet, i.e., the packet itself p, its incoming interface p.i, and its outgoing interface p.o.

Unfortunately, under certain scenarios, a packet p that is trapped in a micro-loop may appear the same as a normal packet p′ to every router on the loop according to each router’s current view of the network, i.e., (p, p.i, p.o) ≡ (p′, p′.i, p′.o). Figure 1 shows the counterexample. All links in the network are asymmetric. One direction (the one shown in the figure) has a much lower cost than the opposite direction. Consider the paths between the nodes S and D. Before the two links (S→X and Y→D) fail, there are two equal-cost paths from S to D: S→X→Z→W→Y→D and S→W→Y→D. Packets from S to D can be forwarded along either path. After the two links fail and all nodes have re-computed their forwarding tables, the new shortest path becomes S→W→Y→X→Z→D. Now suppose that during the routing transition period, the shaded nodes S, X, and Y have updated their forwarding tables, while the other nodes on the new path (W and Z) have not. In this situation, a packet p sent from S to D will be trapped in a forwarding loop S→W→Y→X→Z→W→Y→X · · · .

However, were the routing system not in transition, each node on the loop might receive a packet p′ such that (p, p.i, p.o) ≡ (p′, p′.i, p′.o). For instance, for node W, a packet p trapped in the loop would have an incoming interface Z and an outgoing interface Y. From node W’s current view, the two links have not failed. In this case a packet p′ sent from S to D can also have the incoming interface Z and the outgoing interface Y. Since a packet can contain arbitrary content, p and p′ may have the same content (including the TTL values) when they arrive at W. If node W flags an anomaly when it forwards the packet p, it will also consider p′ an anomaly. This is a contradiction, because before the links failed, p′ was a normal packet that would reach its destination. Similarly, one can verify that the same is true for the other nodes involved in the loop.


Figure 2: Forwarding using cost-carrying packets on the Abilene network.

Although the above example is extremely specific, it is sufficient to prove a negative result, because if the hypothesis of detecting anomalies without extra information were true, it would have to hold for all scenarios.
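The counterexample can be replayed with a minimal Python sketch. The next-hop tables below are read off the paths named in the text; the dictionary layout, the function names, and the choice of W as S's old next hop (S had two equal-cost choices) are illustrative assumptions, not part of the CoRRS design.

```python
# Next hops toward D before the failures (paths S→X→Z→W→Y→D and S→W→Y→D).
# S had two equal-cost choices; picking W here is an arbitrary assumption.
OLD = {"S": "W", "X": "Z", "Z": "W", "W": "Y", "Y": "D"}
# Next hops toward D after convergence (path S→W→Y→X→Z→D).
NEW = {"S": "W", "W": "Y", "Y": "X", "X": "Z", "Z": "D"}
# Nodes that have already switched to the new tables during the transition.
UPDATED = {"S", "X", "Y"}


def mixed_tables(old, new, updated):
    """Updated nodes use their new next hop; all others keep the old one."""
    return {node: (new[node] if node in updated else old[node]) for node in old}


def forward(next_hop, src, dst, ttl):
    """Follow next hops from src until dst is reached or the TTL runs out."""
    path = [src]
    while path[-1] != dst and ttl > 0:
        path.append(next_hop[path[-1]])
        ttl -= 1
    return path


def local_views(path):
    """Set of (previous hop, node, next hop) triples seen along a path."""
    return set(zip(path, path[1:], path[2:]))


loop_path = forward(mixed_tables(OLD, NEW, UPDATED), "S", "D", ttl=9)
old_path_via_x = ["S", "X", "Z", "W", "Y", "D"]

print(loop_path)
# W sees (in from Z, out to Y) both inside the loop and on a valid old path.
print(("Z", "W", "Y") in local_views(loop_path) & local_views(old_path_via_x))
```

With the mixed tables the packet never reaches D, yet the triple W observes is identical to one it would observe on the valid pre-failure path through X, which is the indistinguishability the proof relies on.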

2.3 Carrying Path Cost as a Safeguard

The above negative result motivates us to consider designs that include limited information in packet headers. The remaining path cost becomes a natural choice, as cost is the metric that routers use to choose forwarding paths. During normal forwarding, a node n_i will only forward a packet to a next-hop node n_{i+1} if n_i considers n_{i+1} as closer to the destination than itself. Additionally, if in n_i’s current view, the remaining path cost from n_{i+1} is different from that in n_{i+1}’s current view, n_i and n_{i+1} must have inconsistent views of the network, indicating that the routing system is in transition.

In our design, packets carry the remaining path cost to their destinations. Routers compare their locally computed costs with the costs carried in packets, and detect an anomaly if they are different. Figure 2 shows an example using the Abilene network topology. Suppose the link between Denver and Kansas City fails. The Denver node has updated its forwarding tables to use the Sunnyvale node to reach Kansas City. Unfortunately, the Sunnyvale node has not updated its forwarding tables. Without carrying the path cost, packets will loop between Sunnyvale and Denver, until the Sunnyvale node updates its forwarding tables.

If packets carry the remaining path cost, the Sunnyvale node will stamp the remaining path cost to Kansas City (639) into the packets it forwards to Denver. As the Denver node has updated its forwarding tables to bypass the failed link to Kansas City, its local path cost (4456) will be higher. The Denver node detects a cost inconsistency, and can then invoke a rescue mechanism to prevent the forwarding loop between Sunnyvale and Denver.
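The check performed by the Denver node can be sketched as a three-way comparison. This is a minimal illustration, assuming the outcome names of § 3.3 for the equal and higher-local-cost cases; the name `COST_DECREASE` for the remaining case and the function name `compare_cost` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    NORMAL = "normal"
    COST_INCREASE = "cost increase inconsistency"
    # Hypothetical name for the case where the packet cost exceeds the local cost.
    COST_DECREASE = "cost decrease inconsistency"


def compare_cost(local_cost, pkt_cost):
    """Classify a packet by comparing the router's local path cost with its label."""
    if local_cost == pkt_cost:
        return Outcome.NORMAL
    if local_cost > pkt_cost:
        return Outcome.COST_INCREASE
    return Outcome.COST_DECREASE


# Denver after the failure: local cost 4456, packet stamped by Sunnyvale with 639.
print(compare_cost(local_cost=4456, pkt_cost=639))
```

The comparison needs only the packet label and one locally computed value, which is why it fits in the forwarding path without per-packet state.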

2.4 Using Path Cost to Discover Valid Paths

The remaining path cost has another attractive feature: it can be used as a hint for a router to discover valid paths without additional packet header overhead, because different paths tend to have different costs. When a router detects an anomaly using the cost carried by a packet, the cost value also provides a hint about the valid path the router could use to forward the packet. For instance, in Figure 2, if the Sunnyvale node receives a packet destined to Kansas City with a cost of 3161 from the Denver node, it will notice that the packet cost is higher than its local cost. So it is likely that a downstream node on its default path has discovered a failure that the Sunnyvale node has not yet learned of. If the Sunnyvale node has pre-computed an alternative path (Sunnyvale→Los Angeles→Houston→Kansas City) that has the same cost 3161, it indicates that the downstream node that has discovered the failure considers this alternative path valid. The Sunnyvale node can safely forward the packet along this path to bypass the downstream failure that it has not learned of.
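The Sunnyvale lookup can be sketched as a table keyed by destination and cost, the shape that § 3.2 gives for the alternative path database. The table contents come from the example above; the variable and function names are assumptions made for illustration.

```python
# Pre-computed alternative paths at Sunnyvale, keyed by (destination, cost).
# The single entry is the Sunnyvale→Los Angeles→Houston→Kansas City path.
SUNNYVALE_ALTERNATIVES = {("Kansas City", 3161): "Los Angeles"}


def rescue_next_hop(alternatives, dst, pkt_cost):
    """Return the next hop of the alternative path whose cost matches the
    packet label, or None when no pre-computed path has that cost."""
    return alternatives.get((dst, pkt_cost))


print(rescue_next_hop(SUNNYVALE_ALTERNATIVES, "Kansas City", 3161))
```

A miss (a cost with no matching entry) returns `None`; what a router does in that case is not covered by this sketch.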

The remaining path cost has a limitation: it cannot distinguish multiple paths with equal cost. When multiple alternative paths have the same cost, a node may choose an invalid path that leads back to a failure. We address this issue in Section 4.

2.5 Comparing Path Cost with Other Choices

Packets may carry other information for anomaly detection and recovery. We have considered a number of choices, including topology or path identifiers, the entire path visited by a packet, failures encountered by a packet, or the hop count. We opt for cost because it enables early detection and optimal repair. That is, packets do not need to hit failures to be rerouted, and a repair path is often the shortest path after the network has converged. Moreover, as we soon explain, it is more flexible than carrying a path or a topology identifier, more lightweight than carrying a path, and more effective than carrying failures, a path or topology identifier, or a hop count. We discuss these advantages as follows.

Failure: A packet may carry the failures it has encountered. This requires a variable-length header, and packets cannot be rerouted before they reach a failure. For instance, in Figure 2, packets sent from Los Angeles to Kansas City will need to hit Denver first before they are rerouted, taking the detour (Los Angeles→Sunnyvale→Denver→Sunnyvale→Los Angeles→Houston→Kansas City) to reach their destination. In contrast, if packets carry the remaining path cost, the first updated node is able to reroute packets towards their destinations along the new shortest paths. For instance, if the Sunnyvale node has updated while Los Angeles has not, the packet will take a shorter path (Los Angeles→Sunnyvale→Los Angeles→Houston→Kansas City) to reach its destination.

Moreover, carrying failures does not prevent forwarding loops during routing convergence, because different routers may still see different network topologies during routing transitions.

Path: If a packet records the nodes it has traversed, routers can avoid forwarding the packet back to where it came from, thereby avoiding forwarding loops. However, this option requires a variable-length packet header, which is less efficient than using a fixed-length label that encodes cost.

Path identifier: A packet may carry a remaining path identifier that specifies the remaining path a packet takes. A router can discover that its local path identifier is different from the one in a packet, but it cannot learn from the packet’s path identifier what caused the inconsistency, and therefore cannot use it to choose a valid path. In contrast, a router can compare the cost values. If the one in a packet is higher, it indicates an unknown failure on the router’s default path. The router can then follow the path suggested by the path cost in the packet.

In addition, a path identifier prevents a downstream node from changing a packet’s forwarding path. This flexibility is desirable because a downstream node can dynamically split traffic over multiple equal-cost paths based on the load of its links to mitigate congestion. Moreover, setting up path identifiers requires additional signaling or computation overhead, while cost values are readily available from shortest-path computation.

Topology identifier: A packet may carry an identifier that specifies the network topology an upstream router uses to compute the forwarding path. A router can signal a forwarding anomaly if the packet’s topology identifier is different from its local topology identifier. However, inconsistent topologies do not necessarily mean inconsistent paths. Signaling on inconsistent topologies introduces false positives, and may waste network resources by triggering unnecessary recovery. Moreover, similar to a path identifier, a topology identifier tells a router little about the valid path to follow.

Hop Count: A packet may carry its remaining hop count to a destination. The hop count metric is similar to cost because both measure a packet’s distance to its destination. However, practical routing protocols such as OSPF and IS-IS use fine-grained cost metrics to compute paths. Consequently, hop count is a weaker hint for choosing a repair path once a forwarding anomaly is detected. More importantly, it is incompatible with equal-cost multi-path (ECMP) forwarding, because paths with equal cost may have different hop counts.
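One way to read the ECMP incompatibility is sketched below, reusing the two equal-cost paths from S to D in the counterexample of § 2.2. The link costs are hypothetical values chosen only so that both paths cost the same, and the helper name `remaining` is an assumption.

```python
# Hypothetical directed link costs; both S→D paths below add up to 5.
COST = {("S", "W"): 3, ("W", "Y"): 1, ("Y", "D"): 1,
        ("S", "X"): 1, ("X", "Z"): 1, ("Z", "W"): 1}
PATH_VIA_W = ["S", "W", "Y", "D"]
PATH_VIA_X = ["S", "X", "Z", "W", "Y", "D"]


def remaining(path, start):
    """Return (cost, hops) from `start` to the end of `path`."""
    i = path.index(start)
    links = list(zip(path[i:], path[i + 1:]))
    return sum(COST[link] for link in links), len(links)


# S computes its labels from the path via W but load-balances onto X.
cost_at_s, hops_at_s = remaining(PATH_VIA_W, "S")
label_cost = cost_at_s - COST[("S", "X")]  # cost label after the hop to X
label_hops = hops_at_s - 1                 # hop label after the hop to X

local_cost, local_hops = remaining(PATH_VIA_X, "X")
print(label_cost == local_cost)  # cost label agrees with X's local view
print(label_hops == local_hops)  # hop label does not, although nothing failed
```

Under these assumptions the cost label stays consistent whichever equal-cost path S picks, while a hop-count label would make X report an anomaly in a fully converged network.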

3. BASIC DESIGN

In this section, we describe the basic design of using the remaining path cost to detect forwarding anomalies and to discover valid alternative paths. For ease of exposition, we assume for the moment that different paths have different costs, and discuss the problem caused by equal-cost paths and how we address it in § 4. We refer to a node or link failure (or cost increment) event as a Down event, and restoration (or cost decrement) as an Up event.

3.1 Header Format

In our design, a packet carries a short label (32-bit in our implementation) that encodes the remaining path cost to its destination. A packet header also has a counter that routers use to record the total number of anomalies they detect. The length of the counter determines how many times routers will attempt to rescue a packet. Our design uses a 1-bit counter for simplicity. When the counter is zero (one), the packet is in the normal (1-anomaly) forwarding mode.
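A minimal sketch of such a header follows. The 32-bit label and the 1-bit counter come from the text; the on-wire layout (the label in network byte order followed by one byte whose lowest bit is the counter) and the function names are assumptions, since the actual encoding is not specified here.

```python
import struct

# Hypothetical layout: 32-bit unsigned cost label, then one flags byte
# whose least significant bit holds the 1-bit anomaly counter.
HEADER = struct.Struct("!IB")


def pack_label(cost, anomaly_count):
    """Encode the remaining path cost and the anomaly counter."""
    if not 0 <= cost < 2**32:
        raise ValueError("cost must fit in 32 bits")
    if anomaly_count not in (0, 1):
        raise ValueError("the counter is a single bit")
    return HEADER.pack(cost, anomaly_count)


def unpack_label(data):
    """Decode (cost, anomaly_count) from a packed header."""
    cost, flags = HEADER.unpack(data)
    return cost, flags & 1


wire = pack_label(639, 0)  # normal mode, remaining cost 639
print(wire.hex(), unpack_label(wire))
```

Because the header has a fixed size (`HEADER.size` bytes in this sketch), parsing it does not depend on the path length or on how many failures a packet has encountered.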

3.2 Alternative Path Database

To successfully recover from a forwarding anomaly, a router needs to pre-compute alternative paths by anticipating a future failure on its current network map G. If a router anticipates that a network element e (a link, node, or SRLG) may fail in the future, it computes its shortest paths to reach a destination without the element e in its current network map. The router stores the alternative paths’ next hops and costs in an alternative path database (APD). Conceptually, an APD is a table that stores the mapping from the pair (dst, cost) to a valid next hop. During a routing transition, if a router’s local path cost differs from a packet cost, the router may attempt to find an alternative path that matches the cost on the packet, and forward the packet along that path.

A router may re-compute this APD whenever it re-

ceives a routing update that changes its current network

map. Since the updated APD is only used when the next topology change occurs, the computation is not urgent, and could be done at low priority after a router has

computed its forwarding entries. In the case that a topol-

ogy update results in a larger or better topology, e.g., a

link up, a router can save some computation by swap-

ping the next hops and path costs in its previous forward-

ing entries to its APD before it updates those entries, as

those entries would be the alternative path entries com-

puted without the newly added topology element.
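The APD construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it anticipates single link failures only (node and SRLG removal are omitted), and the names `shortest_paths` and `build_apd` as well as the adjacency-dictionary graph format are assumptions made for this example.

```python
import heapq

def shortest_paths(graph, src, removed_link=None):
    """Dijkstra from src over graph = {u: {v: link_cost}}.

    Returns {dst: (path_cost, first_hop)}. If removed_link=(u, v) is given,
    that link is skipped in both directions (the anticipated failure).
    """
    dist = {src: (0, None)}
    heap = [(0, src, None)]
    while heap:
        d, u, first = heapq.heappop(heap)
        if d > dist[u][0]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if removed_link in ((u, v), (v, u)):
                continue
            nd = d + w
            nfirst = v if u == src else first
            if v not in dist or nd < dist[v][0]:
                dist[v] = (nd, nfirst)
                heapq.heappush(heap, (nd, v, nfirst))
    return dist

def build_apd(graph, router):
    """Hypothetical APD: maps (dst, cost) to the next hop of an alternative path."""
    default = shortest_paths(graph, router)
    apd = {}
    links = {tuple(sorted((u, v))) for u in graph for v in graph[u]}
    for link in sorted(links):
        alt = shortest_paths(graph, router, removed_link=link)
        for dst, (cost, hop) in alt.items():
            # Keep only entries that differ from the default forwarding entry.
            if dst != router and (cost, hop) != default.get(dst):
                apd[(dst, cost)] = hop
    return apd

# Triangle topology with distinct path costs (hypothetical example values).
graph = {"a": {"b": 1, "c": 5}, "b": {"a": 1, "c": 1}, "c": {"a": 5, "b": 1}}
apd_at_a = build_apd(graph, "a")
```

In this example, router a normally reaches c at cost 2 via b; if the link a-b is removed, the alternative entries are (c, 5) and (b, 6), both via next hop c.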

3.3 Forwarding

We now describe how a router uses the cost on a packet

to prevent loops and resolve forwarding inconsistency

during routing transitions. When a packet that carries its remaining path cost $pkt.cost$ arrives at a router $n_i$, the router compares its local cost $n_i.cost$ with $pkt.cost$. Depending on the comparison result, the router $n_i$ takes different forwarding actions and increases the anomaly counter correspondingly. A packet enters the network in the normal mode. The comparison has three outcomes:

Normal ($n_i.cost \equiv pkt.cost$): This indicates that the router $n_i$ and its upstream router have consistent forwarding paths. The router updates the packet's cost $pkt.cost$ by subtracting its link cost to the next hop $n_{i+1}$: $pkt.cost \leftarrow pkt.cost - cost_{n_i \to n_{i+1}}$, and forwards the packet to its next hop $n_{i+1}$.

Cost Increase Inconsistency ($n_i.cost > pkt.cost$): This inconsistency shows that a router's local path cost is higher than its upstream router's cost. The network must be in a routing transition, as the router $n_i$ has computed different paths from other routers. If the packet is in the normal mode, then this is the first cost inconsistency the packet encounters. The router will attempt to resolve the path inconsistency.

In the cost increase case, the router $n_i$ resolves the inconsistency by forwarding the packet along its default path. Its default path must be valid because $n_i$ must have a smaller topology than other routers, as it has a higher path cost. If it is a Down event, $n_i$ must have already updated its forwarding table according to the event, and its path will bypass the failed component. If it is an Up event, $n_i$ must not yet have updated its forwarding table, and its path will not use the newly restored or added component, but can still reach the destination.

So the router $n_i$ updates the packet cost using its local cost: $pkt.cost \leftarrow n_i.cost - cost_{n_i \to n_{i+1}}$, increases the anomaly counter to 1-anomaly, and forwards the packet to its next hop $n_{i+1}$.

A special case of cost increase inconsistency occurs

when a packet reaches a router adjacent to a failure, but

the router has not updated its forwarding tables. The

router’s next hop is invalid and the local path cost at the

router is infinity. This situation may happen right after

a router detects a failure, or when a router is damping a flapping link.

In this case, a router $n_i$ will immediately start using the alternative path pre-computed to bypass its next hop failure to forward the packet and update its cost. The router will first try to use the alternative path that bypasses its next hop node. If such a path is unavailable, e.g., the next hop is the destination, the router uses the alternative path computed by removing the link to reach the next hop. Let the alternative path's next hop be $n'_{i+1}$, and the path cost be $n_i.cost'$. The router $n_i$ increases the anomaly counter to 1-anomaly and updates the packet cost to be: $pkt.cost \leftarrow n_i.cost' - cost_{n_i \to n'_{i+1}}$.

If a packet is already in 1-anomaly mode and the router detects a cost increase inconsistency, it indicates that the rescue effort by a previous router has failed, which may occur during multiple topology updates. A router may either demote the packet to low priority so that it will not compete with normal or 1-anomaly traffic, or discard it. Our design discards packets that encounter multiple anomalies for simplicity, because such events are rare and may occur only during multiple independent topology update events for which routers have not prepared.

Figure 3: There are two ECMPs from a to c: a→b→c and a→d→c.

Cost Decrease Inconsistency ($n_i.cost < pkt.cost$): This inconsistency shows that a router's local cost is lower than its upstream router's. Again, the network must be in a transition. As the router has a lower cost, it must have a larger topology. It is no longer safe to forward along the router's default next hop, because it may lead to a failure.

To resolve a cost decrease inconsistency, a router uses the packet cost $pkt.cost$ to look up an alternative path in its APD, because its APD is computed using smaller topologies than its current one, and a higher cost path may be found in the APD. Suppose this lookup returns a next hop $n'_{i+1}$. The router $n_i$ updates the packet cost using the link cost to reach $n'_{i+1}$: $pkt.cost \leftarrow pkt.cost - cost_{n_i \to n'_{i+1}}$, increases the anomaly counter to 1-anomaly, and forwards the packet to $n'_{i+1}$.

The router may not find the packet cost in its alter-

native path database, which may happen when there are

multiple topology updates, e.g., multiple independent links coming up simultaneously. In this case, the path cost cannot be

used to locate a valid path, and any further forwarding

may risk forming a loop. Similarly, the router may ei-

ther demote the packet or discard it. Our design chooses

discard for simplicity.
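The three comparison outcomes of the basic design can be sketched as a single forwarding function. This is only an illustration under stated assumptions: the `Packet` class, the `fib`, `apd`, and `link_cost` table layouts, and the function name `forward` are hypothetical; the special case of a router adjacent to an undetected failure (infinite local cost) is left out; and a cost decrease seen by a packet that is already in 1-anomaly mode is handled by the same APD lookup, which is one reading of the text rather than a confirmed detail.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst: str
    cost: int         # remaining path cost carried in the header
    anomaly: int = 0  # 0 = normal mode, 1 = 1-anomaly mode

def forward(pkt, fib, apd, link_cost):
    """Return the chosen next hop, or None if the packet is discarded.

    fib:       {dst: (local_path_cost, default_next_hop)}
    apd:       {(dst, cost): alternative_next_hop}
    link_cost: {next_hop: cost of the link to that next hop}
    """
    local_cost, next_hop = fib[pkt.dst]
    if local_cost == pkt.cost:
        pass  # Normal: upstream and local paths are consistent.
    elif local_cost > pkt.cost:
        # Cost increase: the local (smaller) topology is trusted.
        if pkt.anomaly == 1:
            return None  # second anomaly, discard
        pkt.anomaly = 1
        pkt.cost = local_cost
    else:
        # Cost decrease: look for an alternative path matching the packet cost.
        next_hop = apd.get((pkt.dst, pkt.cost))
        if next_hop is None:
            return None  # no matching alternative path, discard
        pkt.anomaly = 1
    pkt.cost -= link_cost[next_hop]
    return next_hop

# Hypothetical tables for one router: default path to c costs 2 via b.
fib = {"c": (2, "b")}
apd = {("c", 5): "c"}
link_cost = {"b": 1, "c": 5}
pkt = Packet(dst="c", cost=5)
hop = forward(pkt, fib, apd, link_cost)  # cost decrease, rescued via the APD
```

Here the packet arrives with cost 5 while the router's local cost is 2, so the router finds the cost-5 alternative in its APD, forwards directly to c, and marks the packet as 1-anomaly.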

4. HANDLING EQUAL-COST PATHS

The basic design faces a significant challenge: although

path costs are highly likely to be distinctive among dif-

ferent paths, if equal cost multi-paths (ECMPs) do exist

between a pair of nodes, a path cost may not uniquely

identify a consistent forwarding path. Loops may form

after packets are repaired, even though the detection of

inconsistency is always effective.

Figure 3 shows an example. The node a has two ECMPs

to reach c: a→b→c and a→d→c. Suppose the link d→c

fails. The node d will use the alternative path to reach c:

d→a→b→c. It stamps a path cost 2 in the packet, and

forwards it to the node a. Unfortunately, the node a has

not updated its forwarding tables, and still thinks both its

ECMPs are valid as no cost inconsistency is detected. If

it picks the path a→d→c, a micro-loop forms.

4.1 Adding Noise to Link Cost

A key insight behind our solution to the ECMP problem is that equal-cost multi-paths are primarily used for load-balancing or failure backup in practice. A downstream router has the flexibility of deflecting packets to any next hop that has the same cost to the destination. We think that this flexibility is less critical than safely forwarding packets to their destinations during a transient routing convergence period. As this period is short, it may be acceptable to let a downstream router follow the path an upstream router chooses to prevent forwarding loops.

Figure 4: A new version of the topology shown in Figure 3 with salted costs. Two ECMPs between a and c now have distinct salted costs.

Our design embeds a random noise in the lower k bits

of a link cost to distinguish equal-cost alternative paths

an upstream router chooses. A router ignores the lower

k bits of the cost for packets in the normal forwarding

mode, but uses those k bits to differentiate alternative

paths for packets in the 1-anomaly mode. Different al-

ternative paths will have distinct costs with a high prob-

ability, but routers can still use ECMPs during normal

forwarding. We describe this mechanism in more detail.

4.2 Computing Salted Path Cost

A link cost now has two parts: a normal link cost and a random noise. The normal link cost is what a network would configure a link cost to be without considering failure recovery issues. We refer to the cost with a noise as a salted cost. A salted link cost can be viewed as a pair of two values, (cost, noise), and is denoted as $\overline{cost}$. A noise is randomly distributed in $[0, 2^k - 1]$.

A router uses the salted link costs to compute its APD as described in § 3.2. That is, it removes an element e from its current network map, and uses the shortest path algorithm and the salted link costs to compute the alternative paths to reach a destination. During the shortest path computation, to avoid carrying over any bit from the noise to a normal path cost, a router adds each part separately: $\overline{cost_1} + \overline{cost_2} = (cost_1 + cost_2,\ noise_1 + noise_2)$. Similarly, salted costs are compared lexicographically: $\overline{cost_1} > \overline{cost_2}$ if $cost_1 > cost_2$, or if $cost_1 \equiv cost_2$ and $noise_1 > noise_2$. These addition and comparison rules guarantee that the lowest salted cost path must be one of the original ECMPs. When a router stores a salted path cost, it stores only k bits of the noise, i.e., the noise modulo $2^k$: $\overline{cost} \leftarrow (cost,\ noise \bmod 2^k)$. This encoding rule ensures that normal ECMPs' salted costs only differ in the last k bits.
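The addition, comparison, and storage rules for salted costs can be illustrated with plain tuples. This is a sketch under assumptions: the function names are hypothetical, and the per-link noise values below are made up so that the two path costs match the (2, 13) and (2, 17) pairs quoted for Figure 4; the actual per-link values in that figure are not reproduced here.

```python
K = 10  # number of noise bits (the simulations in this paper use a 10-bit noise)

def salted_add(a, b):
    """Add two salted costs part by part, so noise never carries into cost."""
    return (a[0] + b[0], a[1] + b[1])

def path_cost(links):
    """Salted cost of a path given the salted costs of its links."""
    total = (0, 0)
    for link in links:
        total = salted_add(total, link)
    return total

def stored(cost, k=K):
    """Keep only the low k bits of the noise when the cost is stored."""
    return (cost[0], cost[1] % (1 << k))

# Two ECMPs with equal normal cost 2 but different (hypothetical) link noises.
via_b = path_cost([(1, 5), (1, 8)])   # a -> b -> c
via_d = path_cost([(1, 10), (1, 7)])  # a -> d -> c

# Python compares tuples lexicographically: normal cost first, then noise.
lowest = min(via_b, via_d)
```

Because the normal cost is compared first, a path with a higher normal cost can never win on noise alone, which is why the lowest salted cost path is always one of the original ECMPs.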

A router $n_i$ computes its normal forwarding paths using only the normal link cost portion of a salted cost. Its normal forwarding paths are not affected by the noise. In addition, for each destination d, a router also computes its lowest salted cost path and the lowest salted cost path through each of its ECMP next hops. This is because during routing transitions, after a packet is in the 1-anomaly mode, it must carry a salted path cost that uniquely identifies the alternative path, i.e., the lowest salted path cost, at every hop. A router may need to compare with this cost, or stamp this cost when it forwards a packet to its next hop. These lowest salted cost paths can be computed on a router's shortest path forwarding trees using the salted link costs. After the computation, for each ECMP next hop $n_{i+1}$, a router stores the lowest remaining salted path cost via that next hop, $n_i.\overline{cost}_{n_{i+1}}(d)$, in its forwarding tables. A router also stores its own lowest salted path cost $n_i.\overline{cost}(d)$.

4.3 Modified Forwarding Algorithm

In the modified design, a packet carries a salted remaining path cost $pkt.\overline{cost}$. A router $n_i$ compares this cost with its lowest salted path cost $n_i.\overline{cost}$ in its forwarding tables and takes different actions based on the outcome.

Normal Mode Packets: If the packet is in the normal mode, then the router $n_i$ compares the normal path costs only, i.e., it ignores the lower k bits of its own salted path cost and of the packet cost. We use $n_i.cost$ and $pkt.cost$ to denote the normal path cost, and $n_i.noise$ and $pkt.noise$ to denote the noise portion. If the normal path costs are equal, $pkt.cost \equiv n_i.cost$, the router picks a next hop $n_{i+1}$ according to its local load balancing algorithm, stamps the remaining salted path cost of that next hop, $n_i.\overline{cost}_{n_{i+1}}$, into the packet, and forwards the packet to the next hop $n_{i+1}$.

If the router's normal path cost is higher, $n_i.cost > pkt.cost$, a cost increase inconsistency occurs. As before, this indicates that the router has a smaller topology, and it can use its shortest path to keep forwarding. The router may have different ECMPs. After increasing the packet's anomaly counter to 1-anomaly, it can use any ECMP next hop, and it stamps the lowest salted cost of that next hop into the packet before forwarding it.

If the router's normal path cost is lower, $n_i.cost < pkt.cost$, then a cost decrease inconsistency occurs. Similarly, the router must have a larger topology, and it will attempt to find an alternative path in its APD. The router will use the salted path cost in the packet, $pkt.\overline{cost}$, to look up the alternative path. Suppose the router finds an alternative path with the same salted cost $pkt.\overline{cost}$, and the next hop is $n'_{i+1}$. The router increases the anomaly counter to 1-anomaly, updates the packet cost: $pkt.\overline{cost} \leftarrow (pkt.cost - cost_{n_i \to n'_{i+1}},\ (pkt.noise - noise_{n_i \to n'_{i+1}}) \bmod 2^k)$, and forwards the packet to the next hop $n'_{i+1}$.

If a router cannot find a path with the same salted cost $pkt.\overline{cost}$, which may happen during multiple concurrent network changes, it discards the packet.

k     Collision Probability
10    0.0097
16    0.00015
24    6.0 × 10^-7
32    2.3 × 10^-9

Table 1: The probability of having two equal salted cost paths between two nodes when there are c = 5 normal ECMPs, given different noise length k.

1-anomaly Mode Packets: When a packet has already encountered an anomaly, a router $n_i$ compares the packet's salted cost $pkt.\overline{cost}$ with its own lowest salted path cost $n_i.\overline{cost}$. This is because some of the router's ECMPs may already have failed. The router needs to use the salted cost to uniquely identify a valid forwarding path. If $pkt.\overline{cost} \equiv n_i.\overline{cost}$, the router selects the next hop $n_{i+1}$ that has the lowest salted cost, updates the packet cost to be the remaining salted cost, $pkt.\overline{cost} \leftarrow n_i.\overline{cost}_{n_{i+1}}$, and forwards the packet to $n_{i+1}$.

If $n_i.\overline{cost} \neq pkt.\overline{cost}$, then the router looks up the salted packet cost $pkt.\overline{cost}$ in its alternative path database. If a match is found, it forwards the packet and updates its path cost as described before. Otherwise, it discards the packet.

We revisit the example in Figure 3 to show why the

salted cost and the modified algorithm address the ECMP

issue. Figure 4 shows the same topology with salted link

costs. The two ECMPs of a now have distinct salted costs: (2, 13) and (2, 17). Suppose a has not updated

its forwarding tables after the link d → c has failed. It

forwards a packet to d and stamps the lowest remaining

salted cost (1, 7). The node d has updated, and detects

a cost increase inconsistency. It marks the packet in 1-

anomaly, forwards the packet via its own next hop, which

is a, and stamps the lowest remaining salted path cost

(2, 13). When a receives this packet, it uses the salted

path cost (2, 13) to unambiguously choose the correct

alternative path next hop b, instead of sending it back to

d, as the salted path cost does not match and the packet

is already in 1-anomaly mode.
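The modified forwarding decision at a single router can be sketched as below, replaying node a's step in the example. Everything structural here is an assumption made for illustration: the dictionary layouts, the `forward_salted` name, the `pick` callback standing in for the load-balancing choice, and the remaining cost (1, 8) via b and link noises, which are hypothetical values chosen to be consistent with the path costs (2, 13), (2, 17), and (1, 7) given in the text.

```python
K = 10  # noise bits

def forward_salted(pkt, entry, apd, links, pick=min):
    """Return the next hop, or None if the packet is discarded.

    pkt:   {"cost": (normal, noise), "anomaly": 0 or 1}
    entry: {"lowest": lowest salted cost, "best_hop": its next hop,
            "hops": {ecmp_next_hop: remaining salted cost via that hop}}
    apd:   {salted_cost: alternative_next_hop} for this destination
    links: {next_hop: salted link cost}
    pick:  stand-in for the router's load-balancing choice among ECMP hops
    """
    local, hops = entry["lowest"], entry["hops"]
    if pkt["anomaly"] == 0 and pkt["cost"][0] <= local[0]:
        # Normal mode: equal normal costs, or a cost increase inconsistency.
        if pkt["cost"][0] < local[0]:
            pkt["anomaly"] = 1
        hop = pick(hops)
        pkt["cost"] = hops[hop]
        return hop
    if pkt["anomaly"] == 1 and pkt["cost"] == local:
        # 1-anomaly mode with matching salted cost: follow the unique path.
        hop = entry["best_hop"]
        pkt["cost"] = hops[hop]
        return hop
    # Cost decrease (normal mode) or salted mismatch (1-anomaly mode).
    hop = apd.get(pkt["cost"])
    if hop is None:
        return None
    pkt["anomaly"] = 1
    cost, noise = pkt["cost"]
    link_c, link_n = links[hop]
    pkt["cost"] = (cost - link_c, (noise - link_n) % (1 << K))
    return hop

# Node a has not yet updated its tables; d returns the packet in 1-anomaly mode.
entry_a = {"lowest": (2, 13), "best_hop": "b", "hops": {"b": (1, 8), "d": (1, 7)}}
links_a = {"b": (1, 5), "d": (1, 10)}
pkt = {"cost": (2, 13), "anomaly": 1}
hop = forward_salted(pkt, entry_a, {}, links_a, pick=lambda hops: "d")
```

Even though the load-balancing stand-in would choose d, the 1-anomaly packet carrying (2, 13) is sent to b, which is the behavior the example relies on to avoid the micro-loop.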

4.4 Cost Collision Analysis and Prevention

The random noise added to a link cost makes two nor-

mal ECMPs have distinct salted path costs with a high

probability. We analyze this probability, and discuss how

to reduce the probability of collision to be negligible.

A noise cost is chosen randomly in $[0, 2^k - 1]$. The sum of noise costs modulo $2^k$ is also randomly distributed. If a node has c normal ECMPs or c repair paths with the same normal costs, the probability that no two such paths have the same salted path cost is:

$$1 \cdot \left(1 - \frac{1}{2^k}\right) \cdots \left(1 - \frac{c-1}{2^k}\right) = \frac{(2^k)!}{2^{ck}\,(2^k - c)!}$$

Table 1 shows the probability of collision when c = 5

for various values of k. In practice, c is typically small (< 5), because two backup paths usually suffice (see footnote 2).

As can be seen, with practical values of c and k, the

probability of collision is low. In case a collision does

occur, a network administrator can try different noise

values until a collision-free noise configuration is found.

This is doable because noise values do not affect normal

forwarding operations, and the probability of collision

is small. Our simulations use a 10-bit noise value, and we did not run into any collisions on any of the simulated topologies, including an inferred tier-1 topology. Therefore, we

think that the probability of collision can be practically

ignored if we use a 16-bit or longer noise.
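The collision probabilities in Table 1 follow directly from the product formula above. The short sketch below evaluates one minus that product with exact rational arithmetic; the function name `collision_probability` is introduced here for illustration only.

```python
from fractions import Fraction

def collision_probability(c, k):
    """Probability that at least two of c paths share the same k-bit noise sum."""
    space = 2 ** k
    no_collision = Fraction(1)
    for i in range(c):
        # The (i+1)-th path must avoid the i noise values already taken.
        no_collision *= Fraction(space - i, space)
    return float(1 - no_collision)

# Evaluate the formula for c = 5 and the noise lengths listed in Table 1.
table1 = {k: collision_probability(5, k) for k in (10, 16, 24, 32)}
```

For small c the result is close to c(c-1)/2 divided by 2^k, i.e., about 10/2^k when c = 5, which is why each additional noise bit roughly halves the collision probability.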

5. PROPERTIES

We now briefly describe the forwarding properties of

our design. We omit formal proofs due to the lack of

space. When stating those properties, we do not consider

congestion loss, because it is not caused by violations of

forwarding consistency. We also ignore the failure detec-

tion period during which routers may forward packets to

a failed link without noticing the failure, and the router

initialization period during which a newly added router

has not obtained any topology information.

Property 1 Packets will follow normal shortest paths, in-

cluding ECMPs to reach their destinations in the normal

mode when the network is in a stable state.

This property holds because routers compute their nor-

mal forwarding paths in the same way as when link costs

do not have noises. Our algorithm compares the normal

path cost in a packet and a router’s local normal cost first

for packets in the normal mode. In a steady state, these

two cost values will always match and packets will reach

their destinations without triggering any inconsistency.

Property 2 If different paths have distinct salted path

costs, then during the routing transition period in which

only one network element changes its status and the net-

work is not partitioned, a packet will be forwarded to its

destination in either the normal or 1-anomaly mode.

This property holds because when there is only one

element changing its status, a router always has a valid

path in either its normal forwarding tables or its APD,

and if a packet is destined to a failure or trapped in a

loop, a router is able to detect a cost inconsistency and

forward the packet along an alternative valid path.

Property 3 A packet will not be trapped in a micro-loop

without being discarded.

By trapped in a micro-loop, we mean that if all routers

stop updating their forwarding tables after forming a loop,

a packet will not escape the loop until its TTL expires.

This property holds because a packet cannot traverse a node twice without a cost inconsistency, and after two cost inconsistencies, or when a router fails to find an alternative path, the packet will be discarded.

Footnote 2: This observation is based on five real ISP topologies.

Figure 5: The structure of the prototype implementation.

Figure 6: The encoding scheme of cost. We overload the grey area in the IP header to carry the 32-bit cost.

6. IMPLEMENTATION

To understand the overhead and complexity of CoRRS,

we implement our design using the Quagga routing suite [2]

and Click modular router [22]. Figure 5 shows the struc-

ture of the implementation. The shaded boxes are the

modules we modify. We modify Quagga’s ospfd rout-

ing daemon to compute alternative paths as well as short-

est paths after each routing update, and associate a path

cost with each forwarding entry it sends to Click. We

then modify Click to support the forwarding algorithm

described in § 4. We also modify the interface between

Quagga and Click to support the exchange of the path

cost information. Our modification includes 3000 lines

of C/C++ code, which is about 6% of the ospfd code.

Figure 6 shows how we encode a 32-bit cost label in an

IP header. We overload the 16-bit IPID field, the 13-bit

fragment field, the 1-bit reserved flag field, and 2 high-

est bits in the TTL field to encode a 22-bit path cost and

10-bit noise. The anomaly counter is implemented using

different DiffServ codepoints. A 22-bit path cost is suf-

ficient because OSPF uses a 16-bit link cost metric, and

a network’s diameter almost never exceeds 64 hops.
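The 22-bit cost plus 10-bit noise label can be sketched as a simple bit-packing routine. This is an assumption-laden illustration: it places the normal cost in the high 22 bits and the noise in the low 10 bits of a 32-bit integer, and it does not reproduce how Figure 6 maps those 32 bits onto the IPID, fragment, reserved flag, and TTL bits. The function names are hypothetical.

```python
COST_BITS = 22   # normal path cost
NOISE_BITS = 10  # noise, kept in the low-order bits

def pack_label(cost, noise):
    """Pack (cost, noise) into one 32-bit label; layout is an assumption."""
    if not 0 <= cost < (1 << COST_BITS):
        raise ValueError("cost does not fit in 22 bits")
    if not 0 <= noise < (1 << NOISE_BITS):
        raise ValueError("noise does not fit in 10 bits")
    return (cost << NOISE_BITS) | noise

def unpack_label(label):
    """Recover (cost, noise) from a 32-bit label."""
    return label >> NOISE_BITS, label & ((1 << NOISE_BITS) - 1)

# Worst case argued in the text: a 16-bit link metric on each of 64 hops.
max_path_cost = 65535 * 64
fits = max_path_cost < (1 << COST_BITS)  # 4194240 < 4194304

label = pack_label(2, 13)
```

The overloaded header fields add up to 16 + 13 + 1 + 2 = 32 bits, matching the 22 + 10 bits needed for the label.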

We test our implementation using the Emulab testbed [40].

We construct an Abilene topology, and run both the vanilla

IP with OSPF and CoRRS on it. The ospfd’s timers are

configured as shown in Table 3. Those values are set ac-

cording to router implementations, as we will describe

in our later simulation study (§ 7). We run the example shown in Figure 2. We first start a 1Mbps UDP flow

from the Sunnyvale node to the Kansas City node, and

then manually fail the link from Denver to Kansas City.

After the failure, the Sunnyvale node updates its routing

table one second later than the Denver node. We measure

the load of the link from Sunnyvale to Denver, which is

part of the loop, and the packet loss rate at the destination node Kansas City.

Figure 7: This figure shows the traffic load on the Sunnyvale→Denver link and the packet loss rate after the Denver→Kansas City link fails.

Figure 7 shows the link load and packet loss rate dur-

ing the convergence period. The link failure happens

at time 1s. From the figure we can see that a forward-

ing loop occurs under vanilla OSPF and IP routing, and

it temporarily increases the link load more than 30-fold, as the Linux machines on Emulab use 64 as their

default TTLs. In contrast, CoRRS prevents loops and the

link load is not increased. The UDP flow under vanilla

OSPF and IP routing is interrupted by almost 1.5 sec-

onds, while CoRRS recovers the packet transmission right

after the failure is detected, which takes about 250ms.

We evaluate the computational and memory overhead

of CoRRS and compare them with NotVia [9], a state-of-the-art IP fast rerouting technique that uses pre-computed

backup paths to bypass failures. The results are summa-

rized in Table 2. Our test runs on a Pentium D 2.4GHz

machine with 2GB memory.

The right side of Table 2 shows the time it takes to finish computing the alternative paths for CoRRS and NotVia for various topologies. As can be seen, CoRRS's computation takes less than 100ms on the largest Sprint topology, and the time is comparable with NotVia.

We evaluate the memory overhead by examining how

many entries a router needs to keep in its alternative path

database. The left side of Table 2 shows the results. The

results show that a repair path database may be 2-8 times

larger than a router’s normal forwarding table, and is

comparable with that of IPFRR. If the memory overhead

becomes a practical concern, it can be optimized [26].

We omit the details due to the space limitation.

We also benchmark the per-packet processing overhead of CoRRS. This test is run on an AMD Dual Core 2.6GHz machine with 2GB memory. The vanilla IP forwarding takes 0.8 µs, and CoRRS's forwarding takes on average 1.7 µs. The processing overhead mainly comes from the extra table lookup in the alternative path database for packets in the 1-anomaly mode. We also compare the throughput of vanilla IP forwarding and CoRRS forwarding. On our test machine, the peak throughput for IP forwarding is 496.8kpps, and for CoRRS forwarding is 471.6kpps, a decrease of about 5%.

Topology            # of Forwarding   # of NotVia Entries   # of RPD Entries    RPD Computation Time (ms)   NotVia Computation Time (ms)
(nodes, links)      Table Entries     Avg / Max / Min       Avg / Max / Min     Avg / Max / Min             Avg / Max / Min
Abilene (11, 28)    11                15.4 / 17 / 14        17.3 / 26 / 12      0.165 / 0.176 / 0.157       0.093 / 0.112 / 0.073
Sprint (315, 1944)  315               368.8 / 449 / 278     777 / 1769 / 534    79.4 / 89.2 / 71.9          49.7 / 84.2 / 36.8
Random (100, 394)   100               116.6 / 140 / 102     276.1 / 376 / 149   6.2 / 11.9 / 5.8            2.7 / 10.6 / 2.0

Table 2: Summary of the memory and computational overhead introduced by CoRRS. For memory overhead the normal forwarding table size and the number of NotVia entries are shown for comparison. For computational overhead the NotVia computation time is shown for comparison.

7. SIMULATION

In this section, we use SSFNet [4], an event-driven

simulator that has a complete OSPFv2 implementation,

to simulate CoRRS and compare it with other designs,

including packets that carry failures (FAIL), and fast rerout-

ing techniques (FRR). CoRRS is implemented as described

in the previous sections. To enable a fair comparison,

FAIL is implemented as a much reduced version of FCP [25].

Each packet carries the last failure it encounters. Routers

use the existing routing protocol OSPF to converge. This

simplification makes FAIL have comparable header, com-

putational, and signaling overhead with CoRRS.

FRR techniques enable routers adjacent to failures to rapidly reroute packets around the failures. We simulate NotVia [9], the most effective solution in IP networks. Fast rerouting techniques in MPLS networks, known as MPLS-FRR [33], are similar to NotVia except that a packet's next hop is identified using a label, rather than the destination address. Thus we do not simulate MPLS-FRR separately, as it has the same characteristics as NotVia.

FRR can be used in conjunction with a loop-prevention convergence technique [8] to reduce micro-loops. Known techniques [14, 16, 17] all slow down routing convergence, and cannot prevent loops when there are concurrent topology changes. For comparison purposes, we also simulate the combination of FRR and a loop-prevention convergence technique, oFIB [14] (FRR+oFIB). We choose oFIB because it converges faster than the alternatives [16, 17].

7.1 Miscellaneous Settings

Network Topologies: We simulate on real, inferred, and randomly generated topologies, which are summarized in Table 4. The inferred topologies are from the Rocketfuel project [38], and the random topologies are generated using the BRITE topology generator [1]. Real and inferred topologies contain precise or inferred link weights [28]. We use the random topology to test how our design works on asymmetric networks. The link weight in each direction is set independently, each using a random number between 1 and 50.

Parameter               Value
HelloInterval           50ms
RouterDeadInterval      250ms
SPF Delay               200ms
SPF Computation Time    (0.00247 n^2 + 0.978) ms
FIB/RIB Update Time     (rand([100, 300]) + rand([0.1, 0.11]) p) ms

Table 3: Summary of the simulation settings. n is the number of routers in the network. p is the number of entries in the forwarding table.

Topology   Type     # of Nodes   # of Links
Abilene    Real     11           28
Telstra    Infer    108          306
Exodus     Infer    79           294
Sprint     Infer    315          1944
Random     Random   100          394

Table 4: Summary of the topologies used in our simulation.

Link delays of each topology are set according to the geographic proximity of their end nodes. If two routers are in different Points of Presence (PoPs), we infer the link delay between them from the geographical distance; in the generated topology, the nodes are randomly spread on a plane that has a size similar to that of the continental US. If two routers are in the same PoP, we assume the link delay is 0.1ms.

Topology updates: We simulate routing transitions

for both single topology update and multiple concurrent

topology update events. For single topology update, we

test single link up/down events and node up/down events.

For multiple topology updates, we test two concurrent

link failures. For each type of update we run 100 exper-

iments with randomly chosen topology updates.

Update Type          # of Tests         Total # of    Total # of       Loop Duration (ms)
                     Containing Loops   Micro-loops   Links Involved   Avg    Max     Min    Stddev
OSPF
Link Failure         13                 37            64               57.9   131.6   1.25   37.5
Node Failure         20                 123           182              62.2   474.7   0.67   54.6
Link Up              9                  23            46               45.1   162.3   0.69   33.2
Node Up              28                 129           188              52.9   162.8   0.96   33.7
Two Link Failures    20                 61            96               47.7   158.3   1.47   35.4
oFIB
Two Link Failures    24                 60            96               45.9   172.5   0.22   38.9

Table 5: Summary of loops during convergence in the Sprint (AS1239) topology. For each update type we run 100 experiments with randomly chosen topology updates.

[Figure 8 plots: cumulative fraction (y-axis, 0 to 1) versus flow amplifying factor (x-axis, 0 to 100) for panels (a) Abilene, (b) Sprint, and (c) Random.]

Figure 8: The cumulative distribution of the loops' flow amplifying factor on three topologies for one link and two link failure cases.

OSPF parameters: The various timers and delays of the OSPF implementation are summarized in Table 3. These parameters are set according to the values recommended by various fast convergence techniques [6, 21] and the values observed in commercial production routers [5, 15, 35]. We simulate fast convergence even though our design's benefits are more prominent during slow convergence, in which micro-loops last longer, because we want to show only the benefits that cannot be achieved by other techniques. With these settings, all our simulated networks can converge within a second, consistent with previous studies [6, 15].
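The two timing models in Table 3 can be evaluated for a given topology as in the sketch below. The formulas are taken from the table; the assumption that rand([a, b]) denotes a uniform draw from the interval, and the function names, are introduced here for illustration.

```python
import random

def spf_time_ms(n):
    """SPF computation time from Table 3 for a network of n routers."""
    return 0.00247 * n ** 2 + 0.978

def fib_update_time_ms(p, rng=random):
    """FIB/RIB update time from Table 3 for p forwarding entries.

    Assumes rand([a, b]) is a uniform draw from [a, b].
    """
    return rng.uniform(100, 300) + rng.uniform(0.1, 0.11) * p

# Sprint topology: 315 routers and 315 forwarding table entries.
sprint_spf = spf_time_ms(315)          # about 246 ms
sprint_fib = fib_update_time_ms(315)   # between about 131.5 and 334.65 ms
```

Under these models the SPF computation for Abilene (n = 11) costs roughly 1.3 ms, so the quadratic term only matters for the larger topologies.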

7.2 Reducing Forwarding Disruptions

The first set of experiments evaluates how effectively

CoRRS can reduce forwarding disruptions during rout-

ing transitions. We use two metrics to measure forward-

ing disruptions: packet loss rates, and flow amplifica-

tion factors caused by micro-loops. The latter measures

the number of times a packet traverses the same uni-

directional link because of a micro-loop. We refer to

this factor as the flow amplifying factor because if a traf-

fic stream of t Mb/s is trapped in a loop and has a flow

amplifying factor K , then the stream’s peak rate would

become t×K Mb/s on the link. An amplified flow may

cause excessive congestion loss.
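The flow amplifying factor can be computed from the sequence of nodes a probing packet visits. The sketch below is an illustration rather than the measurement code used in the simulations: it takes the factor of a trace to be the largest number of times any single unidirectional link is traversed, and the function name and trace format are assumptions.

```python
from collections import Counter

def flow_amplifying_factor(path):
    """Largest number of traversals of any one directed link in a packet trace.

    path: ordered list of the nodes the packet visited.
    """
    traversals = Counter(zip(path, path[1:]))
    return max(traversals.values(), default=0)

# A packet enters at s and bounces between a and b three times (hypothetical trace).
trace = ["s", "a", "b", "a", "b", "a", "b"]
factor = flow_amplifying_factor(trace)

# A stream of t Mb/s trapped in this loop would peak at t * factor on link a -> b.
peak_rate_mbps = 1.0 * factor
```

A loop-free trace yields a factor of 1, and a packet whose TTL expires inside a two-node loop yields a factor of about half its initial TTL.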

To measure the flow amplifying factors and packet loss rates, we send probing packets every 5ms between each pair of nodes. The probing packets' TTLs are initialized to 128, the default TTL value of the Windows XP operating system. We then measure the flow amplifying factors and packet loss rates from the probing packet traces. Note that we do not simulate real traffic patterns, because doing so is extremely time-consuming: our simulations would not finish in a reasonable time, i.e., a few days.

Figure 8 compares the distributions of the flow ampli-

fying factors of vanilla IP forwarding with OSPF, CoRRS

with OSPF, FRR with OSPF, and the combination of

FRR and oFIB in a real network topology (Abilene),

an inferred tier-1 network topology (Sprint), and a ran-

domly generated network topology (Random). The dis-

tributions are drawn from all micro-loops we have ob-

served in the tests. As shown in the figure, CoRRS’s

flow amplifying factor is ≤ 2. This result shows that

CoRRS prevents packets from being trapped in micro-

loops. In contrast, the vanilla IP forwarding with OSPF,

FRR+OSPF, and FAIL+OSPF all have amplifying fac-

tors up to 64, indicating that packets are finally discarded

due to TTL expirations. Table 5 summarizes the characteristics of micro-loops observed in our simulations. As can be seen, some loops may last close to half a second, sufficiently long to cause substantial flow amplification.

The combination of FRR and oFIB also prevents loops in single topology update events, but during two link failures, it performs similarly to vanilla IP forwarding under OSPF, as oFIB is unable to prevent loops in this situation and falls back to the default OSPF. Table 5 also shows the micro-loops formed during oFIB convergence when there are two link failures.

Figure 9 and Figure 10 show the average packet loss

rates after a link failure and two link failures for each

mechanism in three topologies. Note that the packet loss

rates do not include congestion loss caused by micro-

loops, because we did not simulate real traffic load. There-

fore, the packet loss rates we measure are those caused

[Figure 9 plots omitted. Panels: (a) Abilene, (b) Sprint, (c) Random. X-axis: Time (s); Y-axis: Packet Loss Rate. Legend in (a) and (c): CoRRS+OSPF, FRR+OSPF, FAIL+OSPF, FRR+oFIB, IP+OSPF; legend in (b): CCP+OSPF, NotVia+oFIB, IP+OSPF.]

Figure 9: The average packet loss rate after a single link failure. X-axis is the time-line. The failure happens at time 0, and is detected after 200 ∼ 250ms. Y-axis is the packet loss rate for all probing flows that use the failed link.

[Figure 10 plots omitted. Panels: (a) Abilene, (b) Sprint, (c) Random. X-axis: Time (s); Y-axis: Packet Loss Rate. Legend in (a) and (c): CoRRS+OSPF, FRR+OSPF, FAIL+OSPF, FRR+oFIB, IP+OSPF; legend in (b): CCP+OSPF, NotVia+oFIB, IP+OSPF.]

Figure 10: The average packet loss rate after two link failures. Other configurations are the same as in Figure 9.

by failed routes, and are likely lower than the true loss rates in designs that have micro-loops during routing convergence. The loss rate at time t is the fraction of probe packets sent during the period [t, t + 10ms] that are never received. We average the loss rates over all experiments for each type of update event.
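The windowed loss-rate computation described above can be sketched as follows. This is a minimal illustration under assumed inputs: each experiment is a list of (send time in milliseconds, received flag) pairs, and the names `loss_rate_at` and `average_loss_rate` are hypothetical.

```python
def loss_rate_at(t_ms, probes, window_ms=10):
    """Loss rate for probes sent in the closed window [t_ms, t_ms + window_ms].

    probes is an iterable of (send_time_ms, received) pairs, where received
    is True if the probe eventually reached its destination.
    Returns None when no probe was sent in the window.
    """
    in_window = [received for sent, received in probes
                 if t_ms <= sent <= t_ms + window_ms]
    if not in_window:
        return None
    lost = sum(1 for received in in_window if not received)
    return lost / len(in_window)


def average_loss_rate(t_ms, experiments, window_ms=10):
    """Average the per-experiment loss rates at t_ms, skipping empty windows."""
    rates = [loss_rate_at(t_ms, probes, window_ms) for probes in experiments]
    rates = [r for r in rates if r is not None]
    return sum(rates) / len(rates) if rates else None


# Two toy experiments with probes sent every 5 ms.
run_a = [(0, True), (5, False), (10, False), (15, True)]
run_b = [(0, True), (5, True), (10, True), (15, True)]
print(loss_rate_at(0, run_a), average_loss_rate(0, [run_a, run_b]))
```

Because a probe counts as lost only if it is never received, packets that loop and still reach the destination do not contribute to this metric.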

As can be seen, CoRRS and other fast restoration techniques can all successfully reduce packet loss rates, while vanilla IP forwarding combined with OSPF has the highest loss rate, as packets are discarded until the network has converged. This is consistent with our emulation study in the previous section. The packet disruption times may last about 1 second. In a single failure case, after the failure is detected (after ∼200ms), CoRRS and other fast restoration techniques are able to rapidly bypass failures using pre-computed paths. In some of the figures, the packet loss rates do not reach zero because the network is disconnected. In the two link failure case, none of the techniques guarantees recovery. As a result, some packets are discarded after failures are detected, especially for the Abilene network. During the convergence period, CoRRS has a slightly higher loss rate than FRR+oFIB and vanilla IP forwarding in the Abilene network. This is because it discards packets potentially trapped in loops rather than forwarding them. As shown in Figure 8(a), the flow amplifying factor in the Abilene network is less than 64, indicating that the looped packets are not discarded due to TTL expirations and can still reach their destinations after looping several times. In practice, packets that loop dozens of times might have already congested a link and been discarded due to congestion, but our simulations do not show this effect. Results for other update events are similar and are not shown for brevity.

7.3 Convergence Time

We also measure the convergence time for both CoRRS

and FRR+oFIB. oFIB reduces micro-loops during rout-

ing convergence, but it comes at the cost of delaying

routing convergence. A key advantage of our design is

that it does not delay normal routing convergence. Fast

convergence makes the network more resilient and re-

sponsive to changes. For instance, if the network’s con-

vergence time is less than the inter-arrival time of con-

secutive failures, each failure becomes essentially a sin-

gle failure and the network can use pre-computed paths

to rapidly bypass the failure, reducing the packet disruption times. In addition, fast convergence reduces the time a packet follows a suboptimal path, e.g., reaching a failure first before it is rerouted.

Figure 11 shows the network convergence time for

various single topology updates for CoRRS and FRR+oFIB.

FAIL and FRR that use OSPF to converge have the same

convergence time as CoRRS. The convergence time is

measured from the time the first router announces a rout-

ing update to the time that all routers have finished up-

dating their forwarding tables. In multiple topology up-

dates, oFIB falls back to OSPF. Therefore, it has the

same convergence time as OSPF. However, in all other

cases, oFIB converges 2 ∼ 5 times slower than CoRRS,

[Figure 11 plots omitted. Panels: (a) Abilene, (b) Sprint, (c) Random. X-axis categories: link down, link up, node down, node up, 2 links down; Y-axis: Convergence Time (s). Legend: IP+OSPF/CoRRS+OSPF, FRR+oFIB.]

Figure 11: The averaged convergence time after different network changes. CCP has the same convergence time as OSPF. The error bars show the standard deviations.

[Figure 12 plots omitted. Panels: (a) Abilene, (b) Sprint, (c) Random. X-axis: Time (s); Y-axis: Path Stretch. Legend in (a) and (c): CoRRS+OSPF, FRR+OSPF, FAIL+OSPF, FRR+oFIB; legend in (b): CCP+OSPF, NotVia+oFIB.]

Figure 12: The averaged path stretch after a link failure. X-axis is the time-line. The failure happens at time 0, and is detected after 200-250ms. Y-axis is the packet stretch for all probing packets which previously pass the failure.

because it imposes a strict update order on routers in the

network [14]. This result is consistent with that obtained

in [14]. The convergence time does not include CoRRS’s

alternate path computation time and FRR’s backup path

computation time, which we evaluate separately using

the real implementation as described in § 6.
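The convergence-time metric used in this section (from the first routing update announcement to the last forwarding table update) reduces to a simple computation over per-router timestamps. The sketch below assumes those timestamps are collected into two dictionaries keyed by router name; the function name and the input format are hypothetical, not taken from the simulator.

```python
def convergence_time(announce_times, fib_done_times):
    """Time from the first routing update announcement to the last FIB update.

    announce_times: {router: time it first announced a routing update}
    fib_done_times: {router: time it finished updating its forwarding table}
    Both are in seconds on a common clock (an assumed input format).
    """
    first_announcement = min(announce_times.values())
    last_fib_update = max(fib_done_times.values())
    return last_fib_update - first_announcement


# Toy example: a failure detected by R1 and R2, with R3 converging last.
announced = {"R1": 0.25, "R2": 0.5}
fib_done = {"R1": 0.5, "R2": 0.625, "R3": 0.75}
print(convergence_time(announced, fib_done))
```

As in the text, this excludes any time spent computing alternate or backup paths, which would have to be measured separately.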

7.4 Path Stretch

Path stretch is defined as the ratio of the cost of a path taken by a packet to the shortest path cost in the network. Figure 12 shows the average path stretch during routing convergence for a single link failure case. The path stretch is averaged over all source and destination pairs whose default forwarding paths include the failed link. As can be seen, with CoRRS the path stretch reduces to 1 in less than one second, while FRR+oFIB may take 2 ∼ 3 times longer to reduce the average path stretch to 1. Both FAIL and FRR without oFIB have much higher path stretch during convergence. This is because OSPF convergence is not loop-free; some packets reach their destinations by taking extremely long paths after looping many times.
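To make the path stretch definition concrete, the sketch below computes it for a single packet from its hop trace. It assumes the shortest path cost is taken on the post-failure topology, which is our reading of "the shortest path cost in the network" since the stretch settles at 1 after convergence. The graph format and function names are hypothetical.

```python
import heapq


def shortest_path_cost(graph, src, dst):
    """Dijkstra over graph = {node: {neighbor: link_cost}}; inf if unreachable."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def path_cost(graph, hops):
    """Sum of link costs along the hop trace, counting repeated links each time."""
    return sum(graph[u][v] for u, v in zip(hops, hops[1:]))


def path_stretch(graph, hops):
    """Cost of the path actually taken divided by the shortest path cost.

    Assumes the source and destination differ, so the divisor is non-zero.
    """
    return path_cost(graph, hops) / shortest_path_cost(graph, hops[0], hops[-1])


# Toy post-failure topology with symmetric link costs.
graph = {
    "A": {"B": 1, "E": 2},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"C": 1, "E": 2},
    "E": {"A": 2, "D": 2},
}
print(path_stretch(graph, ["A", "B", "C", "D"]))            # shortest path
print(path_stretch(graph, ["A", "B", "A", "B", "C", "D"]))  # one loop A-B-A
```

A packet that loops before reaching its destination pays for every repeated link, which is why designs without loop prevention show a higher stretch during convergence.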

7.5 Summary

In summary, our simulation results suggest that compared to other designs, CoRRS prevents micro-loops without slowing or constraining routing convergence. It finds high-quality loop-free paths with a high probability, prevents packet loss after the failure is detected, and has a shorter path stretch during routing convergence.

8. RELATED WORK

CoRRS is motivated by recent work on improving the availability of Internet routing, including consensus routing [20] and convergence-free routing [25]. The design of CoRRS explores a complementary approach. It aims not to slow down or constrain routing convergence, while consensus routing and convergence-free routing sacrifice the responsiveness of the routing system for consistency. In this spirit, CoRRS shares similarity with Anomaly-Cognizant Forwarding (ACF) [11], but this work focuses on intra-domain routing, and uses the remaining path cost as a safeguard to detect anomalies. ACF detects forwarding anomalies caused by inter-domain routing (BGP) convergence. In ACF, a packet carries the entire AS path it has visited to detect forwarding anomalies.

Deflections [41] and Path Splicing [30] provide path diversity for end systems to improve performance and bypass failures, while CoRRS aims to enable routers to rapidly detect forwarding anomalies and repair them during routing transitions.

There is also much work in enabling routers to rapidly forward packets using backup paths after failure detection, including R-BGP, IPFRR, and MPLSFRR [23, 33, 37]. These proposals provide fast failure recovery, but they alone do not prevent micro-loops. Known loop-prevention convergence techniques [14, 16, 17] all slow down routing convergence. CoRRS does not slow down or constrain routing convergence, and provides valid recovery paths with a high probability.


9. CONCLUSION AND FUTURE WORK

Routing convergence is a major cause of packet loss and delay on the Internet. Recent work has made much progress in reducing the adverse effects of routing convergence, but often at the cost of slowed or constrained routing convergence. This work aims to explore routing designs that reduce forwarding disruption without sacrificing the routing system’s responsiveness to dynamic changes. Our high-level approach is to enable routers to detect forwarding anomalies and repair them during routing convergence. As a first step, we present the design and implementation of CoRRS, an intra-domain routing system that ensures consistent forwarding with a high probability but does not slow down routing convergence. In the CoRRS design, a packet carries the remaining path cost in its header, and routers use this information as a safeguard to detect forwarding anomalies and to discover valid paths. We also show that carrying limited information is necessary for routers to reliably detect all forwarding anomalies; cost is more effective, flexible, and lightweight than other information such as a path identifier or failures.

We evaluate CoRRS using both a Linux-based implementation and large-scale simulations. Our evaluation shows that CoRRS is amenable to high-speed implementation. It achieves 95% of the vanilla IP forwarding throughput in our prototype implementation. Our simulation results show that CoRRS is able to prevent micro-loops and find valid recovery paths during routing transitions without slowing down routing convergence.

This result suggests that it is promising to achieve con-

sistent and responsive routing with a limited amount of

“safeguard” information in a packet header. It is our

future work to explore how to extend the CoRRS ap-

proach to inter-domain routing that uses policy rather

than cost to select paths. A key challenge there is to

design an efficient and effective safeguard to detect for-

warding anomalies and facilitate recovery.

References

[1] BRITE Topology Generator. http://www.cs.bu.edu/brite.

[2] Quagga Routing Suite. http://www.quagga.net.

[3] Reducing Link Failure Detection Time with BFD. http://www.networkworld.com/community/node/23380.

[4] Scalable Simulation Framework. http://www.ssfnet.org.

[5] SPF Delay Timer. http://www.juniper.net/techpubs/software/junos/junos74/swconfig74-routing/html/isis-summary53.html#1036104.

[6] C. Alaettinoglu, V. Jacobson, and H. Yu. Towards Milli-Second IGP Con-

vergence. Internet draft, draft-alaettinoglu-isis-convergence-00.txt, Nov

2000.

[7] C. Boutremans, G. Iannaccone, and C. Diot. Impact of link failures on

VoIP performance. In NOSSDAV, 2002.

[8] S. Bryant and M. Shand. A Framework for Loop-free Convergence. Inter-

net draft, draft-bryant-shand-lf-conv-frmwk-03.txt, Oct 2006.

[9] S. Bryant, M. Shand, and S. Previdi. IP Fast Reroute Using Notvia Ad-

dresses. Internet draft, draft-ietf-rtgwg-ipfrr-notvia-addresses-00.txt, Dec

2006.

[10] R. Callon. Use of OSI IS-IS for Routing in TCP/IP and Dual Environments.

RFC1195, Dec 1990.

[11] A. Ermolinskiy and S. Shenker. Reducing Transient Disconnectivity using

Anomaly-Cognizant Forwarding. In ACM SIGCOMM HotNets VII, 2008.

[12] N. Feamster and H. Balakrishnan. Packet Loss Recovery for Streaming

Video. In International Packet Video Workshop, 2002.

[13] B. Fortz and M. Thorup. Optimizing ospf/is-is weights in a changing world.

IEEE Journal on Selected Areas in Communications, 20(4):756–767, May

2002.

[14] P. Francois and O. Bonaventure. Avoiding transient loops during the con-

vergence of link-state routing protocols. IEEE/ACM Transactions on Net-

working, 15(6):1280–1932, Dec 2007.

[15] P. Francois, C. Filsfils, J. Evans, and O. Bonaventure. Achieving sub-

second IGP convergence in large IP networks. SIGCOMM Comput. Com-

mun. Rev., 35(3):35–44, 2005.

[16] P. Francois, M. Shand, and O. Bonaventure. Disruption-free topology reconfiguration in OSPF Networks. In IEEE INFOCOM, Anchorage, USA, May 2007. INFOCOM 2007 Best Paper Award.

[17] J. J. Garcia-Luna-Aceves. Loop-free routing using diffusing computations.

IEEE/ACM Trans. Netw., 1(1):130–141, 1993.

[18] U. Hengartner, S. Moon, R. Mortier, and C. Diot. Detection and Analysis

of Routing Loops in Packet Traces. In SIGCOMM IMW, 2002.

[19] G. Iannaccone, C. Chuah, S. Bhattacharyya, and C. Diot. Feasibility of IP

restoration in a tier-1 backbone. IEEE Network Magazine, Jan-Feb 2004.

[20] J. P. John, E. Katz-Bassett, A. Krishnamurthy, T. Anderson, and

A. Venkataramani. Consensus routing: the internet as a distributed system.

In NSDI’08: Proceedings of the 5th USENIX Symposium on Networked

Systems Design and Implementation, pages 351–364, 2008.

[21] D. Katz and D. Ward. Bidirectional Forwarding Detection. Internet draft,

draft-ietf-bfd-base-07.txt, Jan 2008.

[22] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The click

modular router. ACM Transactions on Computer Systems, 18:263–297,

2000.

[23] N. Kushman, S. Kandula, D. Katabi, and B. M. Maggs. R-BGP: Staying

connected in a connected world. In NSDI, 2007.

[24] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian. Delayed internet routing convergence. In Proc. ACM SIGCOMM, pages 175–187, 2000.

[25] K. Lakshminarayanan, M. Caesar, M. Rangan, T. Anderson, S. Shenker,

and I. Stoica. Achieving convergence-free routing using failure-carrying

packets. In SIGCOMM, pages 241–252, 2007.

[26] A. Li, X. Yang, and D. Wetherall. Consistent and Responsive Routing with

Safeguard. Technical Report DUKE-CS-TR-2008-04, Duke, 2008.

[27] A. Li, X. Yang, and D. Wetherall. Towards Disruption-Free Intra-Domain

Routing. In SIGCOMM 2008 Student Poster, 2008.

[28] R. Mahajan, N. T. Spring, D. Wetherall, and T. E. Anderson. Inferring link

weights using end-to-end measurements. In Internet Measurement Work-

shop, pages 231–236, 2002.

[29] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, and

C. Diot. Characterization of Failures in an IP Backbone Network. In IN-

FOCOM, 2004.

[30] M. Motiwala, N. Feamster, and S. Vempala. Path Splicing: Reliable Con-

nectivity with Rapid Recovery. In ACM SIGCOMM HotNets VI, 2007.

[31] J. Moy. OSPF Version 2. RFC2328, Apr 1998.

[32] S. Nelakuditi, S. Lee, Y. Yu, Z.-L. Zhang, and C.-N. Chuah. Fast local

rerouting for handling transient link failures. IEEE/ACM Trans. Netw.,

15(2):359–372, 2007.

[33] P. Pan, G. Swallow, and A. Atlas. Fast Reroute Extensions to RSVP-TE for

LSP Tunnels. RFC4090, May 2005.

[34] Y. Rekhter and T. Li. A Border Gateway Protocol 4 (BGP-4). RFC1771,

Mar 1995.

[35] A. Shaikh and A. G. Greenberg. Experience in black-box ospf measure-

ment. In Internet Measurement Workshop, pages 113–125, 2001.

[36] M. Shand and S. Bryant. IP Fast Reroute Framework. Internet draft, draft-

ietf-rtgwg-ipfrr-framework-07.txt, Jun 2007.

[37] M. Shand and S. Bryant. IP Fast Reroute Framework. Internet draft, draft-

ietf-rtgwg-ipfrr-framework-08.txt, Feb. 2008.

[38] N. T. Spring, R. Mahajan, D. Wetherall, and T. E. Anderson. Measuring

ISP topologies with rocketfuel. IEEE/ACM Trans. Netw., 12(1):2–16, 2004.

[39] J.-P. Vasseur, M. Pickavet, and P. Demeester. Network Recovery: Protec-

tion and Restoration of Optical, SONET-SDH, and MPLS. Morgan Kauf-

mann, 2004.

[40] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold,

M. Hibler, C. Barb, and A. Joglekar. An integrated experimental environ-

ment for distributed systems and networks. In OSDI02, pages 255–270,

Boston, MA, Dec 2002.

[41] X. Yang and D. Wetherall. Source selectable path diversity via routing

deflections. In SIGCOMM, pages 159–170, 2006.
