CoRRS: COnsistent and Responsive Routing with Safeguard

Ang Li
Dept. of Computer Science, Duke University
[email protected]

Xiaowei Yang
Dept. of Computer Science, Duke University
[email protected]

David Wetherall
Intel Research Seattle & University of Washington
ABSTRACT
The Internet uses a process called routing convergence
to quickly find suitable forwarding paths after dynamic
network changes such as failures. During routing con-
vergence, routers may have inconsistent forwarding state
that leads to massive packet loss or forwarding loops.
Real-time applications are often sensitive to loss and delay, and may suffer noticeable performance degradation from such disruptions. Recent work on reducing the adverse effects of routing convergence either slows down or eliminates convergence itself. This paper presents the design and evaluation of an intra-domain routing system, CoRRS, that reduces forwarding disruptions during routing convergence without sacrificing the network’s responsiveness to dynamic changes. CoRRS uses the remaining path cost information carried by packets to detect forwarding anomalies and to discover valid paths. Our evaluation based on a Linux implementation suggests that CoRRS is suitable for implementation on high-speed routers. It has a
small and fixed header overhead, and achieves 95% of
the vanilla IP packet forwarding throughput. Our simu-
lation comparison shows that CoRRS is more effective
and converges 2∼5 times faster than other designs.
1. INTRODUCTION
Real-time applications such as VoIP, online gaming, video conferencing, and IPTV desire uninterrupted network service [3, 7]. Even sub-second periods of packet
loss or delay may adversely impact the users of these
applications [7, 12]. Unfortunately, the present Inter-
net routing system can easily produce noticeable periods
of disruption following a change in the network, e.g.,
the failure or restoration of equipment, an ISP policy
change, or a traffic engineering route adaptation.
When a network change occurs (e.g., a link goes down),
routers must be informed and recompute their forward-
ing tables to adapt to the change, a process known as
routing convergence. During convergence, routers adja-
cent to failures may not have valid routes, or routers may
have inconsistent next hop sequences that form temporary loops called micro-loops. As a result, packets
may be discarded or trapped in a loop. Recent mea-
surement shows that BGP convergence may cause 30%
packet loss for two minutes or longer [24], and micro-
loops formed during IGP convergence in a tier-1 network
account for 90% of the total packet loss [18].
Recent work on reducing forwarding disruption dur-
ing routing convergence [6, 11, 14, 15, 16, 20, 23, 25, 32,
37] has mainly focused on two general approaches. One
approach is to speed up the routing convergence process
by improving the underlying routing protocols [6, 15].
The other is to design new convergence techniques to
mitigate the adverse effects of routing convergence [14,
16, 17, 20, 25]. Unfortunately, fundamental limits such as the speed of light and memory update latency [19, 36, 39] suggest that the convergence period cannot be arbitrarily shortened. Even in an intra-domain environment, recent work suggests that the routing transition time can be reduced to sub-second periods, but no further. This result still falls short of meeting the requirements of real-time applications [7].
The second approach aims to ensure a consistent for-
warding state among routers to avoid loops and black-
holes, but at the cost of reducing the responsiveness of
the routing system. One example is consensus routing [20],
in which routers first agree upon the set of routing up-
dates that are safe to apply by taking periodic global
snapshots and running a consensus algorithm before they
update their forwarding tables. The routing system does
not respond to a network change immediately. Another
example is convergence-free routing with Failure-Carrying
Packets (FCP) [25]. FCP eliminates the routing conver-
gence process, and uses a logical map distributor to pe-
riodically distribute the static network map to routers.
Transient failures do not trigger routing updates.
This work explores a complementary approach: can
we reduce forwarding disruption without slowing down
or constraining routing convergence? Our insight is that
routing convergence, although costly, is advantageous.
It enables the network to promptly respond to dynamic
changes such as failures or traffic load variation and re-
compute optimal routes. For instance, with routing con-
vergence, packets need not hit failures first and then take
a detour to reach their destinations. In addition, routing
convergence also enables a simple form of intra-domain
traffic engineering, in which congested links are avoided
by adjusting link weights and announcing them in rout-
ing updates [13]. Therefore, it is desirable for the routing
system to quickly converge after dynamic changes.
We propose a detect-then-rescue paradigm to simultaneously achieve consistency and responsiveness. That
is, routers detect packets that are potentially “in danger”
(i.e., trapped in a loop or destined to a blackhole route)
during routing convergence, and then use a transient-
mode forwarding mechanism to rescue those packets.
This approach shares a similar spirit with concurrent work on Anomaly-Cognizant Forwarding [11], but as we will soon describe, it differs in both mechanism and focus.¹
A key challenge faced by the design is to enable routers
to detect abnormal forwarding state with low overhead
and high accuracy. Ideally, to be practical we wish to
detect all anomalies without adding extra information in
a packet header or increasing a router’s memory, pro-
cessing, or signaling overhead. Unfortunately, we show
in § 2.2 that even in a basic shortest-path based routing
system, it is impossible to detect forwarding loops accu-
rately without adding additional packet header overhead
or resorting to TTL expirations under generic network
conditions.
Motivated by this negative result, we consider designs
that let packets carry limited safeguard information for
routers to detect abnormal forwarding state. As a first
step, we present the novel design (§ 3-§ 4) of an intra-
domain routing system CoRRS that uses the remaining
path cost as a safeguard. We show that by letting packets
carry a fixed-length label that encodes their remaining
path cost, we can design a routing system that enables
routers to detect all inconsistent forwarding paths that
may lead to micro-loops or blackholes. Furthermore,
the cost label can be used to discover valid repair paths,
thereby achieving two goals: detection and rescue, with
one piece of packet state.
We strive to make the design amenable for implemen-
tation in high speed routers. It does not modify existing
routing protocols nor introduce new distributed conver-
gence protocols, unlike alternatives [14, 16, 17, 20, 25].
It adds packet header overhead, but this overhead is fixed
and small, unlike [11]. Packet forwarding is modified
but involves at most two table lookups. As we show in § 6, a software implementation using Click [22] achieves 95% of the vanilla IP forwarding throughput. The additional memory and computation overhead added by our design is comparable to other solutions [9] that are less effective in terms of reducing forwarding disruption.
We also compare our design with other alternatives
using simulations on real, inferred, and randomly gen-
erated topologies. We simulate topology changes and
observe the packet forwarding behavior during routing
transitions. Without CoRRS, we observe that on aver-
age 18% of the routing transitions cause micro-loops and
some loops can last up to half a second. Packets can be
trapped in a loop, traversing a uni-directional link tens of times until their TTLs expire. In contrast, by carrying cost as the safeguard information, we observe that no packets are trapped in loops, without delaying routing convergence. The routing transition period in CoRRS is 2∼5 times shorter than a state-of-the-art loop-prevention convergence protocol [14].
¹An earlier version of this work has appeared in [27].
We conclude and discuss future work on extending the
CoRRS approach to a policy-based inter-domain routing
environment in § 9. The rest of the paper describes the
design and evaluation of CoRRS in detail.
2. OVERVIEW
In this section, we motivate the design choice of using
the remaining path cost in a packet as a safeguard. We
first define the design goals, and then present a negative
result that shows routers are unable to detect forwarding
loops during routing transitions without altering packet
headers or excessive router state. We then explain why
carrying the remaining path cost may achieve our goals.
2.1 Goals
We aim to explore a detect-then-rescue approach to
achieve consistent and responsive routing. By consistent,
we mean that each hop makes forward progress. When a
routing system is in a stable and converged state, a packet
should be forwarded “closer” to its destination by some metric at each hop. By responsive, we mean that when
a network element changes its status (e.g., a node or link
failure or restoration), the routing system can quickly
learn the event and re-compute its routes to adapt to the
change. This adaptation is desirable because after con-
vergence, packets can be forwarded along the best routes
the network can provide. On the other hand, the convergence process involves significant overhead. When the rate of changes exceeds a threshold, the routing system may not keep up with the changes. Consequently, practical routing systems [10, 31, 34] all have a damping mechanism to reduce the responsiveness of the routing system in case of excessive network changes. Our design aims not to further slow down or constrain routing convergence beyond the built-in damping mechanisms.
This work focuses on achieving consistent and respon-
sive routing in a typical intra-domain routing environ-
ment in which routers run a link-state shortest-path based
routing protocol such as OSPF or IS-IS. We consider it a first step towards a globally consistent and highly responsive routing system.
Our tasks are two-fold: detecting and repairing for-
warding anomalies. An anomaly generally refers to a violation of the consistency requirement when a router forwards a packet to its next hop. Our design aims
to detect all forwarding anomalies, because undetected
anomalies may lead to undesirable effects such as black-
holes and forwarding loops. Loops are especially harm-
ful, because they not only affect packets trapped in the
Figure 1: This counterexample shows that routers cannot detect forwarding loops without extra information. All links are asymmetric, with one direction (the one shown in the figure) having a much lower cost than the opposite direction.
loop, but also inflict collateral damage on packets that
share the links on the loop. For instance, if a 1 Mbps video stream loops 10 times, its peak load on a link becomes 10 Mbps. Packets that would otherwise have reached their destinations may be discarded due to congestion.
After anomalies are detected, it is also desirable to for-
ward packets to their destinations with guaranteed suc-
cess (as long as the network is not partitioned) under all
convergence scenarios. However, we find this goal diffi-
cult to achieve, because the patterns of network changes
are unpredictable. It would require on-demand computa-
tion [25] for routers to find a valid path under all conver-
gence scenarios. To be practical and efficient, our design
only aims to provide guaranteed recovery for the most
common type of event [29]: a single non-overlapping
status change (e.g., failure, restoration, or weight adjust-
ment) of a link, node, or shared risk link group (SRLG).
In the less common case where independent network elements change their status concurrently, we consider it practically acceptable for the design to find a valid path with high likelihood. This is also the design requirement of IETF’s IP fast reroute framework [37]. We note that detecting all anomalies under generic scenarios is still useful, because it enables a rescue mechanism that is likely to succeed.
2.2 The Impossibility Result
Our first attempt to enable anomaly detection is to
consider designs that do not alter the IP packet header,
add little or no router state and processing overhead, and
do not introduce new control messages. These constraints
are desirable because they make a design amenable to
practical deployment. Unfortunately, we discover that although it is possible to detect anomalies while satisfying the above constraints under specific conditions, such as symmetric networks and a single topology change, it is impossible for routers to detect forwarding loops without relying on TTL expirations under generic settings.
We prove this negative result using a counterexample
shown in Figure 1. The high-level idea underlying the
proof is to show that a forwarding loop may form dur-
ing a routing transition period while each router on the
loop observes normal forwarding behavior according to
its local view of the network. With the constraint of not
adding extra overhead, we assume that the only avail-
able information for a router to discover an anomaly is
a packet p’s header and content, the packet’s incoming
and outgoing interfaces, and a router’s local forwarding
tables. We do not assume that a router can “remember” a
packet that it has seen before, because we do not assume
that a router keeps additional per-packet state.
A main forwarding anomaly that we aim to detect is
micro-loops. To detect this anomaly, some router must
be able to tell the difference between a packet that is
trapped in a loop from a packet that is not. Without
keeping additional per-packet state, a router can only tell
the difference from information associated with a packet,
i.e., the packet itself p, its incoming interface p.i, and its outgoing interface p.o.
Unfortunately, under certain scenarios, a packet p that
is trapped in a micro-loop may appear the same as a
normal packet p′ to every router on the loop accord-
ing to each router’s current view of the network, i.e.,
(p, p.i, p.o) ≡ (p′, p′.i, p′.o). Figure 1 shows the coun-
terexample. All links in the network are asymmetric.
One direction (the one shown in the figure) has a much
lower cost than the opposite direction. Consider the paths
between the node S and D. Before the two links (S →
X and Y → D) fail, there are two equal-cost paths from
S to D: S→X→Z→W→Y→D and S→W→Y→D.
Packets from S to D can be forwarded along either path.
After the two links fail and all nodes have re-computed
their forwarding tables, the new shortest path would have
become S→W→Y→X→Z→D. Now suppose that during
the routing transition period, the shaded nodes S, X, and Y have updated their forwarding tables, while the other nodes on the new path (W and Z) have not. In this situation, a packet
p sent from S to D will be trapped in a forwarding loop
S→W→Y→X→Z→W→Y→X · · · .
However, were the routing system not in transition,
each node on the loop might receive a packet p′ such
that (p, p.i, p.o) ≡ (p′, p′.i, p′.o). For instance, for node
W, a packet p trapped in the loop would have an incoming interface Z and an outgoing interface Y. From
the node W ’s current view, the two links did not fail. In
this case a packet p′ sent from S to D can have the in-
coming interface Z and also the outgoing interface Y .
Since a packet can contain arbitrary content, p and p′
may have the same content (including the TTL values)
when they arrive at W . If the node W detects an anomaly
when it forwards the packet p, it will also consider p′ as
an anomaly. This is a contradiction, because before the
links failed, p′ is a normal packet that will reach its des-
tination. Similarly, one can verify that the same is true
for other nodes involved in the loop.
Figure 2: Forwarding using cost-carrying packets on the Abi-
lene network.
Although the above example is extremely specific, it is sufficient to prove the negative result, because if the hypothesis of detecting anomalies without extra information were true, it would have to hold for all scenarios.
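The micro-loop in this counterexample can be reproduced with a few lines of simulation. The next-hop tables below are read directly off the two shortest paths named in the text; the node names (S, W, Y, X, Z, D) are the paper's, and no link costs are needed:

```python
# Figure 1 counterexample: S, X, and Y have converged to the
# post-failure paths, while W and Z still use pre-failure routes.
old = {"S": "W", "W": "Y", "Y": "D", "X": "Z", "Z": "W"}   # pre-failure
new = {"S": "W", "W": "Y", "Y": "X", "X": "Z", "Z": "D"}   # post-failure
updated = {"S", "X", "Y"}          # the shaded (converged) nodes

def trace(src, dst, max_hops=12):
    """Follow per-node forwarding tables until dst or a hop budget."""
    path, node = [src], src
    while node != dst and len(path) <= max_hops:
        node = (new if node in updated else old)[node]
        path.append(node)
    return path

print(trace("S", "D"))
# ['S', 'W', 'Y', 'X', 'Z', 'W', 'Y', 'X', 'Z', 'W', 'Y', 'X', 'Z']
```

The packet cycles through W→Y→X→Z forever, while each node individually forwards according to a table that is internally consistent with its own view.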
2.3 Carrying Path Cost as a Safeguard
The above negative result motivates us to consider de-
signs that include limited information in packet headers.
The remaining path cost becomes a natural choice, as
cost is the metric that routers use to choose forwarding
paths. During normal forwarding, a node ni will only
forward a packet to a next-hop node ni+1 if ni considers
ni+1 as closer to the destination than itself. Additionally,
if in ni’s current view, the remaining path cost from ni+1
is different from that in ni+1’s current view, ni and ni+1
must have inconsistent views of the network, indicating
that the routing system is in transition.
In our design, packets carry the remaining path cost
to their destinations. Routers compare their locally com-
puted costs with the costs carried in packets, and detect
an anomaly if they are different. Figure 2 shows an ex-
ample using the Abilene network topology. Suppose the
link between Denver and Kansas City fails. The Den-
ver node has updated its forwarding tables to use the
Sunnyvale node to reach Kansas City. Unfortunately,
the Sunnyvale node has not updated its forwarding ta-
ble. Without carrying the path cost, packets will loop be-
tween Sunnyvale and Denver, until the Sunnyvale node
updates its forwarding tables.
If packets carry the remaining path cost, the Sunny-
vale node will stamp the remaining path cost to Kansas
City (639) into the packets it forwards to Denver. As
the Denver node has updated its forwarding tables to by-
pass the failed link to Kansas City, its local path cost
(4456) will be higher. The Denver node detects a cost
inconsistency, and can then invoke a rescue mechanism
to prevent the forwarding loop between Sunnyvale and
Denver.
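As a rough sketch, the three-way comparison at the heart of this detection can be written as follows. The helper name `check` and the returned strings are ours; only the costs 639 and 4456 come from the Abilene example above:

```python
def check(packet_cost, local_cost):
    """Compare the remaining-path cost carried by a packet with the
    router's locally computed cost (rescue handling omitted)."""
    if local_cost == packet_cost:
        return "normal"
    elif local_cost > packet_cost:
        # This router has the smaller (more up-to-date for a Down event)
        # topology; its own path is still valid.
        return "cost-increase inconsistency"
    else:
        # A downstream/upstream view mismatch in the other direction.
        return "cost-decrease inconsistency"

# Sunnyvale (not yet updated) stamps its remaining cost to Kansas City,
# but Denver has already routed around the failed link, so its local
# remaining cost is much higher:
print(check(639, 4456))   # cost-increase inconsistency
```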
2.4 Using Path Cost to Discover Valid Paths
The remaining path cost has another attractive feature: it can be used as a hint for routers to discover valid paths without additional packet header overhead, because different paths tend to have different costs. When
a router detects an anomaly using the cost carried by a
packet, the cost value also provides a hint on the valid
path the router could use to forward the packet. For
instance, in Figure 2, if the Sunnyvale node receives a
packet destined to Kansas City with a cost 3161 from the
Denver node, it will notice that the packet cost is higher
than its local cost. So it is likely that a downstream node
on its default path has discovered a failure that it has
not learned. If the Sunnyvale node has pre-computed
an alternative path (Sunnyvale→Los Angeles→Houston
→Kansas City) that has the same cost 3161, it indicates
that the downstream node that has discovered the failure
considers this alternative path valid. The Sunnyvale node
can safely forward the packet along this path to bypass
the downstream failure that it has not learned.
The remaining path cost has a limitation: it cannot distinguish multiple paths with equal cost. When mul-
tiple alternative paths have the same cost, a node may
choose an invalid path that goes back to a failure. We
address this issue in Section 4.
2.5 Comparing Path Cost with Other Choices
Packets may carry other information for anomaly de-
tection and recovery. We have considered a number of
choices, including topology or path identifiers, the en-
tire path visited by a packet, failures encountered by a
packet, or the hop count. We opt for cost because it en-
ables early detection and optimal repair. That is, pack-
ets do not need to hit failures to be rerouted, and a repair
path is often the shortest path after the network has converged. Moreover, as we explain below, cost is more flexible than carrying a path or a topology identifier, more lightweight than carrying a full path, and more effective than carrying failures, a path or topology identifier, or a hop count. We discuss these advantages below.
Failure: A packet may carry the failures it has encountered. This requires a variable-length header, and packets cannot be rerouted before they reach a failure. For instance, in Figure 2, packets sent from Los Angeles to Kansas City will need to hit Denver first before they are rerouted, taking the detour (Los Angeles→Sunnyvale→Denver→Sunnyvale→Los Angeles→Houston→Kansas City) to reach their destination. In contrast, if packets carry the remaining path cost, the first updated node is able to reroute packets towards their destinations along the new shortest paths. For instance, if the Sunnyvale node has updated while Los Angeles has not, a packet will take a shorter path (Los Angeles→Sunnyvale→Los Angeles→Houston→Kansas City) to reach its destination.
Moreover, carrying failures does not prevent forward-
ing loops during routing convergence, because different
routers may still see different network topologies during
routing transitions.
Path: If a packet records the nodes it has traversed,
routers can avoid forwarding the packet back to where
it is from, thereby avoiding forwarding loops. How-
ever, this option requires a variable-length packet header,
which is less efficient than using a fixed-length label that
encodes cost.
Path identifier: A packet may carry a remaining path identifier that specifies the remaining path the packet takes. A router can discover that its local path identifier differs from the one in a packet, but it cannot learn from the packet's path identifier what caused the inconsistency, and therefore cannot choose a valid path. In contrast, a router can compare the cost values. If the one in a packet is higher, it indicates an unknown failure on the router's default path. The router can then follow the path suggested by the path cost in the packet.
In addition, a path identifier prevents a downstream node from changing a packet's forwarding path. That flexibility is desirable because a downstream node can dynamically split traffic over multiple equal-cost paths based on the load of its links to mitigate congestion. Moreover, setting up path identifiers requires additional signaling or computation overhead, while cost values are readily available from shortest-path computation.
Topology identifier: A packet may carry an identifier
that specifies the network topology an upstream router
uses to compute the forwarding path. A router can signal
a forwarding anomaly if the packet’s topology identifier
is different from its local topology identifier. However,
inconsistent topologies do not necessarily mean incon-
sistent paths. Signaling on inconsistent topologies intro-
duces false positives, and may waste network resources
by triggering unnecessary recovery. Moreover, similar to a path identifier, a router learns little information about the valid path to follow from a topology identifier.
Hop Count: A packet may carry its remaining hop
count to a destination. The hop count metric is similar
to cost because both measure a packet’s distance to its
destination. However, practical routing protocols such
as OSPF and IS-IS use fine-grained cost metrics to com-
pute paths. Consequently, hop count is a weaker hint for
choosing a repair path once a forwarding anomaly is de-
tected. More importantly, it is incompatible with equal-
cost multi-paths (ECMPs) forwarding, because paths with
equal cost may have different hop counts.
3. BASIC DESIGN
In this section, we describe the basic design of using
the remaining path cost to detect forwarding anomalies
and to discover valid alternative paths. For ease of ex-
position, we assume that different paths have different
costs for the moment, and discuss the problem caused
by equal-cost paths and how we address it in § 4. We
refer to a node or link failure (or cost increment) event
as a Down event, and restoration (or cost decrement) as
an Up event.
3.1 Header Format
In our design, a packet carries a short label (32 bits in our implementation) that encodes the remaining path cost to its destination. A packet header also has a counter that routers use to record the total number of anomalies they detect. The length of the counter determines how many times routers will attempt to rescue a packet. Our design uses a 1-bit counter for simplicity. When the counter is zero (one), the packet is in the normal (1-anomaly) forwarding mode.
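A minimal sketch of such a label, assuming a layout of four cost bytes followed by a flags byte whose low bit is the anomaly counter; the paper specifies only the field sizes, not the bit layout:

```python
import struct

def pack_label(cost: int, anomaly: int) -> bytes:
    """Encode the 32-bit remaining path cost and 1-bit anomaly counter
    (hypothetical layout: network-order uint32, then one flags byte)."""
    assert 0 <= cost < 2 ** 32 and anomaly in (0, 1)
    return struct.pack("!IB", cost, anomaly)

def unpack_label(raw: bytes):
    cost, flags = struct.unpack("!IB", raw)
    return cost, flags & 1          # only the low bit is the counter

cost, anomaly = unpack_label(pack_label(639, 0))
print(cost, anomaly)   # 639 0
```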
3.2 Alternative Path Database
To successfully recover from a forwarding anomaly, a
router needs to pre-compute alternative paths by antici-
pating a future failure on its current network map G. If
a router anticipates a network element e (a link, node,
or SRLG) may fail in the future, it computes its short-
est paths to reach a destination without the element e in
its current network map. The router stores the alternative
paths’ next hops and costs in an alternative path database
(APD). Conceptually, an APD is a table that stores the
mapping from the pair (dst, cost) to a valid next hop.
During a routing transition, if a router’s local path cost
differs from a packet cost, the router may attempt to find
an alternative path that matches the cost on the packet,
and forward the packet along that path.
A router may re-compute this APD whenever it receives a routing update that changes its current network map. Since the updated APD is only used when the next topology change occurs, the computation is not urgent, and can be done at low priority after a router has computed its forwarding entries. In the case that a topology update results in a larger or better topology, e.g., a link coming up, a router can save some computation by swapping the next hops and path costs of its previous forwarding entries into its APD before it updates those entries, as those entries are the alternative-path entries computed without the newly added topology element.
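The APD construction can be sketched as below on a hypothetical four-node map. The function names and the strategy of pruning one directed link per anticipated failure are our illustration of the idea, not the paper's implementation:

```python
import heapq

def dijkstra(graph, src):
    """Shortest paths from src: {node: (cost, first_hop)}."""
    dist = {src: (0, None)}
    pq = [(0, src, None)]
    while pq:
        d, u, first = heapq.heappop(pq)
        if d > dist[u][0]:
            continue
        for v, w in graph.get(u, {}).items():
            nd = d + w
            nf = v if first is None else first   # first hop out of src
            if v not in dist or nd < dist[v][0]:
                dist[v] = (nd, nf)
                heapq.heappush(pq, (nd, v, nf))
    return dist

def build_apd(graph, src):
    """For each anticipated (here: single-link) failure, recompute the
    shortest paths on the pruned map and record the mapping
    (dst, alternative path cost) -> alternative next hop."""
    apd = {}
    for u in graph:
        for v in graph[u]:
            pruned = {x: {y: w for y, w in nbrs.items() if (x, y) != (u, v)}
                      for x, nbrs in graph.items()}
            for dst, (cost, nhop) in dijkstra(pruned, src).items():
                if dst != src and nhop is not None:
                    apd[(dst, cost)] = nhop
    return apd

graph = {"a": {"b": 1, "d": 1}, "b": {"a": 1, "c": 1},
         "d": {"a": 1, "c": 1}, "c": {"b": 1, "d": 1}}
apd = build_apd(graph, "d")
# After the d->c link fails, d can reach c via a at cost 3.
print(apd[("c", 3)])   # a
```

In a real router the anticipated elements would also include nodes and SRLGs, and the computation would run at low priority as the text describes.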
3.3 Forwarding
We now describe how a router uses the cost on a packet
to prevent loops and resolve forwarding inconsistency
during routing transitions. When a packet that carries
its remaining path cost pkt.cost arrives at a router ni,
the router compares its local cost ni.cost with pkt.cost.
Depending on the comparison result, the router ni takes different forwarding actions and increases the anomaly
counter correspondingly. A packet enters the network in
the normal mode. The comparison has three outcomes:
Normal (ni.cost ≡ pkt.cost): This indicates that the
router ni and its upstream router have consistent forwarding paths. The router updates the packet's cost by subtracting its link cost to the next hop ni+1: pkt.cost ← pkt.cost − cost(ni→ni+1), and forwards the packet to its next hop ni+1.
Cost Increase Inconsistency (ni.cost > pkt.cost): This
inconsistency shows that a router’s local path cost is higher
than its upstream router’s cost. The network must be
in routing transition, as the router ni has computed dif-
ferent paths from other routers. If the packet is in the
normal mode, then this is the first cost inconsistency the
packet encounters. The router will attempt to resolve the
path inconsistency.
In the cost increase case, the router ni resolves the
inconsistency by forwarding the packet along its default
path. Its default path must be valid because ni must have
a smaller topology than other routers, as it has a higher
path cost. If it is a Down event, ni must have already
updated its forwarding table according to the event, and
its path will bypass the failed component. If it is an Up
event, ni must have not updated its forwarding table, and
its path will not use the newly restored or added compo-
nent, but can still reach the destination.
So the router ni updates the packet cost using its local cost: pkt.cost ← ni.cost − cost(ni→ni+1), increases the anomaly counter to 1-anomaly, and forwards the packet to its next hop ni+1.
A special case of cost increase inconsistency occurs
when a packet reaches a router adjacent to a failure, but
the router has not updated its forwarding tables. The
router’s next hop is invalid and the local path cost at the
router is infinity. This situation may happen right after
a router detects a failure, or when a router is damping a flapping link.
In this case, a router ni will immediately start using a pre-computed alternative path that bypasses the failed next hop to forward the packet and update the packet cost. The router will first try to use the alternative path that bypasses its next-hop node. If such a path is unavailable, e.g., the next hop is the destination, the router uses the alternative path computed by removing the link to the next hop. Let the alternative path's next hop be n′i+1, and its path cost be ni.cost′. The router ni increases the anomaly counter to 1-anomaly and updates the packet cost: pkt.cost ← ni.cost′ − cost(ni→n′i+1).
If a packet is already in 1-anomaly mode and the router detects a cost increase inconsistency, it indicates that the rescue effort by a previous router failed, which may occur
during multiple topology updates. A router may either
demote the packet to low priority so that it will not com-
pete with normal or 1-anomaly traffic, or discard it. Our design discards packets that encounter multiple anomalies for simplicity, because such events are rare and may occur only during multiple independent topology update events that the design has not prepared for.
Figure 3: There are two ECMPs from a to c: a→b→c and a→d→c.
Cost Decrease Inconsistency (ni.cost < pkt.cost): This inconsistency shows that a router's local cost is lower than its upstream router's. Again, the network must be in
a transition. As the router has a lower cost, it must have
a larger topology. It is no longer safe to forward along
the router’s default next hop, because it may lead to a
failure.
To resolve a cost decrease inconsistency, a router uses
the packet cost pkt.cost to look up an alternative path
in its APD, because its APD is computed using smaller
topologies than its current one, and a higher cost path
may be found in the APD. Suppose this lookup returns a next hop n′i+1. The router ni updates the packet cost using the link cost to reach n′i+1: pkt.cost ← pkt.cost − cost(ni→n′i+1), increases the anomaly counter to 1-anomaly, and forwards the packet to n′i+1.
The router may not find the packet cost in its alter-
native path database, which may happen when there are
multiple topology updates, e.g., multiple independent links
up simultaneously. In this case, the path cost cannot be
used to locate a valid path, and any further forwarding
may risk forming a loop. Similarly, the router may ei-
ther demote the packet or discard it. Our design chooses
discard for simplicity.
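The three forwarding outcomes above can be summarized in one sketch. The dictionaries and helper names are illustrative; the costs 4456 and 639 come from the Figure 2 example, and the Denver→Sunnyvale link cost 1295 is inferred from the paper's own numbers (4456 − 3161). "SLC" stands in for a hypothetical next-hop name:

```python
def forward(pkt, router):
    """Return (next_hop, action) for a cost-carrying packet; updates
    pkt['cost'] and pkt['anomaly'] in place (§3.3 sketch)."""
    local = router["cost"]
    if pkt["cost"] == local:                      # Normal: views agree
        nhop = router["next_hop"]
        pkt["cost"] -= router["link_cost"][nhop]
        return nhop, "normal"
    if pkt["anomaly"]:                            # second anomaly: give up
        return None, "discard"
    pkt["anomaly"] = 1
    if local > pkt["cost"]:                       # Cost increase: our own
        nhop = router["next_hop"]                 # (smaller-topology)
        pkt["cost"] = local - router["link_cost"][nhop]   # path is valid
        return nhop, "rescue-default"
    # Cost decrease: look up a repair path matching the packet cost.
    entry = router["apd"].get((pkt["dst"], pkt["cost"]))
    if entry is None:
        return None, "discard"
    pkt["cost"] -= router["link_cost"][entry]
    return entry, "rescue-apd"

router = {"cost": 4456, "next_hop": "SLC",
          "link_cost": {"SLC": 1295}, "apd": {}}
pkt = {"dst": "KC", "cost": 639, "anomaly": 0}
print(forward(pkt, router))   # ('SLC', 'rescue-default')
```

After the call, the packet carries cost 3161 and is in 1-anomaly mode, matching the rescue described for the Denver node.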
4. HANDLING EQUAL-COST PATHS
The basic design faces a significant challenge: although path costs are highly likely to be distinct among different paths, if equal-cost multi-paths (ECMPs) do exist between a pair of nodes, a path cost may not uniquely identify a consistent forwarding path. Loops may form
after packets are repaired, even though the detection of
inconsistency is always effective.
Figure 3 shows an example. The node a has two ECMPs
to reach c: a→b→c and a→d→c. Suppose the link d→c
fails. The node d will use the alternative path to reach c:
d→a→b→c. It stamps a path cost 2 in the packet, and
forwards it to the node a. Unfortunately, the node a has
not updated its forwarding tables, and still thinks both its
ECMPs are valid as no cost inconsistency is detected. If
it picks the path a→d→c, a micro-loop forms.
4.1 Adding Noise to Link Cost
A key insight behind our solution to the ECMP prob-
lem is that equal cost multi-paths are primarily used for
load-balancing or failure backup in practice. A down-
stream router has the flexibility of deflecting packets to
any next hop that has the same cost to the destination.
Figure 4: A new version of the topology shown in Figure 3 with
salted costs. Two ECMPs between a and c now have distinct salted
costs.
We think that this flexibility is less critical than safely
forwarding packets to their destinations during a tran-
sient routing convergence period. As this period is short-lived,
it may be acceptable to let a downstream router
follow the path an upstream router chooses
to prevent forwarding loops.
Our design embeds a random noise in the lower k bits
of a link cost to distinguish equal-cost alternative paths
an upstream router chooses. A router ignores the lower
k bits of the cost for packets in the normal forwarding
mode, but uses those k bits to differentiate alternative
paths for packets in the 1-anomaly mode. Different al-
ternative paths will have distinct costs with a high prob-
ability, but routers can still use ECMPs during normal
forwarding. We describe this mechanism in more detail.
4.2 Computing Salted Path Cost
A link cost now has two parts: a normal link cost and
a random noise. The normal link cost is what a network
would configure a link cost to be without considering
failure recovery issues. We refer to the cost with a noise
as a salted cost. A salted link cost can be viewed as a
pair of two values, (cost, noise). A noise is randomly
distributed over [0, 2^k − 1].
A router uses the salted link costs to compute its APD
as described in § 3.2. That is, it removes an element e
from its current network map, and uses the shortest path
algorithm and the salted link costs to compute the alter-
native paths to reach a destination. During the shortest
path computation, to avoid carrying over any bit from
the noise to a normal path cost, a router adds each part
separately:

    (cost₁, noise₁) + (cost₂, noise₂) = (cost₁ + cost₂, noise₁ + noise₂)

Similarly, salted costs are compared lexicographically:
(cost₁, noise₁) > (cost₂, noise₂) if cost₁ > cost₂, or if cost₁ ≡
cost₂ and noise₁ > noise₂. These addition and com-
parison rules guarantee that the lowest salted cost path
must be one of the original ECMPs. When a router stores
a salted path cost, it keeps only the lowest k bits of the
noise, i.e., the noise modulo 2^k: (cost, noise) ← (cost, noise mod 2^k). This
encoding rule ensures that normal ECMPs' salted costs
only differ in the last k bits.
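The addition, comparison, and encoding rules above can be sketched compactly; the helper names below are ours, and Python tuples are used because their lexicographic ordering matches the comparison rule.

```python
K = 10                 # noise width in bits; the paper's simulations use k = 10
MOD = 1 << K

def salted_add(a, b):
    """Add two salted costs part by part; the noise wraps modulo 2^k so it
    never carries into the normal cost."""
    return (a[0] + b[0], (a[1] + b[1]) % MOD)

def salted_less(a, b):
    """Lexicographic comparison: normal cost first, then noise. The lowest
    salted-cost path is therefore always one of the original ECMPs."""
    return a < b       # Python tuples already compare lexicographically
```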
A router ni computes its normal forwarding paths only
using the normal link cost portion of a salted cost. Its
normal forwarding paths are not affected by the noise
cost. In addition, for each destination d, a router also
computes its lowest salted cost path and the lowest salted
cost paths for each of its ECMP next hops. This is be-
cause during routing transitions, after a packet is in the
1-anomaly mode, it must carry a salted path cost that
uniquely identifies the alternative path, i.e., the lowest
salted path cost, at every hop. A router may need to com-
pare with this cost, or stamp this cost when it forwards a
packet to its next hop. These lowest salted cost paths can
be computed on a router’s shortest path forwarding trees
using the salted link costs. After the computation, for
each ECMP next hop n_{i+1}, a router stores the lowest re-
maining salted path cost for that next hop, n_i.cost_{n_{i+1}}(d),
in its forwarding tables. A router also stores its own low-
est salted path cost n_i.cost(d).
4.3 Modified Forwarding Algorithm
In the modified design, a packet carries a salted re-
maining path cost pkt.cost. A router n_i compares this
cost with its lowest salted path cost n_i.cost in its for-
warding tables and takes different actions based on the
outcome.
Normal Mode Packets: If the packet is in the nor-
mal mode, then the router n_i compares the normal path
costs only, i.e., ignoring the lower k bits of both its own
salted path cost and the packet cost. We use n_i.cost and
pkt.cost to denote the normal path cost, and n_i.noise
and pkt.noise to denote the noise portion. If the nor-
mal path costs are equal, pkt.cost ≡ n_i.cost, the router
picks a next hop n_{i+1} according to its local load bal-
ancing algorithm, stamps the remaining salted path
cost of that next hop, n_i.cost_{n_{i+1}}, into the packet, and for-
wards the packet to the next hop n_{i+1}.
If the router’s normal path cost is higher, ni.cost >
pkt.cost, a cost increase inconsistency occurs. Similar
as before, this indicates that a router has a smaller topol-
ogy, and it can use its shortest path to keep forwarding.
The router may have different ECMPs. After increasing
the packet’s anomaly counter to 1-anomaly, it can use
any ECMP next hop and stamps the lowest salted cost of
that next hop to forward the packet.
If the router’s normal path cost is lower, ni.cost <
pkt.cost, then a cost decrease inconsistency occurs. Sim-
ilarly, the router must have a larger topology, and it will
attempt to find an alternative path in its APD. The router
will use the salted path cost in the packet pkt.cost to
look up the alternative path. Suppose the router finds an
alternative path with the same salted cost pkt.cost, and
the next hop is n′_{i+1}. The router increases the anomaly
counter to 1-anomaly, updates the packet cost,

    pkt.cost ← (pkt.cost − cost_{n_i→n′_{i+1}}, (pkt.noise − noise_{n_i→n′_{i+1}}) mod 2^k),

and forwards the packet to the next hop n′_{i+1}.
If a router cannot find a path with the same salted cost
pkt.cost, which may happen during multiple concurrent
network changes, it discards the packet.
k     Collision Probability
10    0.0097
16    0.00015
24    6.0 × 10^−7
32    2.3 × 10^−9
Table 1: The probability of having two equal salted cost paths
between two nodes when there are c = 5 normal ECMPs, given
different noise lengths k.
1-anomaly Mode Packets: When a packet has al-
ready encountered an anomaly, a router n_i compares the
packet's salted cost pkt.cost with its own lowest salted
path cost n_i.cost. This is because some of the router's
ECMPs may already have failed. The router needs to
use the salted cost to uniquely identify a valid forward-
ing path. If pkt.cost ≡ n_i.cost, the router updates the
packet cost to the remaining salted cost of the next hop
n_{i+1} that has the lowest salted cost, pkt.cost ← n_i.cost_{n_{i+1}},
and forwards the packet to n_{i+1}.
If n_i.cost ≠ pkt.cost, then the router looks up the
salted packet cost pkt.cost in its alternative path database.
If a match is found, it forwards the packet and updates its
path cost as described before. Otherwise, it discards the
packet.
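Putting the normal-mode and 1-anomaly-mode rules together, the modified forwarding decision can be condensed as below. All data-structure shapes (the per-destination routing state `rt`, its `remaining` and `apd` maps, and the `balance` callback) are illustrative assumptions, not the implementation's actual interfaces.

```python
K = 10                                   # assumed noise width
MOD = 1 << K

def forward(pkt, rt):
    """Return (next_hop, pkt), or (None, pkt) to discard. rt holds the
    router's per-destination state: lowest salted cost and next hop,
    per-next-hop remaining salted costs, an ECMP balancer, and the APD."""
    my = rt["lowest_salted"]             # (cost, noise) of lowest salted path
    pc, pn = pkt["cost"], pkt["noise"]
    if pkt["anomaly"] == 0:
        if pc == my[0]:                  # normal costs match: normal forwarding
            nh = rt["balance"]()         # local ECMP load balancing
            pkt["cost"], pkt["noise"] = rt["remaining"][nh]
            return nh, pkt
        pkt["anomaly"] = 1
        if my[0] < pc:                   # cost increase: keep using shortest path
            nh = rt["balance"]()
            pkt["cost"], pkt["noise"] = rt["remaining"][nh]
            return nh, pkt
        return apd_lookup(pkt, rt)       # cost decrease: salted lookup in APD
    if (pc, pn) == my:                   # 1-anomaly: salted costs must match exactly
        nh = rt["lowest_next_hop"]
        pkt["cost"], pkt["noise"] = rt["remaining"][nh]
        return nh, pkt
    return apd_lookup(pkt, rt)

def apd_lookup(pkt, rt):
    entry = rt["apd"].get((pkt["cost"], pkt["noise"]))
    if entry is None:
        return None, pkt                 # no matching salted cost: discard
    nh, (lc, ln) = entry                 # salted cost of the link to the alternate hop
    pkt["cost"] -= lc
    pkt["noise"] = (pkt["noise"] - ln) % MOD
    pkt["anomaly"] = 1
    return nh, pkt
```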
We revisit the example in Figure 3 to show why the
salted cost and the modified algorithm address the ECMP
issue. Figure 4 shows the same topology with salted link
costs. The two ECMPs from a now have distinct salted
costs: (2, 13) and (2, 17). Suppose a has not updated
its forwarding tables after the link d → c has failed. It
forwards a packet to d and stamps the lowest remaining
salted cost (1, 7). The node d has updated, and detects
a cost increase inconsistency. It marks the packet in 1-
anomaly, forwards the packet via its own next hop, which
is a, and stamps the lowest remaining salted path cost
(2, 13). When a receives this packet, it uses the salted
path cost (2, 13) to unambiguously choose the correct
alternative path next hop b, instead of sending it back to
d, as the salted path cost does not match and the packet
is already in 1-anomaly mode.
4.4 Cost Collision Analysis and Prevention
The random noise added to a link cost makes two nor-
mal ECMPs have distinct salted path costs with a high
probability. We analyze this probability, and discuss how
to reduce the probability of collision to be negligible.
A noise cost is chosen uniformly at random from [0, 2^k − 1]. The sum of noise costs modulo 2^k is also uniformly dis-
tributed. If a node has c normal ECMPs or c repair paths
with the same normal costs, the probability that no two
such paths have the same salted path cost is:
    1 · (1 − 1/2^k)(1 − 2/2^k) ··· (1 − (c−1)/2^k) = (2^k)! / (2^{ck} (2^k − c)!)
Table 1 shows the probability of collision when c = 5
for various values of k. In practice, c is typically small
(< 5)², because two backup paths usually suffice.
As can be seen, with practical values of c and k, the
probability of collision is low. In case a collision does
occur, a network administrator can try different noise
values until a collision-free noise configuration is found.
This is doable because noise values do not affect normal
forwarding operations, and the probability of collision
is small. Our simulations use a 10-bit noise value, and
we did not run into any collisions on any of the simulated topolo-
gies, including an inferred tier-1 topology. Therefore, we
think that the probability of collision can be practically
ignored if we use a 16-bit or longer noise.
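The collision probabilities in Table 1 can be checked numerically; the sketch below simply evaluates one minus the no-collision product given above (the function name is ours).

```python
from math import prod

def collision_prob(c: int, k: int) -> float:
    """Probability that at least two of c equal-cost paths share a salted
    path cost, with each path's noise uniform over [0, 2^k - 1]."""
    return 1 - prod(1 - i / 2**k for i in range(c))

# collision_prob(5, 10) is roughly 0.0097, matching the first row of Table 1
```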
5. PROPERTIES
We now briefly describe the forwarding properties of
our design. We omit formal proofs due to the lack of
space. When stating those properties, we do not consider
congestion loss, because it is not caused by violations of
forwarding consistency. We also ignore the failure detec-
tion period during which routers may forward packets to
a failed link without noticing the failure, and the router
initialization period during which a newly added router
has not obtained any topology information.
Property 1 Packets will follow normal shortest paths, in-
cluding ECMPs, to reach their destinations in the normal
mode when the network is in a stable state.
This property holds because routers compute their nor-
mal forwarding paths in the same way as when link costs
do not have noises. Our algorithm compares the normal
path cost in a packet and a router’s local normal cost first
for packets in the normal mode. In a steady state, these
two cost values will always match and packets will reach
their destinations without triggering any inconsistency.
Property 2 If different paths have distinct salted path
costs, then during the routing transition period in which
only one network element changes its status and the net-
work is not partitioned, a packet will be forwarded to its
destination in either the normal or 1-anomaly mode.
This property holds because when there is only one
element changing its status, a router always has a valid
path in either its normal forwarding tables or its APD,
and if a packet is destined to a failure or trapped in a
loop, a router is able to detect a cost inconsistency and
forward the packet along an alternative valid path.
Property 3 A packet will not be trapped in a micro-loop
without being discarded.
By trapped in a micro-loop, we mean that if all routers
stop updating their forwarding tables after forming a loop,
a packet will not escape the loop until its TTL expires.
This property holds because a packet cannot traverse a
node twice without a cost inconsistency, and after two
²This observation is based on five real ISP topologies.
Figure 5: The structure of the prototype implementation
Figure 6: The encoding scheme of the cost. We overload the grey
area in the IP header to carry the 32-bit cost.
cost inconsistencies or a router fails to find an alternative
path, the packet will be discarded.
6. IMPLEMENTATION
To understand the overhead and complexity of CoRRS,
we implement our design using the Quagga routing suite [2]
and the Click modular router [22]. Figure 5 shows the struc-
ture of the implementation. The shaded boxes are the
modules we modify. We modify Quagga’s ospfd rout-
ing daemon to compute alternative paths as well as short-
est paths after each routing update, and associate a path
cost with each forwarding entry it sends to Click. We
then modify Click to support the forwarding algorithm
described in § 4. We also modify the interface between
Quagga and Click to support the exchange of the path
cost information. Our modification includes 3000 lines
of C/C++ code, which is about 6% of the ospfd code.
Figure 6 shows how we encode a 32-bit cost label in an
IP header. We overload the 16-bit IPID field, the 13-bit
fragment offset field, the 1-bit reserved flag field, and the 2 high-
est bits of the TTL field to encode a 22-bit path cost and
a 10-bit noise. The anomaly counter is implemented using
different DiffServ codepoints. A 22-bit path cost is suf-
ficient because OSPF uses a 16-bit link cost metric, and
a network’s diameter almost never exceeds 64 hops.
We test our implementation using the Emulab testbed [40].
We construct an Abilene topology, and run both the vanilla
IP with OSPF and CoRRS on it. The ospfd’s timers are
configured as shown in Table 3. Those values are set ac-
cording to commercial router implementations, as we will describe
in our simulation study (§ 7). We run the exam-
ple shown in Figure 2. We first start a 1Mbps UDP flow
from the Sunnyvale node to the Kansas City node, and
then manually fail the link from Denver to Kansas City.
After the failure, the Sunnyvale node updates its routing
table one second later than the Denver node. We measure
the load of the link from Sunnyvale to Denver, which is
part of the loop, and the packet loss rate at the destination
node Kansas City.
Figure 7: This figure shows the traffic load on the Sunnyvale→
Denver link and the packet loss rate after the Denver→Kansas City
link fails.
Figure 7 shows the link load and packet loss rate dur-
ing the convergence period. The link failure happens
at time 1s. From the figure we can see that a forward-
ing loop occurs under vanilla OSPF and IP routing, and
it temporarily increases the link load by more than 30-fold,
as the Linux machines on Emulab use 64 as their
default TTL. In contrast, CoRRS prevents loops and the
link load does not increase. The UDP flow under vanilla
OSPF and IP routing is interrupted for almost 1.5 sec-
onds, while CoRRS resumes packet transmission right
after the failure is detected, which takes about 250ms.
We evaluate the computational and memory overhead
of CoRRS and compare them with NotVia [9], a state-
of-the-art IP fast rerouting technique that uses pre-computed
backup paths to bypass failures. The results are summa-
rized in Table 2. Our test runs on a Pentium D 2.4GHz
machine with 2GB memory.
The right side of Table 2 shows the time it takes to fin-
ish computing the alternative paths for CoRRS and NotVia
for various topologies. As can be seen, CoRRS's com-
putation takes less than 100ms on the largest Sprint topol-
ogy, and the time is comparable with NotVia's.
We evaluate the memory overhead by examining how
many entries a router needs to keep in its alternative path
database. The left side of Table 2 shows the results. The
results show that a repair path database may be 2-8 times
larger than a router’s normal forwarding table, and is
comparable with that of IPFRR. If the memory overhead
becomes a practical concern, it can be optimized [26].
We omit the details due to the space limitation.
We also benchmark the per-packet processing over-
head of CoRRS. This test is run on an AMD Dual Core
2.6GHz machine with 2GB memory. The vanilla IP for-
warding takes 0.8 µs, and CoRRS’s forwarding takes on
Topology (# nodes / # links)  # of Forwarding   # of NotVia Entries   # of RPD Entries      RPD Computation         NotVia Computation
                              Table Entries     (avg / max / min)     (avg / max / min)     Time, ms (avg/max/min)  Time, ms (avg/max/min)
Abilene (11 / 28)             11                15.4 / 17 / 14        17.3 / 26 / 12        0.165 / 0.176 / 0.157   0.093 / 0.112 / 0.073
Sprint (315 / 1944)           315               368.8 / 449 / 278     777 / 1769 / 534      79.4 / 89.2 / 71.9      49.7 / 84.2 / 36.8
Random (100 / 394)            100               116.6 / 140 / 102     276.1 / 376 / 149     6.2 / 11.9 / 5.8        2.7 / 10.6 / 2.0
Table 2: Summary of the memory and computational overhead introduced by CoRRS. For memory overhead, the normal forwarding
table size and the number of NotVia entries are shown for comparison. For computational overhead, the NotVia computation time is shown
for comparison.
average 1.7 µs. The processing overhead mainly comes
from the extra table lookup in the alternative path database
for packets in the 1-anomaly mode. We also compare
the throughput of vanilla IP forwarding and CoRRS for-
warding. On our test machine, the peak throughput for IP
forwarding is 496.8kpps, and for CoRRS forwarding is
471.6kpps, a decrease of about 5%.
7. SIMULATION
In this section, we use SSFNet [4], an event-driven
simulator that has a complete OSPFv2 implementation,
to simulate CoRRS and compare it with other designs,
including packets that carry failures (FAIL), and fast rerout-
ing techniques (FRR). CoRRS is implemented as described
in the previous sections. To enable a fair comparison,
FAIL is implemented as a much reduced version of FCP [25].
Each packet carries the last failure it encounters. Routers
use the existing routing protocol OSPF to converge. This
simplification gives FAIL header, computational, and
signaling overhead comparable to CoRRS's.
FRR refers to techniques that enable routers adjacent to fail-
ures to rapidly reroute packets around the failures. We
simulate NotVia [9], the most effective such solution in IP
networks. Fast rerouting techniques in MPLS networks,
known as MPLS-FRR [33], are similar to NotVia except
that a packet's next hop is identified using a label rather
than the destination address. Thus we do not simulate
MPLS-FRR separately, as it has the same characteristics
as NotVia.
FRR can be used in conjunction with a loop-prevention
convergence technique [8] to reduce micro-loops. Known
techniques [14, 16, 17] all slow down routing conver-
gence, and cannot prevent loops when there are concur-
rent topology changes. For comparison purposes, we also
simulate the combination of FRR and a loop-prevention
convergence technique, oFIB [14] (FRR+oFIB). We choose
oFIB because it converges faster than alternatives [16,
17].
7.1 Miscellaneous Settings
Network Topologies: We simulate on real, inferred,
and randomly generated topologies which are summa-
Parameter                  Value
HelloInterval              50ms
RouterDeadInterval         250ms
SPF Delay                  200ms
SPF Computation Time       (0.00247n² + 0.978)ms
FIB/RIB Update Time        (rand([100, 300]) + rand([0.1, 0.11])·p)ms
Table 3: Summary of the simulation settings. n is the number of
routers in the network; p is the number of entries in the forwarding
table.
Topology    Type        # of Nodes    # of Links
Abilene     Real        11            28
Telstra     Inferred    108           306
Exodus      Inferred    79            294
Sprint      Inferred    315           1944
Random      Random      100           394
Table 4: Summary of the topologies used in our simulation.
rized in Table 4. The inferred topologies are from the
Rocketfuel project [38], and the random topologies are
generated using the BRITE topology generator [1]. Real
and inferred topologies contain precise or inferred link
weights [28]. We use the random topology to test how
our design works on asymmetric networks. The link
weight in each direction is set independently, each using
a random number between 1 and 50.
Link delays of each topology are set according to the
geographic proximity of their end nodes. If two routers
are in different Points-of-Presence (PoPs), we infer the
link delay between them from the geographical distance;
in the generated topology, the nodes are randomly
spread on a plane of a size similar to the US conti-
nent. If two routers are in the same PoP, we assume the
link delay is 0.1ms.
Topology updates: We simulate routing transitions
for both single topology update and multiple concurrent
topology update events. For single topology update, we
test single link up/down events and node up/down events.
For multiple topology updates, we test two concurrent
link failures. For each type of update we run 100 exper-
iments with randomly chosen topology updates.
OSPF parameters: The various timers and delays of
the OSPF implementation are summarized in Table 3.
Update Type            # of Tests          Total # of     Total # of        Loop Duration, ms (avg / max / min / stddev)
                       Containing Loops    Micro-loops    Links Involved
OSPF:
  Link Failure         13                  37             64                57.9 / 131.6 / 1.25 / 37.5
  Node Failure         20                  123            182               62.2 / 474.7 / 0.67 / 54.6
  Link Up              9                   23             46                45.1 / 162.3 / 0.69 / 33.2
  Node Up              28                  129            188               52.9 / 162.8 / 0.96 / 33.7
  Two Link Failures    20                  61             96                47.7 / 158.3 / 1.47 / 35.4
oFIB:
  Two Link Failures    24                  60             96                45.9 / 172.5 / 0.22 / 38.9
Table 5: Summary of loops during convergence in the Sprint (AS1239) topology. For each update type we run 100 experiments with
randomly chosen topology updates.
[Figure 8 plots omitted: cumulative fraction (y-axis) vs. flow amplifying factor (x-axis) for panels (a) Abilene, (b) Sprint, and (c) Random, with curves for CoRRS+OSPF (1 and 2 links), FRR+OSPF, FAIL+OSPF, IP+OSPF, and FRR+oFIB; the Sprint panel's legend labels these CCP+OSPF and NotVia+oFIB.]
Figure 8: The cumulative distribution of the loops’ flow amplifying factor on three topologies for one link and two link failure cases.
These parameters are set according to the values rec-
ommended by various fast convergence techniques [6,
21] and the values observed in commercial production
routers [5, 15, 35]. We simulate fast convergence be-
cause our design’s benefits are even more prominent dur-
ing slow convergence in which micro-loops last longer,
but we desire to show only the benefits that cannot be
achieved by other techniques. With these settings, all
our simulated networks can converge within a second,
consistent with previous studies [6, 15].
7.2 Reducing Forwarding Disruptions
The first set of experiments evaluate how effectively
CoRRS can reduce forwarding disruptions during rout-
ing transitions. We use two metrics to measure forward-
ing disruptions: packet loss rates, and flow amplifica-
tion factors caused by micro-loops. The latter measures
the number of times a packet traverses the same uni-
directional link because of a micro-loop. We refer to
this factor as the flow amplifying factor because if a traf-
fic stream of t Mb/s is trapped in a loop and has a flow
amplifying factor K , then the stream’s peak rate would
become t×K Mb/s on the link. An amplified flow may
cause excessive congestion loss.
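As a small worked example of the metric (helper names are ours): the amplifying factor of a packet's forwarding trace is the maximum number of times it crosses the same unidirectional link, and a stream of t Mb/s trapped in such a loop peaks at t × K Mb/s.

```python
from collections import Counter

def flow_amplifying_factor(trace):
    """trace: list of (u, v) unidirectional links one packet traversed.
    Returns the max number of times any single link is crossed."""
    return max(Counter(trace).values(), default=1)

def peak_rate_mbps(t_mbps, trace):
    """A t Mb/s stream with amplifying factor K peaks at t * K Mb/s."""
    return t_mbps * flow_amplifying_factor(trace)
```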
To measure the flow amplifying factors and packet
loss rates, we send probing packets every 5ms between
each pair of nodes. The probing packets' TTLs are ini-
tialized to 128, the default TTL value of the Windows
XP operating system. We then measure the flow am-
plifying factors and packet loss rates from the probing
packet traces. Note that we do not simulate real traffic
patterns because doing so is extremely time-consuming;
our simulations would otherwise not finish in a
reasonable time, i.e., a few days.
Figure 8 compares the distributions of the flow ampli-
fying factors of vanilla IP forwarding with OSPF, CoRRS
with OSPF, FRR with OSPF, and the combination of
FRR and oFIB in a real network topology (Abilene),
an inferred tier-1 network topology (Sprint), and a ran-
domly generated network topology (Random). The dis-
tributions are drawn from all micro-loops we have ob-
served in the tests. As shown in the figure, CoRRS’s
flow amplifying factor is ≤ 2. This result shows that
CoRRS prevents packets from being trapped in micro-
loops. In contrast, the vanilla IP forwarding with OSPF,
FRR+OSPF, and FAIL+OSPF all have amplifying fac-
tors up to 64, indicating that packets are finally discarded
due to TTL expirations. Table 5 summarizes the charac-
teristics of micro-loops observed in our simulations. As
can be seen, some loops may last close to half a second,
sufficiently long to cause severe flow amplification.
The combination of FRR and oFIB also prevents loops
in single topology update events, but during two link fail-
ures it performs similarly to vanilla IP forwarding under
OSPF, as oFIB is unable to prevent loops in this situation
and falls back to the default OSPF. Table 5 also shows
the micro-loops formed during oFIB convergence when
there are two link failures.
Figure 9 and Figure 10 show the average packet loss
rates after a link failure and two link failures for each
mechanism in three topologies. Note that the packet loss
rates do not include congestion loss caused by micro-
loops, because we did not simulate real traffic load. There-
fore, the packet loss rates we measure are those caused
11
[Figure 9 plots omitted: packet loss rate (y-axis) vs. time in seconds (x-axis) for panels (a) Abilene, (b) Sprint, and (c) Random, with curves for CoRRS+OSPF, FRR+OSPF, FAIL+OSPF, FRR+oFIB, and IP+OSPF; the Sprint panel's legend labels these CCP+OSPF, NotVia+oFIB, and IP+OSPF.]
Figure 9: The average packet loss rate after a single link failure. X-axis is the time-line. The failure happens at time 0, and is detected
after 200 ∼ 250ms. Y-axis is the packet loss rate for all probing flows that use the failed link.
[Figure 10 plots omitted: packet loss rate vs. time for panels (a) Abilene, (b) Sprint, and (c) Random, with the same sets of curves as Figure 9.]
Figure 10: The average packet loss rate after two link failures. Other configurations are the same as in Figure 9.
by failed routes, and should be lower in designs that have
micro-loops during routing convergence. The loss rate at
time t is computed by counting how many probe pack-
ets sent during the period [t, t + 10ms] are eventually
received. We average the loss rates over all experiments
for each type of update event.
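The windowed loss-rate computation just described can be sketched as follows (the probe record format and function name are assumptions; the 10ms window matches the text).

```python
def loss_rate_series(probes, window=0.01, horizon=0.05):
    """probes: list of (send_time_s, received_bool). The loss rate of
    window i covers probes sent during [i*window, (i+1)*window); empty
    windows report 0.0."""
    n = int(round(horizon / window))
    rates = []
    for i in range(n):
        lo, hi = i * window, (i + 1) * window
        sent = [received for t, received in probes if lo <= t < hi]
        rates.append(0.0 if not sent else 1 - sum(sent) / len(sent))
    return rates
```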
As can be seen, CoRRS and other fast restoration tech-
niques can all successfully reduce packet loss rates, while
vanilla IP forwarding combined with OSPF have the high-
est loss rate, as packets are discarded until the network
has converged. This is consistent with our emulation
study in the previous section. The packet disruption times
may last about 1 second. In a single failure case, after
the failure is detected (after ∼200ms), CoRRS and other
fast restoration techniques are able to rapidly bypass fail-
ures using pre-computed paths. In some of the figures,
the packet loss rates do not reach zero because the net-
work is disconnected. In the two link failure case, none
of the techniques have guaranteed recovery. As a result,
some packets are discarded after failures are detected,
especially for the Abilene network. During the conver-
gence period, CoRRS has a slightly higher loss rate than
FRR+oFIB and the vanilla IP forwarding in the Abilene
network. This is because it discards packets potentially
trapped in loops rather than forwarding them. As shown
in Figure 8(a), the flow amplifying factor in the Abilene
network is less than 64, indicating that the looped pack-
ets are not discarded due to TTL expirations and can still
reach their destinations after looping several times. In prac-
tice, packets that loop for dozens of times might have al-
ready congested a link and been discarded due to conges-
tion, but our simulations do not show this effect. Results
for other update events are similar and are not shown for
brevity.
7.3 Convergence Time
We also measure the convergence time for both CoRRS
and FRR+oFIB. oFIB reduces micro-loops during rout-
ing convergence, but it comes at the cost of delaying
routing convergence. A key advantage of our design is
that it does not delay normal routing convergence. Fast
convergence makes the network more resilient and re-
sponsive to changes. For instance, if the network’s con-
vergence time is less than the inter-arrival time of con-
secutive failures, each failure becomes essentially a sin-
gle failure and the network can use pre-computed paths
to rapidly bypass the failure, reducing the packet disrup-
tion times. In addition, fast convergence also reduces the
time a packet follows a suboptimal path, e.g., reaching a
failure before being rerouted.
Figure 11 shows the network convergence time for
various single topology updates for CoRRS and FRR+oFIB.
FAIL and FRR that use OSPF to converge have the same
convergence time as CoRRS. The convergence time is
measured from the time the first router announces a rout-
ing update to the time that all routers have finished up-
dating their forwarding tables. In multiple topology up-
dates, oFIB falls back to OSPF. Therefore, it has the
same convergence time as OSPF. However, in all other
cases, oFIB converges 2 ∼ 5 times slower than CoRRS,
[Figure 11 bar charts omitted: convergence time in seconds (y-axis) for link down, link up, node down, node up, and two-links-down events (x-axis), comparing IP+OSPF/CoRRS+OSPF against FRR+oFIB, for panels (a) Abilene, (b) Sprint, and (c) Random.]
Figure 11: The averaged convergence time after different network changes. CoRRS has the same convergence time as OSPF. The error
bars show the standard deviations.
[Figure 12 plots omitted: path stretch (y-axis) vs. time in seconds (x-axis) for panels (a) Abilene, (b) Sprint, and (c) Random, with curves for CoRRS+OSPF, FRR+OSPF, FAIL+OSPF, and FRR+oFIB; the Sprint panel's legend labels these CCP+OSPF and NotVia+oFIB.]
Figure 12: The averaged path stretch after a link failure. X-axis is the time-line. The failure happens at time 0, and is detected after
200-250ms. Y-axis is the path stretch for all probing packets that previously traversed the failed link.
because it imposes a strict update order on routers in the
network [14]. This result is consistent with that obtained
in [14]. The convergence time does not include CoRRS’s
alternate path computation time and FRR’s backup path
computation time, which we evaluate separately using
the real implementation as described in § 6.
7.4 Path Stretch
Path stretch is defined as the ratio of the cost of a path
taken by a packet to the shortest path cost in the network.
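This ratio can be computed from a forwarding trace and one shortest-path run; the minimal sketch below (graph representation and names assumed) uses Dijkstra's algorithm for the denominator.

```python
import heapq

def shortest_cost(graph, src, dst):
    """Dijkstra over graph = {node: {neighbor: link_cost}}."""
    dist, pq = {src: 0}, [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue                       # stale queue entry
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return float("inf")

def path_stretch(graph, path):
    """Cost of the path actually taken over the shortest-path cost."""
    taken = sum(graph[u][v] for u, v in zip(path, path[1:]))
    return taken / shortest_cost(graph, path[0], path[-1])
```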
Figure 12 shows the average path stretch during rout-
ing convergence for a single link failure case. The path
stretch is averaged over all source and destination pairs
whose default forwarding paths include the failed link.
As can be seen, with CoRRS the path stretch reduces to
1 in less than one second, while FRR+oFIB may take 2
∼ 3 times longer to reduce the average path stretch to
1. Both FAIL and FRR without oFIB have much higher
path stretch during convergence. This is because OSPF
convergence is not loop-free; some packets reach their
destinations by taking extremely long paths after loop-
ing many times.
7.5 Summary
In summary, our simulation results suggest that com-
pared to other designs, CoRRS prevents micro-loops with-
out slowing or constraining routing convergence. It can
find high-quality loop-free paths with a high probability,
prevents packet loss after the failure is detected, and has
a shorter path stretch during routing convergence.
8. RELATED WORK
CoRRS is motivated by recent work on improving the
availability of Internet routing, including consensus rout-
ing [20] and convergence-free routing [25]. The design
of CoRRS explores a complementary approach: it aims
not to slow down or constrain routing convergence, while
consensus routing and convergence-free routing sacrifice
the responsiveness of the routing system for consistency.
In this spirit, CoRRS shares similarity with Anomaly-
Cognizant Forwarding (ACF) [11], but this work focuses
on intra-domain routing and uses the remaining path
cost as a safeguard to detect anomalies. ACF detects for-
warding anomalies caused by inter-domain routing (BGP)
convergence; a packet carries the entire AS path it has
visited to detect forwarding anomalies.
Deflections [41] and Path Splicing [30] provide path
diversity for end systems to improve performance and
bypass failures, while CoRRS aims to enable routers to
rapidly detect forwarding anomalies and repair them dur-
ing routing transitions.
There is also much work on enabling routers to rapidly
forward packets using backup paths after failure detec-
tion, including R-BGP, IP FRR, and MPLS FRR [23, 33,
37]. These proposals provide fast failure recovery, but
they alone do not prevent micro-loops. Known loop-
prevention convergence techniques [14, 16, 17] all slow
down routing convergence. CoRRS does not slow down
or constrain routing convergence, and provides valid re-
covery paths with a high probability.
9. CONCLUSION AND FUTURE WORK
Routing convergence is a major cause of packet loss
and delay on the Internet. Recent work has made much
progress in reducing the adverse effects of routing con-
vergence, but often at the cost of slowed or constrained
routing convergence. This work aims to explore routing
designs that reduce forwarding disruption without sac-
rificing the routing system’s responsiveness to dynamic
changes. Our high-level approach is to enable routers
to detect forwarding anomalies and repair them during
routing convergence. As a first step, we present the de-
sign and implementation of CoRRS, an intra-domain rout-
ing system that ensures consistent forwarding with a high
probability but does not slow down routing convergence.
In the CoRRS design, a packet carries the remaining path
cost in its header, and routers use this information as a
safeguard to detect forwarding anomalies and to discover
valid paths. We also show that carrying limited informa-
tion is necessary for routers to reliably detect all for-
warding anomalies; cost is more effective, flexible, and
lightweight than other information such as a path identi-
fier or failures.
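The core idea of the safeguard can be sketched in a few lines. This is a simplified illustration of the anomaly-detection principle, not the exact CoRRS forwarding rule, and the hop sequences below are hypothetical:

```python
def detect_anomaly(hop_costs, initial_budget):
    """Walk a packet along a sequence of link costs, carrying the
    remaining path cost as a safeguard.

    The packet starts with the source's shortest-path cost to the
    destination, and each hop subtracts the cost of the link it
    traverses.  On a consistent shortest path the budget reaches
    exactly zero at the destination; a micro-loop or long detour
    consumes more cost than the budget allows, so the remaining
    cost goes negative and the anomaly is caught mid-path.

    Returns the index of the hop where the anomaly is detected,
    or None if the walk stays within budget.
    """
    remaining = initial_budget
    for i, cost in enumerate(hop_costs):
        remaining -= cost
        if remaining < 0:
            return i  # forwarding anomaly detected at hop i
    return None

# A consistent shortest path of cost 5 (links of cost 2 and 3):
print(detect_anomaly([2, 3], 5))        # → None
# A micro-loop revisits the same links and overruns the budget:
print(detect_anomaly([2, 3, 2, 3], 5))  # → 2
```

Because the check needs only one counter in the packet header, it keeps the per-packet overhead small and fixed, which is what makes the 95% forwarding-throughput result attainable.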
We evaluate CoRRS using both a Linux-based imple-
mentation and large-scale simulations. Our evaluation
shows that CoRRS is amenable to high-speed implemen-
tation: it achieves 95% of the vanilla IP forwarding through-
put in our prototype implementation. Our simulation re-
sults show that CoRRS is able to prevent micro-loops
and find valid recovery paths during routing transitions
without slowing down routing convergence.
This result suggests that it is promising to achieve con-
sistent and responsive routing with a limited amount of
“safeguard” information in a packet header. It is our
future work to explore how to extend the CoRRS ap-
proach to inter-domain routing that uses policy rather
than cost to select paths. A key challenge there is to
design an efficient and effective safeguard to detect for-
warding anomalies and facilitate recovery.
References
[1] BRITE Topology Generator. http://www.cs.bu.edu/brite.
[2] Quagga Routing Suite. http://www.quagga.net.
[3] Reducing Link Failure Detection Time with BFD. http://www.
networkworld.com/community/node/23380.
[4] Scalable Simulation Framework. http://www.ssfnet.org.
[5] SPF Delay Timer. http://www.juniper.net/techpubs/
software/junos/junos74/swconfig74-routing/html/
isis-summary53.html#1036104.
[6] C. Alaettinoglu, V. Jacobson, and H. Yu. Towards Milli-Second IGP Con-
vergence. Internet draft, draft-alaettinoglu-isis-convergence-00.txt, Nov
2000.
[7] C. Boutremans, G. Iannaccone, and C. Diot. Impact of link failures on
VoIP performance. In NOSSDAV, 2002.
[8] S. Bryant and M. Shand. A Framework for Loop-free Convergence. Inter-
net draft, draft-bryant-shand-lf-conv-frmwk-03.txt, Oct 2006.
[9] S. Bryant, M. Shand, and S. Previdi. IP Fast Reroute Using Notvia Ad-
dresses. Internet draft, draft-ietf-rtgwg-ipfrr-notvia-addresses-00.txt, Dec
2006.
[10] R. Callon. Use of OSI IS-IS for Routing in TCP/IP and Dual Environments.
RFC1195, Dec 1990.
[11] A. Ermolinskiy and S. Shenker. Reducing Transient Disconnectivity using
Anomaly-Cognizant Forwarding. In ACM SIGCOMM HotNets VII, 2008.
[12] N. Feamster and H. Balakrishnan. Packet Loss Recovery for Streaming
Video. In International Packet Video Workshop, 2002.
[13] B. Fortz and M. Thorup. Optimizing ospf/is-is weights in a changing world.
IEEE Journal on Selected Areas in Communications, 20(4):756–767, May
2002.
[14] P. Francois and O. Bonaventure. Avoiding transient loops during the con-
vergence of link-state routing protocols. IEEE/ACM Transactions on Net-
working, 15(6):1280–1292, Dec 2007.
[15] P. Francois, C. Filsfils, J. Evans, and O. Bonaventure. Achieving sub-
second IGP convergence in large IP networks. SIGCOMM Comput. Com-
mun. Rev., 35(3):35–44, 2005.
[16] P. Francois, M. Shand, and O. Bonaventure. Disruption-free topology re-
configuration in OSPF networks. In IEEE INFOCOM, Anchorage, USA,
May 2007. INFOCOM 2007 Best Paper Award.
[17] J. J. Garcia-Luna-Aceves. Loop-free routing using diffusing computations.
IEEE/ACM Trans. Netw., 1(1):130–141, 1993.
[18] U. Hengartner, S. Moon, R. Mortier, and C. Diot. Detection and Analysis
of Routing Loops in Packet Traces. In SIGCOMM IMW, 2002.
[19] G. Iannaccone, C. Chuah, S. Bhattacharyya, and C. Diot. Feasibility of IP
restoration in a tier-1 backbone. IEEE Network Magazine, Jan-Feb 2004.
[20] J. P. John, E. Katz-Bassett, A. Krishnamurthy, T. Anderson, and
A. Venkataramani. Consensus routing: the internet as a distributed system.
In NSDI’08: Proceedings of the 5th USENIX Symposium on Networked
Systems Design and Implementation, pages 351–364, 2008.
[21] D. Katz and D. Ward. Bidirectional Forwarding Detection. Internet draft,
draft-ietf-bfd-base-07.txt, Jan 2008.
[22] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The click
modular router. ACM Transactions on Computer Systems, 18:263–297,
2000.
[23] N. Kushman, S. Kandula, D. Katabi, and B. M. Maggs. R-BGP: Staying
connected in a connected world. In NSDI, 2007.
[24] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian. Delayed internet routing
convergence. In Proc. ACM SIGCOMM, pages 175–187, 2000.
[25] K. Lakshminarayanan, M. Caesar, M. Rangan, T. Anderson, S. Shenker,
and I. Stoica. Achieving convergence-free routing using failure-carrying
packets. In SIGCOMM, pages 241–252, 2007.
[26] A. Li, X. Yang, and D. Wetherall. Consistent and Responsive Routing with
Safeguard. Technical Report DUKE-CS-TR-2008-04, Duke, 2008.
[27] A. Li, X. Yang, and D. Wetherall. Towards Disruption-Free Intra-Domain
Routing. In SIGCOMM 2008 Student Poster, 2008.
[28] R. Mahajan, N. T. Spring, D. Wetherall, and T. E. Anderson. Inferring link
weights using end-to-end measurements. In Internet Measurement Work-
shop, pages 231–236, 2002.
[29] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, and
C. Diot. Characterization of Failures in an IP Backbone Network. In IN-
FOCOM, 2004.
[30] M. Motiwala, N. Feamster, and S. Vempala. Path Splicing: Reliable Con-
nectivity with Rapid Recovery. In ACM SIGCOMM HotNets VI, 2007.
[31] J. Moy. OSPF Version 2. RFC2328, Apr 1998.
[32] S. Nelakuditi, S. Lee, Y. Yu, Z.-L. Zhang, and C.-N. Chuah. Fast local
rerouting for handling transient link failures. IEEE/ACM Trans. Netw.,
15(2):359–372, 2007.
[33] P. Pan, G. Swallow, and A. Atlas. Fast Reroute Extensions to RSVP-TE for
LSP Tunnels. RFC4090, May 2005.
[34] Y. Rekhter and T. Li. A Border Gateway Protocol 4 (BGP-4). RFC1771,
Mar 1995.
[35] A. Shaikh and A. G. Greenberg. Experience in black-box ospf measure-
ment. In Internet Measurement Workshop, pages 113–125, 2001.
[36] M. Shand and S. Bryant. IP Fast Reroute Framework. Internet draft, draft-
ietf-rtgwg-ipfrr-framework-07.txt, Jun 2007.
[37] M. Shand and S. Bryant. IP Fast Reroute Framework. Internet draft, draft-
ietf-rtgwg-ipfrr-framework-08.txt, Feb. 2008.
[38] N. T. Spring, R. Mahajan, D. Wetherall, and T. E. Anderson. Measuring
ISP topologies with rocketfuel. IEEE/ACM Trans. Netw., 12(1):2–16, 2004.
[39] J.-P. Vasseur, M. Pickavet, and P. Demeester. Network Recovery: Protec-
tion and Restoration of Optical, SONET-SDH, and MPLS. Morgan Kauf-
mann, 2004.
[40] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold,
M. Hibler, C. Barb, and A. Joglekar. An integrated experimental environ-
ment for distributed systems and networks. In OSDI02, pages 255–270,
Boston, MA, Dec 2002.
[41] X. Yang and D. Wetherall. Source selectable path diversity via routing
deflections. In SIGCOMM, pages 159–170, 2006.