Reducing Transient Disconnectivity using Anomaly-Cognizant Forwarding
Andrey Ermolinskiy, Scott Shenker
University of California – Berkeley and ICSI
What’s the problem?
One of the central goals of the Internet is continuous end-to-end connectivity.
BGP convergence is a major cause of connectivity disruption:
  Routers operate upon potentially inconsistent local views.
  Temporary inconsistencies give rise to anomalies, such as loops and black holes, that disrupt end-to-end packet delivery.
Example: transient routing loop with BGP
[Figure: AS-level topology with domains A, B, C, D, E, F, and G. Ranked route tables toward A shown next to the domains: (1. BA, 2. CBA), (1. BA, 2. DBA), (1. CBA, 2. DBA), (1. ECBA, 2. GA). Event: withdraw BA.]
Routing loop between C and D incurs temporary loss of connectivity between {B, C, D, E, F} and A.
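The loop can be reproduced with a small sketch. It assumes that the table (1. BA, 2. DBA) belongs to C and (1. BA, 2. CBA) belongs to D, which is a reading of the figure rather than something the slide states, and it models the withdrawal as a set of paths that C and D have already discarded while their stale alternates remain in place. The names `ROUTES`, `best_path`, and `trace` are hypothetical.

```python
# Ranked AS paths toward A for each domain. Assigning these tables to
# B, C, and D is an assumption; the slide does not label them explicitly.
ROUTES = {
    "B": [("A",)],
    "C": [("B", "A"), ("D", "B", "A")],
    "D": [("B", "A"), ("C", "B", "A")],
}

def best_path(domain, withdrawn):
    """Return the highest-ranked path that this domain has not yet discarded."""
    for path in ROUTES.get(domain, []):
        if path not in withdrawn:
            return path
    return None

def trace(src, dst, withdrawn, ttl=8):
    """Follow next hops from src until dst is reached, no route exists, or TTL expires."""
    hops = [src]
    current = src
    while current != dst and ttl > 0:
        path = best_path(current, withdrawn)
        if path is None:
            break
        current = path[0]  # the next hop is the first AS on the chosen path
        hops.append(current)
        ttl -= 1
    return hops

# Before the withdrawal, C reaches A through B.
before = trace("C", "A", withdrawn=set())
# After "withdraw BA", C and D fall back to stale routes through each other.
during = trace("C", "A", withdrawn={("B", "A")}, ttl=4)
print(before)
print(during)
```

In this toy model the packet bounces between C and D until its TTL runs out, which is the transient loss of connectivity described above.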
Related Work
Shrinking the convergence time window through BGP protocol extensions:
  Ghost flushing
  Consistency assertions
Protecting end-to-end packet delivery from adverse effects of convergence:
  R-BGP: forward packets on pre-computed failover paths; propagate root cause information to prevent loops.
  Consensus Routing: enforce a globally consistent view via distributed snapshots and strategically delay adoption of incoming BGP updates.
Anomaly-Cognizant Forwarding
Anomaly-Cognizant Forwarding (ACF)
Approach:
  Accept routing anomalies as an unavoidable fact.
  Protect end-to-end packet delivery by detecting and recovering from anomalies on the forwarding path.
Main hypothesis: several simple and lightweight extensions to conventional IP forwarding enable us to sustain packet delivery during periods of BGP instability
  without the use of pre-computed backup paths
  without modifying the core routing protocol or altering its timing dynamics
Domain S has anomalous forwarding state for destination D if S’s outgoing packets destined for D arrive back at S as a result of a routing loop.
Main idea of ACF:
  Detect occurrences of anomalous state.
  Avoid forwarding packets via domains that are known to have anomalous state.
[Figure: domain S and destination D, illustrating anomalous forwarding state.]
ACF Overview
Each packet carries a list of prior AS-level hops (pathTrace)
Each packet carries a blackList of domains with anomalous state
[Figure: packet header carrying the pathTrace and blackList fields.]
ACF Overview
Forward (packet p) {
    if (localASNum in p.pathTrace)
        Move loop elements from p.pathTrace to p.blackList
    nextHop = lookupNextHop(p.destAddr)
    if (nextHop in p.blackList)
        Invoke the control plane, look for alternate non-blacklisted routes in the RIB
    if (nextHop != NONE) {
        Append localASNum to p.pathTrace
        SendPacket(p, nextHop)
    } else
        Initiate recovery-mode forwarding for p
}
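The forwarding routine can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Packet` and `Router` classes, the `fib` and `rib` dictionaries, and the `("send", ...)` / `("recover", ...)` return values are hypothetical, and "loop elements" is interpreted as every pathTrace entry from the router's own earlier appearance onward.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    dest_addr: str
    path_trace: list = field(default_factory=list)  # prior AS-level hops
    black_list: set = field(default_factory=set)    # domains with anomalous state

@dataclass
class Router:
    local_as: str
    fib: dict  # destination -> preferred next hop (hypothetical layout)
    rib: dict  # destination -> ordered candidate next hops (hypothetical layout)

    def forward(self, p):
        # Loop detection: our own AS number already appears in the trace.
        if self.local_as in p.path_trace:
            i = p.path_trace.index(self.local_as)
            # Assumption: the loop is everything from our earlier visit onward.
            p.black_list.update(p.path_trace[i:])
            del p.path_trace[i:]

        next_hop = self.fib.get(p.dest_addr)
        if next_hop is not None and next_hop in p.black_list:
            # Stand-in for invoking the control plane: scan the RIB for an
            # alternate route whose next hop is not blacklisted.
            candidates = self.rib.get(p.dest_addr, [])
            next_hop = next((h for h in candidates if h not in p.black_list), None)

        if next_hop is not None:
            p.path_trace.append(self.local_as)
            return ("send", next_hop)
        return ("recover", None)  # caller starts recovery-mode forwarding

# A packet that left C, visited D, and came back to C.
c = Router(local_as="C", fib={"A": "D"}, rib={"A": ["D"]})
looped = Packet(dest_addr="A", path_trace=["C", "D"])
print(c.forward(looped), looped.path_trace, sorted(looped.black_list))
```

In the usage example C has no non-blacklisted alternative, so the sketch reports that recovery-mode forwarding is needed and leaves the packet with an empty pathTrace and a blackList containing C and D.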
ACF Recovery-mode forwarding
If a router is unable to forward a packet because it does not have a valid non-blacklisted route, it initiates recovery forwarding:
  It chooses a recovery destination R from a static and well-known set of highly connected Tier-1 domains.
  It detours the packet through R.
Intuition: R, or some router along the path to R, may know a working alternate route to the original destination.
[Figure: a router with nextHop=NONE and recovery destinations R1 and R2; legend: normal-mode forwarding, recovery-mode forwarding.]
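One way to picture the detour is as a rewrite of the packet's destination fields. The sketch below is an assumption-laden illustration: the `dst` and `orig_dst` field names, the `start_recovery` and `resume_normal_mode` helpers, the rule of skipping blacklisted recovery destinations, and the clearing of pathTrace on resumption are all hypothetical choices, not details confirmed by the slides.

```python
from dataclasses import dataclass, field
from typing import Optional

# Static, well-known set of recovery destinations (placeholder names).
RECOVERY_DESTINATIONS = ["R1", "R2"]

@dataclass
class Packet:
    dst: str
    orig_dst: Optional[str] = None
    path_trace: list = field(default_factory=list)
    black_list: set = field(default_factory=set)

def start_recovery(p, candidates=RECOVERY_DESTINATIONS):
    """Redirect p toward the first recovery destination that is not blacklisted.

    Returns True if a detour was set up, False if no candidate is usable.
    """
    for r in candidates:
        if r not in p.black_list:
            if p.orig_dst is None:
                p.orig_dst = p.dst  # remember where the packet really wants to go
            p.dst = r
            return True
    return False

def resume_normal_mode(p):
    """At the recovery destination: restore the original destination.

    Assumption: the trace is cleared so normal-mode forwarding starts fresh.
    """
    p.dst, p.orig_dst = p.orig_dst, None
    p.path_trace.clear()

p = Packet(dst="A", black_list={"C", "D"})
start_recovery(p, candidates=["F"])
print(p.dst, p.orig_dst)
resume_normal_mode(p)
print(p.dst, p.orig_dst)
```

Because the detour reuses the ordinary destination field, the packet can be carried toward R by the same forwarding table that serves normal-mode traffic.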
Anomaly-Cognizant Forwarding
[Animation: packet p travels through the example topology (domains A through G; route tables (1. BA, 2. CBA), (1. BA, 2. DBA), (1. CBA, 2. DBA), (1. ECBA, 2. GA)). Successive states of p.Header:]

Step  pathTrace  blackList  dst  origDst
1     [C]        {}         A
2     [C D]      {}         A
3     [C D]      {D}        A
4     []         {C D}      F    A
5     [C]        {C D}      F    A
6     [C]        {C D E}    F    A
7     [C E]      {C D E}    F    A
8     []         {C D E}    F    A
9     [F]        {C D E}    F    A
10    [F G]      {C D E}    F    A

From step 3 onward: C initiates recovery forwarding through domain F.
From step 8 onward: F resumes normal-mode forwarding.
ACF: Observations
ACF does not use pre-computed failover paths.
  It discovers alternate routes dynamically using state in the packet header.
  The two forwarding modes make use of the same forwarding table.
Paths to recovery destinations are not assumed to be stable and anomaly-free.
  We protect recovery-mode forwarding using the same mechanism (pathTrace and blackList).
ACF: Preliminary Evaluation
Evaluation metrics:
  Effectiveness in eliminating transient disconnectivity
  Efficiency of alternate paths
  Packet header overhead
ACF: Preliminary Evaluation
Simulation methodology:
  CAIDA AS-level topology (27969 nodes) annotated with inferred inter-AS relationships
  12937 multihomed edge domains, 29426 adjacent provider links
Provider link failure experiment:
  For each multihomed domain D and each provider link L, fail L and simulate packet delivery from every other domain to D during convergence.
[Figure: sources S1, S2, S3, and S4 sending packets to domain D.]
Recovery destinations = 10 highly connected Tier-1 ISPs
Packet TTL = 32 hops
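The structure of the provider link failure experiment can be sketched as a pair of nested loops. The sketch below only enumerates failure cases and tallies a disconnection fraction; the packet-level simulation of BGP convergence is left to a hypothetical `delivered` callback, and the `providers` dictionary layout and function names are likewise assumptions.

```python
def failure_cases(providers):
    """Yield one (domain, provider) pair per provider link of each multihomed domain."""
    for domain in sorted(providers):
        links = providers[domain]
        if len(links) >= 2:  # multihomed: at least two provider links
            for provider in sorted(links):
                yield (domain, provider)

def run_experiment(providers, all_domains, delivered):
    """Return, per failure case, the fraction of source domains that lose connectivity.

    delivered(src, dst, failed_link) is a hypothetical stand-in for simulating
    packet delivery from src to dst while the network converges.
    """
    results = {}
    for domain, provider in failure_cases(providers):
        sources = [s for s in all_domains if s != domain]
        lost = [s for s in sources if not delivered(s, domain, (domain, provider))]
        results[(domain, provider)] = len(lost) / len(sources)
    return results

# Toy input: D and G are multihomed, E is single-homed and therefore skipped.
toy_providers = {"D": {"P1", "P2"}, "E": {"P1"}, "G": {"P2", "P3", "P4"}}
toy_domains = ["D", "E", "G", "P1", "P2", "P3", "P4"]
cases = list(failure_cases(toy_providers))
print(cases)
print(run_experiment(toy_providers, toy_domains, lambda s, d, link: True))
```

With this layout, the number of failure cases equals the total number of provider links attached to multihomed domains, which corresponds to the 29426 links quoted above.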
ACF: Preliminary Evaluation
Transient disconnection after a link failure
BGP with conventional forwarding:
  51% of failure cases produce unwarranted disconnection.
  Widespread disconnection (>50% of ASes) in 17% of cases.
BGP with ACF:
  No disconnection in 92% of failure cases.
  <1% of ASes see disconnection in 98% of failure cases.
ACF: Preliminary Evaluation
Transient path efficiency
Causes of path dilation in ACF:
  Transient loops
  Detouring via a recovery destination
F = failure cases that produce transient disconnection with conventional forwarding
In 65% of failure cases that produce disconnectivity, ACF recovers packets using ≤ 2 extra hops.
9% of cases require 7 hops or more.
ACF: Preliminary Evaluation
Packet header overhead
Maximum number of pathTrace and blackList entries in a representative sample of failure cases:

% of ASes disconnected   0%   0.09%   0.9%   9%   90%
pathTrace length         11   16      16     20   13
blackList length         4    11      9      11   16

Worst-case pathTrace: 20 entries, 40 bytes of overhead assuming 16-bit AS numbers
Worst-case blackList: 16 entries, 10 bytes of overhead for a Bloom filter with 1% error rate
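The pathTrace figure follows from simple arithmetic, shown here as a short sketch. The function name and the `as_number_bits` parameter are hypothetical; the calculation assumes one fixed-width AS number per entry with no additional framing.

```python
def path_trace_overhead_bytes(entries, as_number_bits=16):
    """Bytes needed to store `entries` AS numbers of the given width."""
    return entries * as_number_bits // 8

# Maximum pathTrace lengths from the table, in column order.
max_lengths = [11, 16, 16, 20, 13]
overheads = [path_trace_overhead_bytes(n) for n in max_lengths]
print(overheads)
print(path_trace_overhead_bytes(20))
```

The worst case of 20 entries gives 20 × 2 = 40 bytes, matching the number quoted above; a wider AS number format would scale the overhead proportionally.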
Challenges / Concerns
Feasibility of deployment:
  ACF adds fields to the packet header and modifies core IP forwarding logic.
Packet processing overhead:
  The control plane is invoked only during periods of instability.
  Common case: check pathTrace and blackList. Both operations admit efficient implementation in hardware and parallelization.
ACF and routing policies