Detecting Peering Infrastructure Outages in the Wild
Vasileios Giotsas †∗, Christoph Dietzel † §, Georgios Smaragdakis ‡ †, Anja Feldmann †, Arthur Berger ¶ ‡, Emile Aben #
†TU Berlin ∗CAIDA §DE-CIX ‡MIT ¶Akamai #RIPE NCC
Peering infrastructures are a critical part of the interconnection ecosystem
● Internet Exchange Points (IXPs) provide a shared switching fabric for layer-2 bilateral and multilateral peering.
○ Largest IXPs support > 100K peerings and > 5 Tbps peak traffic
○ Typical SLA 99.99% (~52 min. downtime/year) [1]
● Carrier-neutral co-location facilities (CFs) provide infrastructure for physical co-location and cross-connect interconnections.
○ Largest facilities support > 170K interconnections
○ Typical SLA 99.999% (~5 min. downtime/year) [2]
[1] https://ams-ix.net/services-pricing/service-level-agreement
[2] http://www.telehouse.net/london-colocation/
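The downtime figures follow directly from the SLA percentages: the yearly budget is (1 − availability) × 525,600 minutes, i.e. about 52.6 min at 99.99% and about 5.3 min at 99.999%. A minimal sketch of that arithmetic, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap, 365-day year


def downtime_budget_minutes(availability_pct: float) -> float:
    """Maximum yearly downtime (in minutes) allowed by an availability SLA given in percent."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for sla in (99.99, 99.999):
    print(f"{sla}% -> {downtime_budget_minutes(sla):.1f} min/year")
# 99.99% -> 52.6 min/year
# 99.999% -> 5.3 min/year
```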
Outages in peering infrastructures can severely disrupt critical services and applications
Outage detection is crucial for improving situational awareness, risk assessment and transparency.
Current practice: “Is anyone else having issues?”
● ASes try to crowd-source the detection and localization of outages.
● Inadequate transparency/responsiveness from infrastructure operators.
Symbiotic and interdependent infrastructures
Source: https://www.franceix.net/en/technical/infrastructure/
Remote peering extends the reach of IXPs and CFs beyond their local market
Global footprint of AMS-IX
Source: https://ams-ix.net/connect-to-ams-ix/peering-around-the-globe
Our Research Goals
1. Outage detection:
○ Automated, Timely, Building-level
2. Outage localization:
○ Distinguish cascading effects from outage source
3. Outage tracking:
○ Determine duration, shifts in routing paths, geographic spread
Challenges in detecting infrastructure outages
[Figure: the actual incident vs. the paths observed from a vantage point (VP) before the outage]
[Figure: paths observed from the VP before and during the outage]
AS path does not change!
1. Capturing the infrastructure-level hops between ASes
IXP or Facility 2 failed
[Figure: paths observed from a second VP during the outage]
IXP is still active
2. Correlating the paths from multiple vantage points
[Figure: paths observed from both VPs during the outage]
One VP sees no hop changes; the other sees that the initial hops changed.
3. Continuous monitoring of the routing system
[Figure: France-IX topology, including Djibouti Telecom and Telkom Indonesia]
[Figure: BGP measurement over the same topology (Djibouti Telecom, Telkom Indonesia)]
[Figure: traceroute measurement revealing the hops 149.6.154.142 and 37.49.237.126 (Telkom Indonesia)]
[Figure: combined BGP and traceroute measurements (hops 149.6.154.142 and 37.49.237.126; Djibouti Telecom, Telkom Indonesia)]
IP-to-Facility [3,4] and IP-to-IXP [5] mapping possible but expensive!
[3] Giotsas, Vasileios, et al. "Mapping peering interconnections to a facility." CoNEXT 2015.
[4] Motamedi, Reza, et al. "On the Geography of X-Connects." Technical Report CIS-TR-2014-02, University of Oregon, 2014.
[5] Nomikos, George, et al. "traIXroute: Detecting IXPs in traceroute paths." PAM 2016.
[Figure: BGP and traceroute measurements side by side]
Can we combine continuous passive measurements with fine-grained topology discovery?
Deciphering location metadata in BGP
PREFIX: 1.0.0.0/24
ASPATH: 2 1 0
COMMUNITY: 2:200
BGP Communities:
● Optional attribute
● Encodes arbitrary metadata
● Series of 32-bit numerical values
Top 16 bits: ASN that sets the community.
Bottom 16 bits: numerical value that encodes the actual meaning.
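The 16/16 split described above is the RFC 1997 convention for the human-readable "ASN:value" notation. A small sketch of packing and unpacking it; note that ASNs above 65535 do not fit in 16 bits, which is why large communities (RFC 8092) exist:

```python
def encode_community(asn: int, value: int) -> int:
    """Pack an 'ASN:value' community into its 32-bit integer form."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value  # ASN in the top 16 bits, value in the bottom 16 bits


def decode_community(raw: int) -> tuple[int, int]:
    """Split a 32-bit community into (ASN, value)."""
    return raw >> 16, raw & 0xFFFF


def parse_community(text: str) -> int:
    """Parse the textual notation used on the slides, e.g. '2:200'."""
    asn, value = text.split(":")
    return encode_community(int(asn), int(value))


raw = parse_community("2:200")
print(raw)                    # 131272
print(decode_community(raw))  # (2, 200)
```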
The BGP Community 2:200 is used to tag routes received at Facility 2.
PREFIX: 3.3.3.3/24
ASPATH: 4 3
COMMUNITY: 4:8714 4:400
PREFIX: 2.2.2.2/24
ASPATH: 4 2
COMMUNITY: 4:8714 4:400
PREFIX: 1.0.0.0/24
ASPATH: 2 1 0
COMMUNITY: 2:200
Multiple communities can tag different types of ingress points.
PREFIX: 3.3.3.3/24
ASPATH: 4 3
COMMUNITY: 4:400
PREFIX: 2.2.2.2/24
ASPATH: 4 2
COMMUNITY: 4:8714 4:400
PREFIX: 1.0.0.0/24
ASPATH: 2 1 0
COMMUNITY: 2:100
When a route changes ingress point, the community values are updated to reflect the change.
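A sketch of how such an ingress change could be read off two announcements of the same prefix, given a dictionary from community values to the ingress points they tag. The dictionary here is hypothetical: only 2:200 → Facility 2 comes from the example above; the labels for 2:100, 4:400 and 4:8714 are made up for illustration.

```python
# Hypothetical community dictionary (community -> ingress point it tags).
# Only "2:200" -> "Facility 2" is taken from the example; the rest are illustrative.
COMMUNITY_DICT = {
    "2:200": "Facility 2",
    "2:100": "Facility 1",
    "4:400": "Facility 4",
    "4:8714": "IXP A",
}


def ingress_points(communities: list[str]) -> frozenset[str]:
    """Map a route's communities to the ingress points they tag, ignoring unknown values."""
    return frozenset(COMMUNITY_DICT[c] for c in communities if c in COMMUNITY_DICT)


def ingress_change(old: list[str], new: list[str]) -> tuple[frozenset[str], frozenset[str]]:
    """Return (ingress points lost, ingress points gained) between two announcements."""
    before, after = ingress_points(old), ingress_points(new)
    return before - after, after - before


print(ingress_change(["4:8714", "4:400"], ["4:400"]))
# (frozenset({'IXP A'}), frozenset())
print(ingress_change(["2:200"], ["2:100"]))
# (frozenset({'Facility 2'}), frozenset({'Facility 1'}))
```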
Interpreting BGP Communities
● Community values are not standardized.
● Documentation in public data sources:
○ WHOIS, NOC websites
● 3,049 communities by 468 ASes
Topological coverage
● ~50% of IPv4 and ~30% of IPv6 paths annotated with at least one community in our dictionary.
● 24% of the facilities in PeeringDB, 98% of the facilities with at least 20 members.
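The path-coverage figure above is a simple fraction: paths carrying at least one dictionary community over all paths, computed separately for IPv4 and IPv6. A minimal sketch on toy data (the paths and dictionary entries are hypothetical):

```python
def annotated_fraction(paths: list[list[str]], dictionary: set[str]) -> float:
    """Fraction of paths (given as their community lists) with at least one dictionary community."""
    if not paths:
        return 0.0
    hits = sum(1 for comms in paths if any(c in dictionary for c in comms))
    return hits / len(paths)


# Toy example: community lists of four hypothetical paths.
known = {"2:200", "4:400"}
paths = [["2:200"], ["4:8714", "4:400"], ["65000:1"], []]
print(annotated_fraction(paths, known))  # 0.5
```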
Passive outage detection: Initialization
For each vantage point (VP), collect all stable BGP routes tagged with the communities of the target facility (Facility 2).
AS_PATH: 1 x
COMM: 1:FAC2
AS_PATH: 2 1 0
COMM: 2:FAC2
AS_PATH: 4 x
COMM: 4:FAC2
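The initialization step can be sketched as a per-VP filter. The slides do not define "stable" precisely; here it is approximated as "unchanged for at least `min_stable_s` seconds", a hypothetical threshold. The `Route` type and its fields are also illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    vp: str                      # vantage point that observed the route
    prefix: str
    as_path: tuple[int, ...]
    communities: frozenset[str]
    stable_since: float          # timestamp (s) of the last change to this route


def initial_baseline(routes: list[Route], facility_comms: frozenset[str], now: float,
                     min_stable_s: float = 3600.0) -> dict[str, dict[str, Route]]:
    """Per VP, keep the stable routes tagged with any community of the target facility."""
    baseline: dict[str, dict[str, Route]] = {}
    for r in routes:
        is_stable = now - r.stable_since >= min_stable_s  # assumed stability criterion
        if is_stable and r.communities & facility_comms:
            baseline.setdefault(r.vp, {})[r.prefix] = r
    return baseline


routes = [
    Route("vp1", "1.0.0.0/24", (2, 1, 0), frozenset({"2:FAC2"}), stable_since=0.0),
    Route("vp1", "3.3.3.0/24", (4, 3), frozenset({"4:FAC4"}), stable_since=0.0),   # other facility
    Route("vp2", "1.0.0.0/24", (1, 0), frozenset({"1:FAC2"}), stable_since=7000.0),  # not stable yet
]
base = initial_baseline(routes, frozenset({"1:FAC2", "2:FAC2", "4:FAC2"}), now=7200.0)
print({vp: sorted(p) for vp, p in base.items()})  # {'vp1': ['1.0.0.0/24']}
```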
Passive outage detection: Monitoring
Track the BGP updates of the stable paths for changes in the community values that indicate an ingress point change.
AS_PATH: 2 1 0
COMM: 2:FAC1
We don’t care about AS-level path changes if the ingress-tagging communities remain the same.
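The monitoring rule above amounts to comparing only the ingress-tagging subset of communities and ignoring the AS path. A sketch, assuming a hypothetical set of known ingress communities written in the slides' placeholder notation (FAC1, FAC2, FAC4, IXP); "2:999" stands for an arbitrary non-ingress community.

```python
# Hypothetical ingress-tagging communities, in the slides' placeholder notation.
INGRESS_COMMS = frozenset({"1:FAC1", "1:FAC2", "2:FAC1", "2:FAC2",
                           "4:FAC2", "4:FAC4", "4:IXP"})


def ingress_tags(communities: frozenset[str]) -> frozenset[str]:
    """Keep only the communities that encode an ingress point."""
    return communities & INGRESS_COMMS


def ingress_changed(old_comms: frozenset[str], new_comms: frozenset[str]) -> bool:
    """Flag an update only if its ingress-tagging communities differ from the baseline.

    The AS path is deliberately not compared: AS-level changes that keep the
    same ingress tags are ignored.
    """
    return ingress_tags(old_comms) != ingress_tags(new_comms)


# AS path 2 1 0 -> 2 5 0, ingress tag still 2:FAC2: ignored.
print(ingress_changed(frozenset({"2:FAC2"}), frozenset({"2:FAC2", "2:999"})))  # False
# Same AS path, ingress tag moves from FAC2 to FAC1: flagged.
print(ingress_changed(frozenset({"2:FAC2"}), frozenset({"2:FAC1"})))  # True
```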
Passive outage detection: Outage signal
AS_PATH: 2 1 0
COMM: 2:FAC1
AS_PATH: 1 x
COMM: 1:FAC1
AS_PATH: 4 x
COMM: 4:FAC4 4:IXP
● Concurrent changes of community values for the same facility.
● An indication of an outage, but not a final inference yet!
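One way to operationalize "concurrent changes" is to count, within a sliding time window, how many of a facility's baseline paths moved away from it, and raise a candidate signal once that share crosses a threshold. The window length and threshold below are hypothetical values, not parameters given on the slides.

```python
def outage_signal(baseline: set[str], departures: dict[str, float], t: float,
                  window_s: float = 300.0, threshold: float = 0.5) -> bool:
    """Raise a *candidate* outage signal for one facility at time t.

    baseline: ids of the stable paths initially tagged with the facility.
    departures: path id -> time its ingress communities stopped pointing at the facility.
    window_s and threshold are illustrative assumptions.
    """
    if not baseline:
        return False
    concurrent = sum(1 for p in baseline
                     if p in departures and t - window_s <= departures[p] <= t)
    return concurrent / len(baseline) >= threshold


baseline = {"vp1/1.0.0.0/24", "vp2/1.0.0.0/24", "vp3/2.2.2.0/24", "vp4/3.3.3.0/24"}
departures = {"vp1/1.0.0.0/24": 1000.0, "vp2/1.0.0.0/24": 1010.0, "vp3/2.2.2.0/24": 1100.0}
print(outage_signal(baseline, departures, t=1120.0))  # True: 3 of 4 paths left within 5 min
print(outage_signal(baseline, departures, t=2000.0))  # False: no departures in the last 5 min
```

As the slide stresses, a positive result is only a signal; it still has to be investigated before it is reported as an outage.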
Partial outage?
De-peering of large ASes?
Major routing policy change?
Signal investigation:
● Targeted active measurements.
● How disjoint are the affected paths?
● How many ASes and links have been affected?
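Two of the investigation questions above can be answered directly from the affected AS paths. A sketch that counts the distinct affected ASes and AS links, and how many pairs of affected paths share no AS link; using link-disjointness as the disjointness measure is an assumption made here for illustration.

```python
from itertools import combinations


def as_links(path: tuple[int, ...]) -> set[frozenset[int]]:
    """Undirected AS-level links of a path; repeated ASes from prepending are skipped."""
    return {frozenset(pair) for pair in zip(path, path[1:]) if pair[0] != pair[1]}


def summarize_affected(paths: list[tuple[int, ...]]) -> dict[str, int]:
    """Count affected ASes and links, and the number of link-disjoint path pairs."""
    ases = set().union(*paths)
    links = set().union(*(as_links(p) for p in paths))
    pairs = list(combinations(paths, 2))
    disjoint = sum(1 for a, b in pairs if not as_links(a) & as_links(b))
    return {"ases": len(ases), "links": len(links),
            "link_disjoint_pairs": disjoint, "pairs": len(pairs)}


print(summarize_affected([(2, 1, 0), (4, 3), (1, 0)]))
# {'ases': 5, 'links': 3, 'link_disjoint_pairs': 2, 'pairs': 3}
```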
Passive outage detection: Outage tracking
AS_PATH: 1 x
COMM: 1:FAC2
AS_PATH: 2 1 0
COMM: 2:FAC2
The end of the outage is inferred when the majority of paths return to the original facility.
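The end-of-outage rule can be sketched as a scan over post-outage updates that tracks which baseline paths are tagged with the original facility again, stopping once more than half are back. For simplicity, this sketch assumes that every baseline path starts the scan rerouted away from the facility.

```python
from typing import Optional


def outage_end(baseline: set[str], events: list[tuple[float, str, str]],
               facility: str) -> Optional[float]:
    """Return the first time a majority of baseline paths are tagged with `facility` again.

    events: time-ordered (time, path_id, ingress_facility) updates seen after the outage began.
    Assumption: every baseline path starts the scan as rerouted away from `facility`.
    """
    back: set[str] = set()
    for t, path, ingress in events:
        if path not in baseline:
            continue
        if ingress == facility:
            back.add(path)
        else:
            back.discard(path)
        if len(back) > len(baseline) / 2:
            return t
    return None


events = [(10.0, "a", "FAC2"), (20.0, "b", "FAC1"), (30.0, "b", "FAC2"), (40.0, "c", "FAC2")]
print(outage_end({"a", "b", "c"}, events, "FAC2"))  # 30.0
```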
De-noising of BGP routing activity
[Figure: number of BGP messages over time (log scale)]
The aggregated activity of BGP messages (updates, withdrawals, state changes) provides no outage indication.
The BGP activity filtered using communities provides a strong outage signal.
[Figure: number of BGP messages over time (log scale), aggregated vs. filtered using communities; fraction of infrastructure paths over time]
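The contrast between the two message-count curves comes from filtering the update stream before binning it over time. A sketch of that step, with updates modeled as hypothetical (timestamp, communities) pairs and an assumed 60 s bin width:

```python
from collections import Counter
from typing import Iterable, Optional


def binned_counts(updates: Iterable[tuple[float, frozenset[str]]], bin_s: int = 60,
                  keep: Optional[frozenset[str]] = None) -> Counter:
    """Count BGP updates per time bin.

    If `keep` is given, only updates carrying at least one of those communities are counted.
    """
    counts: Counter = Counter()
    for ts, comms in updates:
        if keep is None or comms & keep:
            counts[int(ts // bin_s)] += 1
    return counts


updates = [(0.0, frozenset({"65000:1"})), (5.0, frozenset({"2:FAC2"})),
           (61.0, frozenset({"2:FAC2"})), (62.0, frozenset({"3:7"}))]
print(dict(binned_counts(updates)))                              # {0: 2, 1: 2}
print(dict(binned_counts(updates, keep=frozenset({"2:FAC2"}))))  # {0: 1, 1: 1}
```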
Outage localization is more complicated!
● The location encoded by the community values that trigger outage signals may not be the outage source!
● Communities encode the ingress point closest to our VPs (near-end):
○ ASes may be interconnected over multiple intermediate infrastructures
○ Failures in intermediate infrastructures may affect the near-end infrastructure paths
Outage in Facility 2 causes a drop in the paths of Facility 4!
Outage in Facility 3 causes a drop in the paths of Facility 4!
Outage source disambiguation and localization
● Create high-resolution co-location maps:
○ AS to Facilities, AS to IXPs, IXPs to Facilities
○ Sources: PeeringDB, DataCenterMap, operator websites
● Decorrelate the behaviour of affected ASes based on their infrastructure colocation.
Far-end ASes colocated in Facility 2
Far-end ASes colocated in Facility 3
Paths are not investigated in an aggregated manner, but at the granularity of separate (AS, Facility) co-locations.
[Figure: paths over time per (AS, Facility) co-location, separating the London Telecity HE8/9 outage from the London Telehouse North outage]
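To illustrate the (AS, Facility) granularity, here is a simple stand-in heuristic, not the authors' exact decorrelation procedure: for each facility, compute the share of monitored paths whose far-end AS is co-located there that were affected; a facility whose co-located ASes are affected almost uniformly is the more likely source. The co-location map is hypothetical (in practice it would be built from sources such as PeeringDB).

```python
from collections import defaultdict


def affected_share_by_facility(paths: list[tuple[int, bool]],
                               colocation: dict[int, set[str]]) -> dict[str, float]:
    """Per facility, the share of paths whose far-end AS is co-located there that were affected.

    paths: (far_end_asn, affected) pairs.
    colocation: hypothetical AS -> set of facilities map.
    """
    total: dict[str, int] = defaultdict(int)
    hit: dict[str, int] = defaultdict(int)
    for asn, affected in paths:
        for fac in colocation.get(asn, ()):
            total[fac] += 1
            hit[fac] += affected  # bool counts as 0 or 1
    return {fac: hit[fac] / total[fac] for fac in total}


colocation = {10: {"Facility 2", "Facility 4"}, 20: {"Facility 3", "Facility 4"}, 30: {"Facility 2"}}
paths = [(10, True), (30, True), (20, False), (20, False)]
shares = affected_share_by_facility(paths, colocation)
print(sorted(shares.items()))
# [('Facility 2', 1.0), ('Facility 3', 0.0), ('Facility 4', 0.3333333333333333)]
print(max(shares, key=shares.get))  # Facility 2
```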
Detecting peering infrastructure outages in the wild
● 159 outages in 5 years of BGP data
○ 76% of the outages not reported in popular mailing lists/websites
● Validation through status reports, direct feedback, social media
○ 90% accuracy, 93% precision (for trackable PoPs)
Effect of outages on Service Level Agreements
~70% of failed facilities below 99.999% uptime
~50% of failed IXPs below 99.99% uptime
5% of failed infrastructures below 99.9% uptime!
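Checking an infrastructure against an SLA tier only requires its total observed outage time over the period. A sketch, assuming availability is measured over a single 365-day year and that outage durations are already known in minutes:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: one non-leap calendar year


def achieved_availability(outage_minutes: list[float],
                          period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Availability in percent after subtracting the total outage time from the period."""
    return 100.0 * (1.0 - sum(outage_minutes) / period_minutes)


def meets_sla(outage_minutes: list[float], sla_pct: float) -> bool:
    """True if the observed outages keep availability at or above the SLA target."""
    return achieved_availability(outage_minutes) >= sla_pct


print(round(achieved_availability([60.0]), 4))  # 99.9886
print(meets_sla([60.0], 99.99))   # False: one hour down already breaks a 99.99% SLA
print(meets_sla([5.0], 99.999))   # True: 5 min stays within the ~5.26 min budget
```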
Measuring the impact of outages
> 56% of the affected links are in a different country, > 20% in a different continent!
Median RTT rises by > 100 ms for rerouted paths during the AMS-IX outage.
[Figure: number of affected links (log scale) and CDF vs. distance from outage source (km); fraction of paths vs. RTT (ms)]
Conclusions
● Timely and accurate infrastructure-level outage detection through
passive BGP monitoring
● Majority of outages not (widely) reported
● Remote peering and infrastructure interdependencies amplify the
impact of local incidents
● Hard evidence on outages can improve accountability, transparency
and resilience strategies