![Page 1: Towards a Highly Available Internethomes.cs.washington.edu/~tom/support/reliability.pdf · Internet Routing Primary goal of the Internet is availability − “There is only one failure,](https://reader035.vdocument.in/reader035/viewer/2022071117/6002359515dd987dc6790a68/html5/thumbnails/1.jpg)
Towards a Highly Available Internet
Tom Anderson University of Washington
Joint work with: John P. John, Ethan Katz-Bassett, Dave Choffnes, Colin Dixon, Arvind Krishnamurthy, Harsha Madhyastha, Colin Scott, Justine Sherry, Arun Venkataramani, and David Wetherall
Financial support from: NSF, Cisco, Intel, and Google
Internet-based real-time health?
[Figure: Continuous Blood Glucose Monitor → Glucose Measurement → compare with trend, history for this patient, history for others… → Insulin Dosage → Insulin Infusion Pump]
Internet Routing
Primary goal of the Internet is availability
− “There is only one failure, and it is complete partition”
Clark, The Design Philosophy of the DARPA Internet Protocols
Physical path => route
Route => efficient data path
Efficient data path => data flows
Internet routing today
Physical path => route
− 10-15% of BGP updates cause loops and inconsistent routing tables
− Loops account for 90% of all packet losses in the core
Route => efficient data path
− 40% of Google clients have > 400 ms RTT
Efficient data path => data flows
− Large-scale botnets => almost every service is vulnerable to large-scale Internet denial-of-service attacks
Characterizing Internet Outages
Two-month study: more than 2M outages
90% of outages last < 10 minutes
10% of outages account for 40% of the downtime
Roadmap
Brief primer on Internet routing
Interdomain routing convergence (consensus routing) − Towards high availability at a fine-grained time scale [NSDI 08]
Interdomain routing diagnosis (Hubble/reverse traceroute) − Towards high availability at a long time scale [NSDI 08, NSDI 10]
Distributed denial of service protection (phalanx) − Towards withstanding million node botnets [NSDI 08]
Federation of Autonomous Networks
Establishing Inter-Network Routes
Border Gateway Protocol (BGP)
− Internet’s interdomain routing protocol
− Network chooses path based on its own opaque policy
− Forward your preferred path to neighbors
[Figure: advertisements for WS propagate hop by hop: WS; L3 WS; Sprint L3 WS; AT&T L3 WS; UW AT&T L3 WS]
BGP Paths Can Be Asymmetric
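Asymmetric paths are a consequence of policy
− Available paths depend on policy at other networks
− Network chooses path based on its own opaque policy ($$)
− Allowing policy-based decisions leads to asymmetry
[Figure: advertisements for UW propagate back: UW; Sprint UW; AT&T UW; L3 Sprint UW; WS L3 Sprint UW. WS reaches UW via L3 and Sprint, while UW reaches WS via AT&T and L3.]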
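To make the propagation rule and its policy-driven asymmetry concrete, here is a minimal path-vector sketch in Python. It is not BGP: the topology, the `PREF` local-preference numbers, and the helper names are hypothetical, chosen so that UW prefers routes learned from AT&T and L3 prefers routes learned from Sprint. Each AS prepends itself, rejects paths that already contain it, keeps the path ranked best by its own policy (then by shorter length), and forwards that choice to its neighbors.

```python
from collections import deque

# Hypothetical AS-level topology (undirected links), using the names from the figures.
LINKS = {("UW", "AT&T"), ("UW", "Sprint"), ("AT&T", "L3"), ("Sprint", "L3"), ("L3", "WS")}
ALL_ASES = {asn for link in LINKS for asn in link}

# Hypothetical opaque policies: local preference per neighbor (higher wins, default 0).
PREF = {
    "UW": {"AT&T": 2, "Sprint": 1},
    "L3": {"Sprint": 2, "AT&T": 1},
}


def neighbors(asn):
    return sorted({b for a, b in LINKS if a == asn} | {a for a, b in LINKS if b == asn})


def select(asn, candidates):
    """Pick the preferred path: local preference first, then shorter AS path."""
    if not candidates:
        return None
    pref = PREF.get(asn, {})
    # path[1] is the neighbor the route was learned from; ties broken deterministically.
    return max(candidates.values(), key=lambda path: (pref.get(path[1], 0), -len(path), path))


def compute_routes(dest):
    """Run path-vector propagation for one destination; return each AS's chosen AS path."""
    rib_in = {asn: {} for asn in ALL_ASES}  # rib_in[receiver][neighbor] = learned path
    best = {dest: [dest]}
    queue = deque([dest])
    while queue:
        sender = queue.popleft()
        advertised = best.get(sender)  # None acts as a withdrawal
        for receiver in neighbors(sender):
            if receiver == dest:
                continue
            if advertised is None or receiver in advertised:
                rib_in[receiver].pop(sender, None)  # withdrawal, or loop-rejected path
            else:
                rib_in[receiver][sender] = [receiver] + advertised  # prepend self
            choice = select(receiver, rib_in[receiver])
            if choice != best.get(receiver):
                if choice is None:
                    best.pop(receiver, None)
                else:
                    best[receiver] = choice
                queue.append(receiver)  # forward the new preferred path to neighbors
    return best


print(" ".join(compute_routes("WS")["UW"]))  # prints: UW AT&T L3 WS
print(" ".join(compute_routes("UW")["WS"]))  # prints: WS L3 Sprint UW
```

The two paths differ because each AS ranks routes with a policy the others never see, which is the asymmetry shown in the figures.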
From Interdomain Path to Router-Level
Each ISP decides how to route across its network and where to hand traffic to next ISP
End-to-end depends on interdomain + intradomain
− Performance and availability stem from these decisions
[Figure: router-level view of the AS path UW AT&T L3 WS]
Roadmap
Brief primer on Internet routing
Interdomain routing convergence (consensus routing) − Towards high availability at a fine-grained time scale [NSDI 08]
Interdomain routing diagnosis (Hubble/reverse traceroute) − Towards high availability at a long time scale [NSDI 08, NSDI 10]
Distributed denial of service protection (phalanx) − Towards withstanding million node botnets [NSDI 08]
Border Gateway Protocol
Key idea: opaque policy routing under local control
− Preferred routes visible to neighbors
− Underlying policies are not visible
Mechanism:
− ASes send their most preferred path (to each IP prefix) to neighboring ASes
− If an AS receives a new path, it starts using it right away
− Forward the path to neighbors, with a minimum inter-message interval
  • essential to prevent exponential message blowup
− Path eventually propagates in this fashion to all ASes
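The minimum inter-message interval can be sketched as a per-neighbor sender that coalesces updates. This is an illustration under stated assumptions: time is simulated, the interval of 30 time units and the `RateLimitedSender` class are hypothetical, and real BGP implementations differ in detail. Three successive path changes for the same prefix produce only two messages, because the middle one is overwritten before the interval expires.

```python
from dataclasses import dataclass, field


@dataclass
class RateLimitedSender:
    """Updates from one AS to one neighbor, with a minimum inter-message interval.

    Time is simulated (plain numbers), so the sketch runs without timers or threads.
    """
    interval: float
    last_sent: float = float("-inf")
    pending: dict = field(default_factory=dict)  # prefix -> newest path not yet sent
    sent: list = field(default_factory=list)     # log of (time, prefix, path) messages

    def update(self, now, prefix, path):
        # A newer path for the same prefix overwrites the pending one (coalescing),
        # so intermediate paths computed during the interval are never sent.
        self.pending[prefix] = path
        self.flush(now)

    def flush(self, now):
        # Send everything pending only if the interval since the last message has elapsed.
        if self.pending and now - self.last_sent >= self.interval:
            for prefix, path in self.pending.items():
                self.sent.append((now, prefix, path))
            self.pending.clear()
            self.last_sent = now


link = RateLimitedSender(interval=30)
link.update(0, "192.0.2.0/24", "4-5")
link.update(5, "192.0.2.0/24", "3-4-5")
link.update(10, "192.0.2.0/24", "1-5")
link.flush(30)  # the interval timer would fire here
print(link.sent)  # [(0, '192.0.2.0/24', '4-5'), (30, '192.0.2.0/24', '1-5')]
```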
Failures Cause Loops in BGP
[Figure: ASes 1 to 5, each annotated with its candidate paths to AS5: 5: 5; 5: 5, 2-4-5; 5: 4-5, 3-4-5, 1-5; 5: 4-5, 2-4-5]
[Figure build: link failure on 4-5]
[Figure: AS4's route to AS5 becomes unknown (5: ?); the other tables still list 5: 5, 2-4-5; 5: 4-5, 2-4-5; and 5: 4-5, 3-4-5, 1-5]
AS2 and AS3 now switch to their next-best paths
A routing loop is formed between AS2 and AS3!
A similar scenario causes blackholes in iBGP
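The failure scenario can be replayed with plain forwarding tables. This sketch assumes one reading of the figure (AS2 ranks 4-5, 3-4-5, 1-5; AS3 ranks 4-5, 2-4-5) and models only the moment after AS4's withdrawal reaches AS2 and AS3 but before either learns that the other's backup path is also dead. The table layout and function names are hypothetical.

```python
# Ranked candidate AS paths toward AS5 at each AS (most preferred first).
# Which table belongs to which AS is our reading of the figure, not taken from the talk.
RIB = {
    1: [(5,)],
    2: [(4, 5), (3, 4, 5), (1, 5)],
    3: [(4, 5), (2, 4, 5)],
    4: [(5,)],
}


def after_link_failure(rib):
    """AS4 loses its route and withdraws it; AS2 and AS3 drop paths learned directly
    from AS4 but still hold stale paths learned from each other."""
    updated = {asn: [p for p in paths if p[0] != 4] for asn, paths in rib.items()}
    updated[4] = []
    return updated


def forward(rib, src, dest=5):
    """Follow each AS's current next hop from src; report loops and blackholes."""
    hops, node = [src], src
    while node != dest:
        paths = rib.get(node)
        if not paths:
            return hops, "blackhole"
        node = paths[0][0]  # next hop = first AS on the preferred path
        if node in hops:
            return hops + [node], "loop"
        hops.append(node)
    return hops, "delivered"


print(forward(RIB, 2))                      # ([2, 4, 5], 'delivered')
print(forward(after_link_failure(RIB), 2))  # ([2, 3, 2], 'loop')
```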
Policy Changes Cause Loops in BGP
[Figure: ASes 1 to 6 with candidate paths to AS5: 5: 4-5, 3-4-5, 6-4-5; 5: 4-5; 5: 2-4-5, 6-4-5; 5: 4-5, 2-4-5]
If AS4 withdraws a route from AS2 and AS3, but not from AS6, a routing loop is formed!
Or if AS5 wants to swap its primary/backup provider from 4->1, or 1->4, a loop is formed
The Internet as a Distributed System
BGP mixes liveness and safety:
− Liveness: routes are available quickly after a change
− Safety: only policy compliant routes are used
BGP achieves neither!
− Messages are delayed to avoid exponential blowup
− Updates are applied asynchronously, forming temporary loops and blackholes
This is a distributed state management problem!
Consensus Routing
Separate concerns of liveness and safety
− Different mechanism is appropriate for each
Liveness: routing system adapts to failures quickly
− Dynamically re-route around problem using known, stable routes (e.g., with backup paths or tunnels)
Safety: forwarding tables are always consistent and policy compliant
− ASes compute and forward routes as before, including timers to reduce message overhead
− Only apply updates that have reached everywhere
− Apply updates at the same time everywhere
Mechanism
[Figure: ASes 1 to 6 exchanging BGP updates, plus a set of consolidators]
1. Run BGP, but don’t apply the updates
2. Distributed snapshot: periodically, a distributed snapshot is taken; updates in transit or being processed are marked incomplete
3. Send info to consolidators: ASes send the list of incomplete updates to the consolidators
4. Consensus: consolidators run a consensus algorithm to agree on the set of incomplete updates
5. Flood: consolidators flood the incomplete set to all the ASes
Apply completed updates
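The steps above and the final application can be sketched as below. This is a simplified model, not the consensus routing implementation: `consolidate` replaces the real consensus algorithm with a deterministic union of the reported incomplete sets, dependencies between updates are ignored, and all names and update IDs are hypothetical. Updates u1 and u2 (AS2 switching to 3-4-5 and AS3 switching to 2-4-5, the two halves of the loop on the failure slides) are both still incomplete, so neither is installed, while the completed update u3 is applied in the same epoch.

```python
def consolidate(reports):
    """Stand-in for the consolidators' consensus step.

    Every AS reports the updates it saw in transit or being processed at the snapshot;
    here the agreed incomplete set is simply the union of all reports.
    """
    incomplete = set()
    for ids in reports.values():
        incomplete |= ids
    return frozenset(incomplete)


def apply_epoch(pending, incomplete, fib):
    """Every AS installs the updates not in the agreed incomplete set, in the same epoch.

    pending: AS -> list of (update_id, destination, path) computed by BGP but not applied.
    fib: AS -> {destination: path}, the forwarding state actually used for packets.
    """
    for asn, updates in pending.items():
        still_waiting = []
        for update_id, dest, path in updates:
            if update_id in incomplete:
                still_waiting.append((update_id, dest, path))  # retry next epoch
            else:
                fib.setdefault(asn, {})[dest] = path
        pending[asn] = still_waiting


pending = {
    2: [("u1", "AS5", "3-4-5"), ("u3", "AS6", "4-6")],
    3: [("u2", "AS5", "2-4-5")],
}
fib = {2: {"AS5": "4-5"}, 3: {"AS5": "4-5"}}
agreed = consolidate({2: {"u1"}, 3: {"u2"}})
apply_epoch(pending, agreed, fib)
print(fib)      # {2: {'AS5': '4-5', 'AS6': '4-6'}, 3: {'AS5': '4-5'}}
print(pending)  # {2: [('u1', 'AS5', '3-4-5')], 3: [('u2', 'AS5', '2-4-5')]}
```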
Liveness
Problem: Upon link failure, need to wait till path reaches everyone
Solution: Dynamically re-route around the failed link
− Failure-carrying packets (FCP)
− Pre-computed backup paths
− Detour routing
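For the liveness side, a router can fall back to pre-computed backup paths until the next consistent update is applied. A minimal sketch, assuming the router already knows which links are down (how that knowledge arrives, for example carried in the packet as with failure-carrying packets, is not modeled); the function name and path encoding are hypothetical.

```python
def transient_next_hop(here, stable_path, backups, failed_links):
    """Pick the next hop for a packet at AS `here` while a new route is not yet applied.

    stable_path: AS path from the last consistent epoch, e.g. (4, 5).
    backups: pre-computed alternative AS paths, most preferred first.
    failed_links: set of frozenset({a, b}) for links known to be down.
    """
    for path in (stable_path, *backups):
        hops = (here, *path)
        links = {frozenset(pair) for pair in zip(hops, hops[1:])}
        if not links & failed_links:
            return path[0]
    return None  # nothing usable: the packet is dropped


failed = {frozenset({4, 5})}
print(transient_next_hop(2, (4, 5), [(1, 5)], set()))   # 4: stable route still works
print(transient_next_hop(2, (4, 5), [(1, 5)], failed))  # 1: detour around the failed link
```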
BGP
[Figure: connectivity over time, from completely unreachable to global reachability; annotations: “Link failure or other BGP event”, “BGP converges to alternate path”]
Consensus Routing
[Figure: connectivity over time, from completely unreachable to global reachability; annotations: “Link failure or other BGP event”, “Switch to transient routing”, “Snapshot”]
Availability After Failure
BGP loops, path prepending
BGP loops, prefix engineering
Control traffic overhead
Average delay in reaching consensus
Roadmap
Brief primer on Internet routing
Interdomain routing convergence (consensus routing) − Towards high availability at a fine-grained time scale [NSDI 08]
Interdomain routing diagnosis (Hubble/reverse traceroute) − Towards high availability at a long time scale [NSDI 08, NSDI 10]
Distributed denial of service protection (phalanx) − Towards withstanding million node botnets [NSDI 08]
Characterizing Internet Outages
Two-month study found more than 2M outages
90% of outages last < 10 minutes
10% of outages account for 40% of the downtime
Current Troubleshooting: Traceroute
To troubleshoot these routing problems, network operators need better tools
− Protocols do not provide much visibility
− Networks do not have an incentive to divulge
Traceroute: measures the route from the computer running traceroute to anywhere
− Provides no information about the reverse path
“The number one go-to tool is traceroute.” NANOG Network operators troubleshooting tutorial, 2009.
Data Centers Need Better Tools
Clients in Taiwan experiencing 500ms network latency
Is client served by distant data center? Check logs: No
Is path from data center to client indirect? Traceroute: No
Is reverse path from client back to data center indirect?
“To more precisely troubleshoot problems, [Google] needs the ability to gather information about the reverse path back from clients to Google.”
[IMC 2009]
KEY IDEAS FOR REVERSE TRACEROUTE
Want path from D back to S, don’t control D
Technique does not require control of destination
Can issue FORWARD traceroute from S to D, but likely asymmetric
Can’t use traceroute on reverse path
Set of vantage points: can measure an atlas of routes
Multiple VPs combine for view unattainable from any one
Traceroute from all vantage points to S
Gives atlas of paths to S; if we hit one, we know rest of path (destination-based routing)
Traceroute atlas gives baseline we bootstrap from
Destination-based routing: path from R1 depends only on S
− Does not depend on source
− Does not depend on path from D to R1
Destination-based routing lets us stitch path hop-by-hop
Destination-based routing: path from R3 depends only on S
− Does not depend on source
− Does not depend on path from D to R3
Destination-based routing: path from R4 depends only on S
− Does not depend on source
− Does not depend on path from D to R4
Once we intersect a path in our atlas, we know rest of route
Segments combine to give complete path
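Stitching a partial reverse path onto the atlas relies only on destination-based routing. This sketch assumes paths are lists of router names and that destination-based routing actually holds at the intersection point (load balancing or other exceptions would break it); the function name and the example routers are hypothetical.

```python
def stitch(segment, atlas):
    """Complete a partial reverse path using an atlas of traceroutes toward S.

    segment: hops measured so far on the path from D back to S, starting at D.
    atlas: traceroute paths from vantage points to S (each ends at S).
    Returns the full path if some hop of the segment appears in the atlas, else None.
    """
    for i, hop in enumerate(segment):
        for path in atlas:
            if hop in path:
                # Destination-based routing: from `hop` onward, traffic to S follows
                # the same route regardless of where it came from.
                return segment[:i] + path[path.index(hop):]
    return None


atlas = [["V1", "A", "R3", "R4", "S"], ["V2", "B", "S"]]
print(stitch(["D", "R1", "R2", "R3"], atlas))  # ['D', 'R1', 'R2', 'R3', 'R4', 'S']
```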
But how do we get segments?
How do we get segments?
Unlike TTL, IP Options are reflected in reply
Record Route (RR) option: records the first 9 routers
− If D is within 8 hops, reverse hops fill the rest of the slots
IP Options work over forward and reverse path
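The slot arithmetic behind the Record Route trick can be sketched by simulating the option's 9 entries. Simplifying assumptions: each router is represented by one name (a real router records one of its interface addresses), the destination records itself, and its reply copies the option as described above; the function name is hypothetical.

```python
RR_SLOTS = 9  # Record Route has room for at most 9 addresses


def record_route(forward_hops, dest, reverse_hops):
    """Simulate the RR option on a ping: forward hops, the destination, then the
    reverse hops each record an entry until the 9 slots are full."""
    slots = []
    for router in [*forward_hops, dest, *reverse_hops]:
        if len(slots) == RR_SLOTS:
            break
        slots.append(router)
    return slots


fwd = [f"h{i}" for i in range(1, 8)]  # D is the 8th hop from the sender
print(record_route(fwd, "D", ["R1", "R2", "R3"]))
# ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'h7', 'D', 'R1']
```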
… but average path is 15 hops, 30 round-trip
From vantage point within 8 hops of D, ping D spoofing as S with Record Route Option
D’s response records hop(s) on return path
Spoofing lets us use vantage point in best position
To: D Fr: S Ping? RR: __ (sent by the vantage point)
To: D Fr: S Ping? RR: h1,…,h7
To: S Fr: D Ping! RR: h1,…,h7,D
To: S Fr: D Ping! RR: h1,…,h7,D,R1
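Choosing which vantage point should send the spoofed probe comes down to the same slot arithmetic: the closer the vantage point is to D, the more Record Route slots remain for the reverse path. A minimal sketch, assuming hop counts from each vantage point to D are already known (for example from the atlas); `pick_spoofer` and the VP names are hypothetical.

```python
RR_SLOTS = 9


def pick_spoofer(hops_to_dest):
    """Choose the vantage point that leaves the most Record Route slots for the reverse path.

    hops_to_dest: vantage point -> number of RR entries used up to and including D
    (e.g. 8 for h1..h7 followed by D). Only VPs within 8 hops leave any reverse slot.
    """
    usable = {vp: RR_SLOTS - n for vp, n in hops_to_dest.items() if n <= RR_SLOTS - 1}
    if not usable:
        return None  # no VP close enough: fall back to other techniques
    vp = max(sorted(usable), key=usable.get)  # sorted() makes ties deterministic
    return vp, usable[vp]


# Hypothetical measurements; the chosen VP sends the ping with source address S.
print(pick_spoofer({"vp-a": 12, "vp-b": 8, "vp-c": 6}))  # ('vp-c', 3)
```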
Iterate, performing spoofed Record Routes to each router we discover on return path
To: R1 Fr: S Ping? RR: __
To: S Fr: R1 Ping! RR: h1,…,h6,R1,R2,R3
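The iteration can be sketched as a loop that keeps probing the last discovered hop. The probe function here is a stand-in for a real spoofed Record Route measurement, driven by a hypothetical next-hop table and slot counts chosen to mirror the figures (probing D yields R1; probing R1 yields R2 and R3); all names are hypothetical.

```python
# Hypothetical ground truth: destination-based next hops toward S.
NEXT_HOP_TO_S = {"D": "R1", "R1": "R2", "R2": "R3", "R3": "R4", "R4": "S"}
# Hypothetical RR slots left for reverse hops when probing each target (default 3).
REVERSE_SLOTS = {"D": 1, "R1": 2}


def fake_spoofed_rr(target):
    """Stand-in for a spoofed RR ping to `target`: returns reverse hops after target."""
    hops, node = [], target
    for _ in range(REVERSE_SLOTS.get(target, 3)):
        if node == "S":
            break
        node = NEXT_HOP_TO_S[node]
        hops.append(node)
    return hops


def reverse_path_via_rr(dest, source, spoofed_rr, max_rounds=30):
    """Iteratively build the path from dest back to source, one segment per probe."""
    path = [dest]
    for _ in range(max_rounds):
        if path[-1] == source:
            return path
        new_hops = spoofed_rr(path[-1])
        if not new_hops:
            return path  # stuck: other techniques would have to take over
        path.extend(new_hops)  # destination-based routing lets us append the segment
    return path


print(reverse_path_via_rr("D", "S", fake_spoofed_rr))  # ['D', 'R1', 'R2', 'R3', 'R4', 'S']
```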
What if no vantage point is within 8 hops for Record Route?
Consult atlas of known paths to find adjacencies
Known paths provide set of candidate next hops
How do we verify which possible next hop is actually on path?
IP Timestamp (TS) option: specify ≤ 4 IPs; each records a timestamp only if traversed in order
To: R3 Fr: S Ping? TS: R3? R4?
To: S Fr: R3 Ping! TS: R3! R4?
To: S Fr: R3 Ping! TS: R3! R4!
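The ordered-stamping rule of the prespecified Timestamp option is what turns it into an adjacency test. A minimal simulation, assuming we know which routers the probe traverses (forward path, then reverse path); `prespecified_timestamps` is a hypothetical helper, not a packet-library call.

```python
def prespecified_timestamps(prespecified, traversed):
    """Simulate the IP Timestamp option with prespecified addresses.

    prespecified: up to 4 router addresses, in the order we ask about them.
    traversed: routers the packet actually visits (forward path, then reverse path).
    A router stamps only when it is the next unstamped address in the list, so a
    later address is stamped only if it is traversed after the earlier ones.
    """
    if len(prespecified) > 4:
        raise ValueError("the option has room for at most 4 prespecified addresses")
    stamped = [False] * len(prespecified)
    nxt = 0
    for router in traversed:
        if nxt < len(prespecified) and router == prespecified[nxt]:
            stamped[nxt] = True
            nxt += 1
    return stamped


# Ping R3 asking "R3? R4?": R4 is confirmed on the reverse path only if it stamps after R3.
print(prespecified_timestamps(["R3", "R4"], ["h1", "h2", "R3", "R4", "S"]))  # [True, True]
print(prespecified_timestamps(["R3", "R4"], ["h1", "h2", "R3", "R5", "S"]))  # [True, False]
```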
Techniques combine to give complete path
Key Ideas For Reverse Traceroute
Works without control of the destination
Multiple vantage points
Traceroute atlas provides:
− Baseline paths
− Adjacencies
Stitch the path hop by hop
IP options work over forward and reverse path
Spoofing lets us use the vantage point in the best position
Additional techniques to address:
− Accuracy: Some routers process options incorrectly
− Coverage: Some ISPs filter probe packets
− Scalability: Need to select vantage points carefully
Deployment
Coverage is tied to the set of vantage points (VPs)
Current deployment:
− VPs: ~90 PlanetLab / Measurement Lab sites
− Sources: PlanetLab sites
− Try it at http://revtr.cs.washington.edu
Evaluation
Quick summary:
− Coverage: The combination of techniques is necessary to get good coverage
− Overhead: Reasonable, about 10x a traceroute (in terms of time and number of probes)
Next: Accuracy: Does it yield the same path as if you could issue a traceroute from the destination?
− 2200 PlanetLab-to-PlanetLab paths
− Allows comparison to a direct traceroute on the “reverse” path
We identify most hops seen by traceroute
Does it give the same path as traceroute? Fraction of traceroute hops identified:
− Median: 38% if we assume the path is symmetric
− Median: 87% with our system
Why we do not always see all the traceroute hops:
1. It is hard to know whether 2 IPs actually belong to the same router
2. Coverage will improve further with more vantage points
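The accuracy comparison scores each path by how many traceroute hops reverse traceroute also found. Here is a hedged Python sketch of one such per-path score that also shows why alias resolution (knowing that 2 IPs belong to the same router) matters. The `alias` map from interface IPs to router IDs is hypothetical input, and the evaluation's exact matching rules may differ.

```python
from statistics import median
from typing import Dict, List, Optional


def hop_overlap(traceroute_hops: List[str], revtr_hops: List[str],
                alias: Optional[Dict[str, str]] = None) -> float:
    """Fraction of traceroute hops that reverse traceroute also found.

    alias maps interface IPs to a canonical router ID; without it, two
    interfaces of the same router count as different hops.
    """
    alias = alias or {}

    def canon(ip: str) -> str:
        return alias.get(ip, ip)

    found = {canon(h) for h in revtr_hops}
    matched = sum(1 for h in traceroute_hops if canon(h) in found)
    return matched / len(traceroute_hops)


tr = ["a", "b", "c", "d"]
rt = ["a", "b2", "c"]  # b2 is another interface of router b (hypothetical)
print(hop_overlap(tr, rt))                     # 0.5
print(hop_overlap(tr, rt, alias={"b2": "b"}))  # 0.75
print(median([0.5, 0.75, 1.0]))                # 0.75, the median across paths
```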
Example of debugging inflated path
150 ms round-trip time from Orlando to Seattle, 2-3x expected
− E.g., a content provider detects poor client performance
(Current practice) Issue traceroute, check if indirect
Indirectness: FL → DC → FL, but this only explains half of the latency inflation
(Our tool) Use reverse traceroute to check the reverse path
Indirectness: WA → LA → WA
The bad reverse path causes the inflated round-trip delay
Operators Struggle to Locate Failures
Mailing List User 1:
1 Home router
2 Verizon in Baltimore
3 Verizon in Philly
4 Alter.net in DC
5 Level3 in DC
6 * * *
7 * * *
Mailing List User 2:
1 Home router
2 Verizon in DC
3 Alter.net in DC
4 Level3 in DC
5 Level3 in Chicago
6 Level3 in Denver
7 * * *
8 * * *
“Traffic attempting to pass through Level3's network in the Washington, DC area is getting lost in the abyss. Here's a trace from Verizon residential to Level3.”
Outages mailing list, December 2010
How Can We Locate a Problem?
Group paths
We have:
− Fwd/rev traceroute
− Current paths
− Historic atlas
Group paths: looks like a Cox failure, but:
− The failure could be on the reverse path
− We cannot tell which ISP is responsible, as paths may be asymmetric
Use reverse traceroute to isolate the direction of the failure
− Also lets us measure the working direction
Fr: Z To: D Ping?
Fr: D To: Z Ping!
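The spoofed-probe trick for isolating the failed direction reduces to a small decision table. This Python sketch assumes a second vantage point V (hypothetical) that can still reach destination D in both directions while source S cannot; the two booleans record which spoofed probes got replies. The enum and function names are illustrative only.

```python
from enum import Enum


class FailedDirection(Enum):
    NONE = "no failure seen by these probes"
    FORWARD = "forward path S -> D"
    REVERSE = "reverse path D -> S"
    BOTH_OR_UNKNOWN = "both directions, or D itself is down"


def isolate_direction(fwd_ok: bool, rev_ok: bool) -> FailedDirection:
    """Decide which direction of S <-> D failed.

    fwd_ok: S pinged D with its source address spoofed as V, and V saw the
            reply. The reply used the working path D -> V, so S -> D works.
    rev_ok: V pinged D with its source address spoofed as S, and S saw the
            reply. The request used the working path V -> D, so D -> S works.
    """
    if fwd_ok and rev_ok:
        return FailedDirection.NONE
    if rev_ok:
        return FailedDirection.FORWARD
    if fwd_ok:
        return FailedDirection.REVERSE
    return FailedDirection.BOTH_OR_UNKNOWN


print(isolate_direction(fwd_ok=True, rev_ok=False).value)  # reverse path D -> S
```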
Use the historic atlas to reason about what changed
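One simple way to compare a current path with the historic atlas is to find where the two diverge. The Python helper below is a hypothetical sketch, not the system's actual analysis: it returns the last hop both paths share and the first historic hop that is no longer seen. That points to where to look, but by itself it does not prove which ISP is at fault.

```python
from typing import List, Optional, Tuple


def divergence(historic: List[str],
               current: List[str]) -> Optional[Tuple[Optional[str], str]]:
    """Return (last shared hop, first historic hop missing now), or None.

    current holds only the hops that still respond ('*' hops removed).
    """
    for i, (old, new) in enumerate(zip(historic, current)):
        if old != new:
            return (historic[i - 1] if i > 0 else None, old)
    if len(current) < len(historic):
        return (current[-1] if current else None, historic[len(current)])
    return None  # the current path matches the historic one


print(divergence(["A", "B", "C", "D"], ["A", "B"]))  # ('B', 'C')
```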
Partial Outages: An Opportunity
An initial version of the isolation system is running continuously. Preliminary results:
Working routes exist, even during failures
− 68% of black holes are partial
• Paths from some vantage points fail, while others work
− Can’t be explained by hardware failure: the cause is misconfiguration or policy
− 69% are one-way failures; the other direction works
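The partial/complete distinction can be made concrete with a short Python sketch. It assumes a hypothetical input that maps each vantage point to whether it currently reaches the destination. A black hole is complete when no vantage point gets through and partial when some do.

```python
from typing import Dict, List


def classify(reachable: Dict[str, bool]) -> str:
    """Classify one destination from per-vantage-point reachability."""
    working = sum(reachable.values())
    if working == len(reachable):
        return "reachable"
    if working == 0:
        return "complete"
    return "partial"  # some vantage points fail while others work


def partial_share(outages: List[Dict[str, bool]]) -> float:
    """Fraction of black holes (not fully reachable) that are partial."""
    labels = [classify(o) for o in outages]
    holes = [label for label in labels if label != "reachable"]
    return holes.count("partial") / len(holes) if holes else 0.0


print(classify({"vp1": True, "vp2": False}))  # partial
```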
Self-Repair of Forward Paths
Straightforward: Choose a different path or data center.
Ideal Self-Repair of Reverse Paths
(Figure: each neighboring ISP is told “Don’t use ATT”)
We want a way to signal to ISPs which networks to avoid.
Practical Self-Repair of Reverse Paths
Use BGP loop prevention to force a switch to a working path.
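(Figure: AS paths toward WS before and after WS poisons ATT.
Before: WS; L3 WS; ATT L3 WS; UW ATT L3 WS; Qwest WS; Sprint Qwest WS; AISP Qwest WS.
After WS announces WS ATT: WS ATT; L3 WS ATT; ATT ?; Qwest WS ATT; Sprint Qwest WS ATT; AISP Qwest WS ATT; UW Sprint Qwest WS ATT.)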
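BGP's loop-prevention rule is that an AS discards any route whose AS path already contains its own AS. The toy path-vector simulation below (Python) replays the slide's example under stated assumptions. The topology is reconstructed from the AS-path labels. Each AS simply prefers the shortest loop-free path, breaking ties by neighbor name, as a stand-in for real BGP policy. The poisoned announcement is written as WS ATT, as on the slide.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Topology reconstructed from the slide's AS-path labels (an assumption).
LINKS: List[Tuple[str, str]] = [
    ("WS", "L3"), ("L3", "ATT"), ("ATT", "UW"),
    ("WS", "Qwest"), ("Qwest", "Sprint"), ("Qwest", "AISP"), ("Sprint", "UW"),
]


def converge(links: List[Tuple[str, str]], origin: str, announced: List[str],
             max_rounds: int = 50) -> Dict[str, Optional[List[str]]]:
    """Toy path-vector routing toward the origin's prefix.

    Each AS picks the shortest loop-free path offered by a neighbor
    (ties broken by neighbor name), a stand-in for real BGP policy.
    """
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    routes: Dict[str, Optional[List[str]]] = {asn: None for asn in adj}
    routes[origin] = list(announced)
    for _ in range(max_rounds):
        new_routes = dict(routes)
        for asn in adj:
            if asn == origin:
                continue
            offers = [
                (len(routes[nbr]) + 1, nbr, [asn] + routes[nbr])
                for nbr in adj[asn]
                # BGP loop prevention: reject paths that already contain our AS.
                if routes[nbr] is not None and asn not in routes[nbr]
            ]
            new_routes[asn] = min(offers)[2] if offers else None
        if new_routes == routes:
            return routes
        routes = new_routes
    raise RuntimeError("routing did not converge")


before = converge(LINKS, "WS", ["WS"])
after = converge(LINKS, "WS", ["WS", "ATT"])  # poison ATT
print(before["UW"])  # ['UW', 'ATT', 'L3', 'WS']
print(after["UW"])   # ['UW', 'Sprint', 'Qwest', 'WS', 'ATT']
print(after["ATT"])  # None: ATT rejects every path that contains ATT
```

Only ATT loses its route; the other ASes keep a path to WS, which is the "non-disruptive" property the next slide asks for.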
Remediation Goals
Without control of the network causing a failure, automatically reroute traffic in a way that is:
− Effective: Allows networks to avoid the failure
− Non-disruptive: Little effect on working paths
− Predictable: Understandable effect, and reverts when no longer needed
BGP loop prevention as our basic mechanism, with:
− Proposed techniques for each of the 3 properties
− Experiments in progress
Summary
Substantial improvements in Internet availability are both needed and possible
Interdomain routing convergence (consensus routing)
− Towards high availability at a fine-grained time scale
Interdomain routing diagnosis (Hubble/reverse traceroute)
− Towards high availability at a long time scale
Distributed denial of service protection (Phalanx)
− Towards withstanding million-node botnets
Final Thought
“A good network is one that I never have to think about” – Greg Minshall
Botnets are Big
Botnet: a group of infected computers controlled by a hacker to launch various attacks
− Infected via viruses, trojans and worms
− Botnets patch the vulnerability to let the hacker maintain control
− Self-sustaining economy in attack technologies
Total bots:
− 6 million [Symantec]
− 150 million [Vint Cerf]
Single botnets have numbered 1.5 million
Back of the envelope: 4.5 Tb/s attack possible today
− If the average bot matches the BitTorrent bandwidth distribution
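The back-of-the-envelope figure follows from the numbers above. Dividing 4.5 Tb/s by 1.5 million bots implies an average sending rate of about 3 Mb/s per bot. That per-bot rate is derived here from the slide's two numbers; it is not stated on the slide:

$$
1.5 \times 10^{6}\ \text{bots} \times 3\ \text{Mb/s} = 4.5 \times 10^{12}\ \text{b/s} = 4.5\ \text{Tb/s}
$$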
Plenty of Vulnerabilities
Solution Space
Many research proposals for in-network changes (traceback, pushback, AITF, TVA, SIFF, NIDS, …)
− But a million-node botnet => need near-complete deployment
− Plus a terabit/sec can overwhelm any NIDS
For read-only data, Akamai is an effective solution
− Put a copy of the data on every Akamai node
− Works today for most US government web sites
Many services aren’t read-only:
− Estonia (egovt), IRS e-filing, Amazon, eBay, Skype, etc.
What if we had a swarm for this case?
Single Mailbox
The mailbox queues packets until the destination explicitly requests them
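A single mailbox can be sketched as a per-destination queue that releases packets only when asked. This Python sketch simplifies under assumptions: it releases packets in FIFO order per destination, whereas the real design may match each packet to an individual request. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional


class Mailbox:
    """Holds packets for a destination until that destination asks for one."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[bytes]] = {}

    def deposit(self, dest: str, packet: bytes) -> None:
        # Senders push into the mailbox; nothing is forwarded on its own.
        self._queues.setdefault(dest, deque()).append(packet)

    def request(self, dest: str) -> Optional[bytes]:
        # Only an explicit request from the destination releases a packet.
        queue = self._queues.get(dest)
        return queue.popleft() if queue else None


box = Mailbox()
box.deposit("D", b"p1")
box.deposit("D", b"p2")
print(box.request("D"))  # b'p1'
```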
If the botnet can discover the mailbox, it’s game over
Many Mailboxes
Source sends packets through a random sequence of mailboxes
Sequence known to the destination, but not to the attacker
Botnet can take down one mailbox
But communication continues
Diluted attacks against all mailboxes fail
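How can the destination know the sequence while the attacker does not? One possibility, assumed here rather than taken from the slides, is for source and destination to share a secret key and derive each packet's mailbox from a keyed hash of its sequence number. This Python sketch uses HMAC-SHA256 from the standard library; the mailbox names and the key are hypothetical.

```python
import hashlib
import hmac
from typing import List


def mailbox_for(key: bytes, seq: int, mailboxes: List[str]) -> str:
    """Pick the mailbox for packet number seq from a keyed hash.

    Without the key, an attacker cannot predict the choice, so it cannot
    focus its flood on the mailboxes a given flow will use.
    (The modulo adds a negligible bias for small mailbox counts.)
    """
    digest = hmac.new(key, seq.to_bytes(8, "big"), hashlib.sha256).digest()
    return mailboxes[int.from_bytes(digest[:8], "big") % len(mailboxes)]


MAILBOXES = [f"mbox{i}" for i in range(8)]
shared_key = b"hypothetical shared secret"
source_view = [mailbox_for(shared_key, n, MAILBOXES) for n in range(10)]
dest_view = [mailbox_for(shared_key, n, MAILBOXES) for n in range(10)]
print(source_view == dest_view)  # True: both ends derive the same sequence
```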
Why not just attack the server?
Filtering Ring
− Each request has a nonce
− Exit router keeps a list of requests
− Drop all incoming packets without the nonce
− Remove the nonce once used
− Efficient implementation using Bloom filters
An attack needs to flood all border routers of an ISP to be effective
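A plain Bloom filter cannot forget an entry. One way to get both compact storage and “remove the nonce once used” is a counting Bloom filter. The Python sketch below assumes that variant, because the slide does not say which one is used. It also assumes, for simplicity, that the same border router sees the request leave and the reply arrive. Sizes, hash choices and class names are hypothetical. Like any Bloom filter, it can occasionally admit a packet with a forged nonce (a false positive).

```python
import hashlib
from typing import Set


class CountingBloomFilter:
    def __init__(self, size: int = 4096, num_hashes: int = 3) -> None:
        self.counts = [0] * size
        self.num_hashes = num_hashes

    def _slots(self, item: bytes) -> Set[int]:
        # Distinct slots, so add/remove stay symmetric even if two hashes collide.
        return {
            int.from_bytes(hashlib.sha256(bytes([i]) + item).digest()[:4], "big")
            % len(self.counts)
            for i in range(self.num_hashes)
        }

    def add(self, item: bytes) -> None:
        for s in self._slots(item):
            self.counts[s] += 1

    def __contains__(self, item: bytes) -> bool:
        return all(self.counts[s] > 0 for s in self._slots(item))

    def remove(self, item: bytes) -> None:
        # Call only after a membership hit; removing a false positive can
        # erase part of another nonce's entry.
        for s in self._slots(item):
            self.counts[s] -= 1


class BorderRouter:
    """One router of the filtering ring."""

    def __init__(self) -> None:
        self.pending = CountingBloomFilter()

    def send_request(self, nonce: bytes) -> None:
        self.pending.add(nonce)  # remember the nonce of each outgoing request

    def admit(self, nonce: bytes) -> bool:
        if nonce not in self.pending:
            return False  # no matching request: drop
        self.pending.remove(nonce)  # one-time use
        return True


router = BorderRouter()
router.send_request(b"nonce-1")
print(router.admit(b"nonce-1"), router.admit(b"nonce-1"))  # True False
```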
Phalanx Example
Phalanx Latency Penalty
Phalanx vs. In Network Solutions
Phalanx Scalability
Measuring Link Latency
Many applications want link latencies
− IP geolocation, ISP performance, performance prediction, …
Traditional approach is to assume symmetry: Delay(A,B) = (RTT(S,B) − RTT(S,A)) / 2
Asymmetry skews link latency inferred with traceroute
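A worked Python example shows how asymmetry skews the traditional estimate. The latencies are made-up toy values: (S,A) is 5 ms and (A,B) is 10 ms. The reply from A returns over the same 5 ms link, but in the asymmetric case the reply from B takes a different 30 ms path back to S.

```python
def symmetric_link_delay(rtt_s_a: float, rtt_s_b: float) -> float:
    """Traditional estimate of Delay(A,B), assuming replies retrace the forward path."""
    return (rtt_s_b - rtt_s_a) / 2


# Toy latencies in ms (hypothetical).
S_A, A_B = 5.0, 10.0
rtt_s_a = S_A + S_A                        # S->A and back over the same link
rtt_s_b_symmetric = S_A + A_B + A_B + S_A  # reply retraces B->A->S
rtt_s_b_asymmetric = S_A + A_B + 30.0      # reply takes a different 30 ms path

print(symmetric_link_delay(rtt_s_a, rtt_s_b_symmetric))   # 10.0 (correct)
print(symmetric_link_delay(rtt_s_a, rtt_s_b_asymmetric))  # 17.5 (skewed)
```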
Reverse Traceroute Detects Symmetry
Reverse traceroute identifies symmetric traversal
− Identify cases when the RTT difference is accurate
− We can determine the latency of (S,A) and (S,C)
Solved: (S,A), (S,C)
Reverse TR Constrains Link Latencies
Build up a system of constraints on the link latencies of all intermediate hops
− Traceroute and reverse traceroute to all hops
− RTT = forward links + reverse links
Solved: (S,A), (S,C), (V,B), (B,C), (A,B)
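The build-up in the figure can be mimicked by repeated substitution. Each measurement says that an RTT equals the sum of the forward and reverse links it crosses, and any equation left with a single unknown link can be solved. This Python sketch assumes each link has the same latency in both directions. It uses made-up RTTs chosen so that the links resolve in the slide's order. A deployed system would more likely solve the whole system at once (for example by least squares), so treat this only as an illustration.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]
# Each measurement: ({link: times crossed in the round trip}, RTT in ms).
Measurement = Tuple[Dict[Link, int], float]


def solve_links(measurements: List[Measurement]) -> Dict[Link, float]:
    """Solve any RTT equation that has exactly one unknown link, until stuck."""
    solved: Dict[Link, float] = {}
    progress = True
    while progress:
        progress = False
        for counts, rtt in measurements:
            unknown = [link for link in counts if link not in solved]
            if len(unknown) == 1:
                link = unknown[0]
                known = sum(solved[other] * n for other, n in counts.items() if other in solved)
                solved[link] = (rtt - known) / counts[link]
                progress = True
    return solved


measurements: List[Measurement] = [
    ({("S", "A"): 2}, 4.0),  # symmetric: S->A->S
    ({("S", "C"): 2}, 6.0),  # symmetric: S->C->S
    ({("S", "A"): 1, ("A", "B"): 1, ("B", "C"): 1, ("S", "C"): 1}, 9.0),  # S->A->B, back B->C->S
    ({("V", "B"): 2}, 10.0),                 # symmetric from vantage point V
    ({("V", "B"): 2, ("B", "C"): 2}, 14.0),  # V->B->C and back
]
print(solve_links(measurements))  # solves (S,A), (S,C), (V,B), (B,C), (A,B) in that order
```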
Case Study: Sprint Link Latencies
Reverse traceroute sees 79 of 89 inter-PoP links, whereas traceroute only sees 61
Median (0.4 ms), mean (0.6 ms), and worst-case (2.2 ms) errors are all 10x better than with the traditional approach