Evaluating Potential Routing Diversity for Internet Failure Recovery

*Chengchen Hu, +Kai Chen, +Yan Chen, *Bin Liu
*Tsinghua University, +Northwestern University

Post on 05-Jan-2016

Internet Failures

Failure is part of everyday life in IP networks
  e.g., 675,000 excavation accidents in 2004 [Common Ground Alliance]
  Network cable cuts every few days …

Real-world emergencies or disasters can lead to substantial Internet disruption
  Earthquakes
  Storms
  Terrorist incidents: the 9/11 event
  …

Example: Taiwan earthquake incident

Large earthquakes hit south of Taiwan on 26 December 2006
Only two of the nine cross-sea cables were unaffected
Abundant physical-level connectivity remained, but it took ISPs too long to find and use it.

Figures cited from "Aftershocks from the Taiwan Earthquakes: Shaking up Internet transit in Asia", NANOG 42

How reliable is the Internet?

The Internet is not as reliable as people expect! [Wu, CoNEXT'07]
  32% of ASes are vulnerable to a single critical customer-provider link cut
  93.7% of a Tier-1 ISP's single-homed customers are lost from the peered ISP due to Tier-1 depeering

Our question: can we find more resources to increase Internet reliability, especially when an Internet emergency happens?

Basic Idea

Two places where we can find more routing diversity:
  Internet eXchange Points (IXPs)
    Co-location facilities where multiple ASes exchange their traffic
    Participant ASes in an IXP may not be connected via BGP
  Internet valley-free routing policy
    AS relationships: customer-provider, peering, sibling
    Peering relaxation (PR): allow one AS to carry traffic from its peer to its provider
    Mentioned in [Wu, CoNEXT'07], but without evaluation

Our main focus: how much can we gain from these two potential resources, i.e., IXP and PR?
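
The valley-free rule and the peering relaxation can be illustrated with a small path checker. This is a minimal sketch, not the authors' evaluation tool: the label names (`c2p`, `p2p`, `p2c`), the function `is_valley_free`, and the reading of PR as "one uphill step is permitted directly after a peering link" are assumptions made for illustration, and sibling links are left out.

```python
# Link labels, read in the direction of travel (hypothetical encoding).
UP, PEER, DOWN = "c2p", "p2p", "p2c"


def is_valley_free(labels, peering_relaxation=False):
    """Return True if a sequence of AS-link labels is an allowed path.

    Plain valley-free: uphill links, then at most one peering link,
    then downhill links. With peering_relaxation=True (assumed reading
    of PR), an uphill link directly after the peering link is allowed.
    """
    descending = False  # set once a peer or downhill link has been crossed
    peer_used = False   # at most one peering link per path
    previous = None
    for label in labels:
        if label == UP:
            if descending:
                relaxed = peering_relaxation and previous == PEER
                if not relaxed:
                    return False
                descending = False  # the path climbs again via the peer's provider
        elif label == PEER:
            if descending or peer_used:
                return False
            descending = True
            peer_used = True
        elif label == DOWN:
            descending = True
        else:
            raise ValueError(f"unknown link label: {label!r}")
        previous = label
    return True
```

Under these assumptions, `[PEER, UP]` is rejected by the plain policy and accepted once `peering_relaxation=True`, while a path that goes downhill and then uphill stays forbidden in both modes.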

Dataset for Evaluation

Most complete AS topology graph
  BGP data: Route Views, RIPE/RIS, Abilene, CERNET BGP View
  P2P traceroute: traceroute data from 992,000 IPs in over 3,700 ASes
  In total, 120K AS links with AS relationships
  http://aqualab.cs.northwestern.edu/projects/SidewalkEnds.html [Chen et al., CoNEXT'09]

IXP data
  PCH + PeeringDB + Euro-IX (~200 IXPs)
  3,468 participant ASes

Failure Models

Tier-1 depeering
  Real example: Cogent and Level3 depeering
Tier-1 provider-customer link teardown
  Reported in NANOG forum
Mixed types of link breakdown
  9/11 event, Taiwan earthquakes, 2003 Northeast blackout

Evaluation Metrics

Recovery Ratio
  # of recovered <src-dst> AS pairs versus total # of affected <src-dst> AS pairs
Path Diversity
  # of increased link-disjoint AS paths between affected <src-dst> AS pairs
Shifted Path
  # of link-disjoint AS paths shifted onto a normal link after we use IXP or PR resources
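
Two of these metrics can be sketched in a few lines. The code below is an illustration under simplifying assumptions, not the evaluation code behind the results: the function names are hypothetical, the AS graph is treated as undirected, and routing policy (valley-free constraints) is ignored when counting link-disjoint paths.

```python
from collections import deque


def recovery_ratio(affected_pairs, recovered_pairs):
    """Fraction of affected <src-dst> AS pairs that are reachable again."""
    affected = set(affected_pairs)
    if not affected:
        return 0.0
    return len(affected & set(recovered_pairs)) / len(affected)


def link_disjoint_paths(links, src, dst):
    """Count link-disjoint paths between src and dst.

    Uses unit-capacity augmenting paths found by BFS, so the result is
    the maximum number of paths that share no link.
    """
    if src == dst:
        raise ValueError("src and dst must differ")
    capacity = {}
    neighbours = {}
    for a, b in links:
        capacity[(a, b)] = 1
        capacity[(b, a)] = 1
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    count = 0
    while True:
        # Breadth-first search for one more path with spare capacity.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v in neighbours.get(u, ()):
                if v not in parent and capacity[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return count
        # Push one unit of flow back along the path that was found.
        v = dst
        while parent[v] is not None:
            u = parent[v]
            capacity[(u, v)] -= 1
            capacity[(v, u)] += 1
            v = u
        count += 1
```

The path diversity gain for one AS pair would then be the count on the graph with IXP or PR links added minus the count on the damaged graph without them.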

Results: Tier-1 Depeering

36 experiments for 9 Tier-1 ASes
Recovery ratio: most of the lost AS pairs can be recovered

Results: Tier-1 Depeering

Path diversity: multiple AS paths between lost AS pairs

Results: Tier-1 Depeering

Shifted path
  On average, 3.75 ~ 17.2 for all 36 experiments
  Moderate traffic load shifted onto the unaffected links

Economic model

B pays A for recovery
Business model
  Risk alliance (like airlines): price is determined beforehand
  Pay by bandwidth & duration, or by bits (95th percentile)

[Figure: A and B linked as peers, by provider-customer (P-C) links, or through an IXP]
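
The "95th percentile" option refers to a common burstable-billing convention: sort the periodic rate samples, discard the top 5%, and charge for the highest remaining sample. The sketch below assumes that convention; the function names, the Mbps unit, and the flat per-Mbps price are hypothetical, and the slides do not specify a sampling interval or price.

```python
def percentile_95(samples_mbps):
    """Highest sample left after discarding the top 5% of samples."""
    if not samples_mbps:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_mbps)
    # Integer arithmetic for ceil(0.95 * n) avoids floating-point rounding.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]


def recovery_charge(samples_mbps, price_per_mbps):
    """Amount B would pay A for the recovery period (assumed flat price)."""
    return percentile_95(samples_mbps) * price_per_mbps
```

With 20 samples, exactly one sample (the largest) is discarded, so short bursts during recovery do not set the bill.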

Communication channel

Search for peers
  Have direct connections to peers
Search for co-located ASes in the same IXP
  ASes are connected by switches in modern IXPs
  Messages are broadcast with the help of the switches
  Message confidentiality with public-key crypto

Automatic communication

Query message (failed AS)
  Who is connected to specific destination ASes?
Reply message (surviving AS)
  I can provide BW1 bandwidth to the destination AS
ACK (failed AS)
  I would like to buy BW2 (<= BW1)
Set up BGP sessions
Withdraw BGP sessions
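
The query, reply and ACK exchange could be modelled as follows. This is a sketch with hypothetical message types and field names; the slides do not define a wire format, and BGP session setup and teardown are outside its scope. The only rule taken from the slide is that the acknowledged bandwidth BW2 never exceeds the offered BW1.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Query:
    failed_as: int        # AS that lost connectivity
    destination_as: int   # destination it wants to reach again


@dataclass(frozen=True)
class Reply:
    helper_as: int        # surviving AS offering transit
    destination_as: int
    bw1_mbps: float       # bandwidth the helper can provide (BW1)


@dataclass(frozen=True)
class Ack:
    failed_as: int
    helper_as: int
    destination_as: int
    bw2_mbps: float       # bandwidth the failed AS buys (BW2 <= BW1)


def answer_query(query: Query, helper_as: int,
                 spare_mbps: Dict[int, float]) -> Optional[Reply]:
    """Surviving AS: reply only if it has spare bandwidth to the destination."""
    offer = spare_mbps.get(query.destination_as, 0.0)
    if offer <= 0:
        return None
    return Reply(helper_as, query.destination_as, offer)


def acknowledge(query: Query, reply: Reply, wanted_mbps: float) -> Ack:
    """Failed AS: buy what it needs, capped at the offered BW1."""
    bw2 = min(wanted_mbps, reply.bw1_mbps)
    return Ack(query.failed_as, reply.helper_as, reply.destination_as, bw2)
```

A helper with no spare bandwidth to the queried destination simply stays silent, which keeps the broadcast channel at the IXP free of negative replies.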

Check available connectivity & bandwidth

Connectivity
  traceroute
Available bandwidth
  Maximum capacity is already known
  Estimate the amount which has been used
    Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg, "Fast Accurate Computation of Large-Scale IP Traffic Matrices from Link Loads," ACM SIGMETRICS, 2003.
  Subtract the used amount from the maximum capacity

Optimal selection of helper ISPs

From a single victim ISP perspective
  Buy transit from a minimal number of ASes
  Recover all the (prioritized) traffic
  Least cost

Selection heuristic

Lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$
$x_{ij}$ is how much bandwidth AS j could provide to $D_i$

Selection heuristic

Lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$
Score each (helper) AS j with $\sum_i \min(x_{ij} / B_i, 1)$
Select the AS with the largest score (select the one with the lowest price if scores are equal)

[Figure: example helper ASes; surviving labels: 3, 2.3, 5, 2.1]

Selection heuristic

Update
[Figure: lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$, updated]

Selection heuristic

Rescore and select
[Figure: lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$; surviving labels: 1, 0.3, 0.10]
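
The score, select, update and rescore steps of the selection heuristic can be put together as a greedy loop. The sketch below is one possible reading of the slides rather than the authors' implementation: the function name and dictionary layout are hypothetical, the score is taken to be the sum over remaining destinations of min(x_ij / B_i, 1), and the loop also stops when no remaining helper can contribute anything.

```python
def select_helpers(demand, offers, price):
    """Greedy helper selection.

    demand: {destination: B_i}, lost bandwidth demand per destination
    offers: {helper: {destination: x_ij}}, bandwidth each helper can provide
    price:  {helper: price}, used only to break score ties
    Returns (chosen helpers in order, unmet demand per destination).
    """
    remaining = {d: b for d, b in demand.items() if b > 0}
    candidates = set(offers)
    chosen = []

    def score(helper):
        # Sum over remaining destinations of min(x_ij / B_i, 1).
        return sum(min(offers[helper].get(d, 0) / b, 1)
                   for d, b in remaining.items())

    while remaining and candidates:
        # Largest score first; lowest price wins a tie.
        best = max(sorted(candidates),
                   key=lambda h: (score(h), -price[h]))
        if score(best) == 0:
            break  # nobody left can help with the remaining demand
        chosen.append(best)
        candidates.remove(best)
        # Update: subtract recovered bandwidth, drop recovered destinations.
        for d in list(remaining):
            remaining[d] -= min(offers[best].get(d, 0), remaining[d])
            if remaining[d] <= 0:
                del remaining[d]
    return chosen, remaining
```

For example, with demand `{"D1": 10, "D2": 5}`, a helper offering 5 to each destination scores 1.5 and is picked before a helper offering 10 to D1 only, which scores 1.0 in the first round and is picked in the second.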

Summary

First work to evaluate the potential routing diversity via IXP and PR with the most complete AS topology graph.
40%-80% of affected <Src, Dst> AS pairs can be recovered via IXP and PR with multiple paths and moderate shifted paths.
Point out a new avenue for Internet failure recovery.
Possible and practical mechanisms to utilize potential routing diversity.

Look forward to feedback and collaborations from IXP/ISPs!

Thank you!

Q&A

Backup

Results: Tier-1 provider-customer links teardown

Recovery ratio
Path diversity
  4.64 for 10 Tier-1 provider-customer links teardown
  4.54 for 20 Tier-1 provider-customer links teardown
Shifted path
  The average numbers of shifted paths when 10, 20 and 30 links are damaged are 3.4, 4.0 and 4.2, respectively.

Results: Mixed types of links breakdown

Taiwan earthquake, 9 big victim ASes
Recovery ratio

Results: Mixed types of links breakdown

Path diversity

Results: Mixed types of links breakdown

Shifted path

System framework

Adding an Emergency Recovery (ER) module in a router's control plane
Setting up the communications between ER and the Intra-TE Resource Management modules.

Building communication channel

An example

Optimal selection of ISPs to help

From global view
  Min. shifted paths or tuned AS links
  s.t. recover all the (prioritized) traffic we could
  or
  Max. recovery ratio
  s.t. shifted paths or tuned AS links
From a single ISP
  Min. cost for the ISP
  s.t. recover all the (prioritized) traffic we could
  or
  Max. recovery ratio
  s.t. cost for the ISP
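
The single-ISP variant (minimum cost subject to full recovery) could be written as a small integer program. This is one possible formalization under assumptions the slide does not state: the binary variable $y_j$ (hypothetical) marks whether transit is bought from helper AS j, $c_j$ stands for that helper's price, and $x_{ij}$ and $B_i$ are the offered and demanded bandwidth for destination $D_i$ as used in the selection heuristic.

```latex
\begin{aligned}
\min_{y} \quad & \sum_j c_j \, y_j \\
\text{s.t.} \quad & \sum_j x_{ij} \, y_j \ge B_i \quad \forall i, \\
& y_j \in \{0, 1\} \quad \forall j .
\end{aligned}
```

Read this way, the problem resembles a covering problem, which would be a natural motivation for using a greedy selection heuristic instead of solving it exactly.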

Selection heuristic

Lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$
$x_{ij}$ is how much bandwidth AS j could provide to $D_i$
Score each (helper) AS j with $\sum_i \min(x_{ij} / B_i, 1)$
Select the helper AS with the largest score (select the one with the lowest price if scores are equal)
Update $\{D_i\}$ by deleting the recovered ASes
Update $\{B_i\}$ by subtracting the recovered bandwidth
Rescore and select the next helper AS
Iterate until all are recovered