

A Case Study in Understanding OSPFv2 and BGP4 Interactions Using Efficient Experiment Design

David Bauer†, Murat Yuksel‡, Christopher Carothers† and Shivkumar Kalyanaraman‡

†Department of Computer Science
‡Department of Electrical, Computer and Systems Engineering

Rensselaer Polytechnic Institute

Problem Statement

Computational Complexity
– Highly detailed models: BGP4, OSPFv2, TCP-Reno, IPv4

Design Complexity
– Parameter space: fixed inputs, protocol timers, decision algorithm

ROSS.Net was built and utilized to address both parts of the problem

Goal: "good results fast", leading to an understanding of the system under test (make sense of the results)

[Figure: Response Surface, with axes labeled BGP4 and OSPFv2]

Understand protocol interactions through UPDATE messages generated by and between protocols

– OO: OSPF-caused OSPF updates
– OB: OSPF-caused BGP updates (interaction)
– BO: BGP-caused OSPF updates (interaction)
– BB: BGP-caused BGP updates
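The four update categories can be expressed as a small classifier. This is a minimal sketch, assuming each update record carries the protocol that caused it and the protocol that emitted it; the `Update` type and its field names are hypothetical and do not come from ROSS.Net.

```python
from collections import Counter
from dataclasses import dataclass

PROTOCOLS = ("OSPF", "BGP")


@dataclass(frozen=True)
class Update:
    """Hypothetical record of one routing update."""
    cause: str     # protocol whose event triggered the update
    protocol: str  # protocol that emitted the update


def classify(update: Update) -> str:
    """Return OO, OB, BO or BB: first letter is the cause, second the emitter."""
    for name in (update.cause, update.protocol):
        if name not in PROTOCOLS:
            raise ValueError(f"unknown protocol: {name!r}")
    return update.cause[0] + update.protocol[0]


def is_interaction(category: str) -> bool:
    """OB and BO cross the protocol boundary; OO and BB do not."""
    return category in ("OB", "BO")


def tally(updates) -> Counter:
    """Count updates per category."""
    return Counter(classify(u) for u in updates)


# Made-up trace, for illustration only.
trace = [
    Update("OSPF", "OSPF"),
    Update("OSPF", "BGP"),
    Update("BGP", "BGP"),
    Update("OSPF", "BGP"),
]
counts = tally(trace)
interactions = sum(n for cat, n in counts.items() if is_interaction(cat))
print(dict(counts), "interactions:", interactions)
```

Summing the OB and BO counts gives the OB+BO response that the later designs minimize.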

Why Are Feature Interactions Harmful?

Network protocol weaknesses are not fully understood until implemented / simulated at large scale

Are decisions made to efficiently route data within a domain adversely affecting our ability to efficiently route data across the domain?

Hot-potato routing: a small degree of unstable information affects a large portion of traffic

Cold-potato routing

[Figure: three ASes, labeled AS 0, AS 1 and AS 2]

Local Policy: optimize routing within AS (OSPFv2)
Local Policy: optimize routing between ASes (BGP4)
Global Policy: optimize routing within and between ASes

Large-scale Simulation

Topology from Rocketfuel data

Network Hierarchy:
– Level 0 routers: 9.92 Gb/sec and 1 ms delay
– Level 1 routers: 2.48 Gb/sec and 2 ms delay
– Level 2 routers: 620 Mb/sec and 3 ms delay
– Level 3 routers: 155 Mb/sec and 50 ms delay
– Level 4 routers: 45 Mb/sec and 50 ms delay
– Level 5 routers and below: 1.55 Mb/sec and 50 ms delay
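The hierarchy above maps directly onto a lookup table. The sketch below shows one way to encode it and to estimate the latency of a single hop; the function names and the 1,500-byte packet size are assumptions chosen for illustration, not values from the slides.

```python
# (bandwidth in bits/sec, propagation delay in seconds) per router level,
# taken from the hierarchy listed above.
LINKS = {
    0: (9.92e9, 0.001),
    1: (2.48e9, 0.002),
    2: (620e6, 0.003),
    3: (155e6, 0.050),
    4: (45e6, 0.050),
    5: (1.55e6, 0.050),
}


def link_params(level: int):
    """Return (bandwidth, delay); level 5 and below share one link class."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return LINKS[min(level, 5)]


def one_hop_latency(level: int, packet_bytes: int) -> float:
    """Transmission time plus propagation delay for one hop, in seconds."""
    bandwidth, delay = link_params(level)
    return packet_bytes * 8 / bandwidth + delay


# Assumed packet size of 1,500 bytes, for illustration only.
for level in range(6):
    print(level, f"{one_hop_latency(level, 1500) * 1000:.3f} ms")
```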

LEVEL 3: AS 3356
– iBGP: 7,921, eBGP: 210
– OSPFv2: Routers: 2,064, Links: 8,669

Tiscali: AS 3257
– iBGP: 441, eBGP: [missing]
– OSPFv2: Routers: 618, Links: 839

EBONE: AS 1755
– iBGP: 16,384
– OSPFv2: Routers: 438, Links: 1,192

EXODUS: AS 3967
– iBGP: 50,176, eBGP: 53
– OSPFv2: Routers: 688, Links: 2,166

ABOVENET: AS 6461
– iBGP: 2,500, eBGP: 199
– OSPFv2: Routers: 843, Links: 2,667
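A quick arithmetic check on the iBGP figures: each of the five counts is a perfect square (for example, 7,921 = 89 × 89). One possible reading is that each figure is n × n for n iBGP speakers in a full mesh, but that interpretation is an assumption and is not stated in the slides. The helper name below is hypothetical.

```python
import math

# iBGP counts as listed on the slide.
IBGP = {
    "LEVEL 3 (AS 3356)": 7921,
    "Tiscali (AS 3257)": 441,
    "EBONE (AS 1755)": 16384,
    "EXODUS (AS 3967)": 50176,
    "ABOVENET (AS 6461)": 2500,
}


def square_root_if_perfect(n: int):
    """Return the integer square root of n if n is a perfect square, else None."""
    root = math.isqrt(n)
    return root if root * root == n else None


for name, count in IBGP.items():
    print(f"{name}: {count} -> root {square_root_if_perfect(count)}")
```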

[Figure: AS-level topology diagram; numeric labels in the diagram: 8, 18, 12, 161, 6, 9, 26, 12, 11, 12]

Experiment Design and Analysis

Three classes of protocol parameters:
– OSPF timers, BGP timers, BGP decision

RRS was allowed 200 trials to optimize (minimize) the response surface
– Heuristic search algorithm

Applied multiple linear regression analysis on the results
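The slides describe RRS only as a heuristic search given 200 trials to minimize the response surface. The sketch below is not the RRS implementation used in the study; it is a simplified random search that explores the whole space first and then samples from a region that shrinks around the best point found. The parameter names, their ranges, and `toy_response` (a stand-in for one simulation run) are all made up for illustration.

```python
import random

# Hypothetical parameter ranges in seconds; the slides do not list the real ones.
BOUNDS = {
    "ospf_hello": (1.0, 20.0),
    "bgp_keepalive": (10.0, 120.0),
    "bgp_mrai": (5.0, 60.0),
}


def toy_response(p):
    """Made-up stand-in for a simulation run; returns a value to minimize."""
    return ((p["bgp_keepalive"] - 80.0) ** 2
            + 0.5 * (p["bgp_mrai"] - 30.0) ** 2
            + 0.1 * (p["ospf_hello"] - 10.0) ** 2)


def sample(region, rng):
    """Draw one parameter setting uniformly from the region."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in region.items()}


def shrink(region, center, factor):
    """Scale each range by factor around center, clipped to BOUNDS."""
    out = {}
    for k, (lo, hi) in region.items():
        half = (hi - lo) * factor / 2.0
        out[k] = (max(BOUNDS[k][0], center[k] - half),
                  min(BOUNDS[k][1], center[k] + half))
    return out


def random_search(response, budget=200, explore=50, stall=10, factor=0.5, seed=0):
    """Minimize response using exactly `budget` evaluations."""
    rng = random.Random(seed)
    region, misses = dict(BOUNDS), 0
    best_p, best_y = None, float("inf")
    for trial in range(budget):
        p = sample(region, rng)
        y = response(p)
        improved = y < best_y
        if improved:
            best_p, best_y = p, y
        if trial + 1 < explore:
            continue  # still exploring the full space
        if trial + 1 == explore:
            region, misses = shrink(BOUNDS, best_p, factor), 0
        elif improved:
            region, misses = shrink(region, best_p, 1.0), 0  # re-center only
        else:
            misses += 1
            if misses >= stall:
                region, misses = shrink(region, best_p, factor), 0
    return best_p, best_y


best_p, best_y = random_search(toy_response)
print(best_p, best_y)
```

The fixed budget is the point of the design: the cost of the search is set in advance, regardless of how many parameters are in the space.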

Response Plane

Intra-domain routing decisions can affect inter-domain behavior, and vice versa.

All updates belong to one of four categories:
– OSPF-caused OSPF (OO) update
– OSPF-caused BGP (OB) update (interaction)
– BGP-caused OSPF (BO) update (interaction)
– BGP-caused BGP (BB) update

[Figure: OB update. Labels: Destination; values 8 and 10; link failure or cost increase (e.g. maintenance)]


[Figure: BO update. Labels: eBGP connectivity becomes available; Destination]

These interactions cause route changes to thousands of IP prefixes, i.e. huge traffic shifts!!

High Level Characterization

Optimized with respect to the OB+BO response surface.

BGP timers play the major role, i.e. ~15% improvement in the optimal response.
– BGP KeepAlive timer seems to be the dominant parameter, in contrast to the expectation of MRAI!

OSPF timers have little effect, i.e. at most 5%.
– Low time-scale OSPF updates do not affect BGP.

~15% improvement when BGP timers included in search space
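Statements about which parameter dominates come from fitting a linear model to the search trials. The sketch below shows how a multiple linear regression can rank parameters by coefficient size, using ordinary least squares in plain Python. The parameter names, the synthetic data, and its coefficients are made up; they are not results from the study.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(xs, ys):
    """Ordinary least squares; returns [intercept, b1, ..., bk]."""
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic trials: inputs scaled to [0, 1], made-up effects plus noise.
rng = random.Random(1)
names = ["ospf_hello", "bgp_keepalive", "bgp_mrai"]
made_up_effects = [1.0, 8.0, 3.0]
xs = [[rng.random() for _ in names] for _ in range(200)]
ys = [50.0 + sum(c * x for c, x in zip(made_up_effects, row)) + rng.gauss(0, 1.0)
      for row in xs]

coefs = fit_linear(xs, ys)
ranking = sorted(zip(names, coefs[1:]), key=lambda nc: abs(nc[1]), reverse=True)
for name, coef in ranking:
    print(f"{name}: {coef:+.2f}")
```

Scaling every input to the same range before fitting is what makes the coefficient magnitudes comparable across parameters.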

Design 1: Management Perspectives

Varied response surfaces, each equivalent to a particular management approach.

Importance of parameters differs for each metric.

For minimal total updates:
– Local perspectives are 20-25% worse than the global.

For minimal total interactions:
– 15-25% worse can happen with other metrics

OB updates are more important than BO updates (i.e. ~50% vs. ~0.1% of total updates)

Important to optimize OSPF

OB: ~50% of total updates

BO: ~0.1% of total updates

Global perspective 20-25% better than local perspectives

Minimize total BO+OB 15-25% better than other metrics
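Each management perspective corresponds to a different response computed from the same four update counts. The sketch below is one plausible formalization: the global and interaction responses follow the slides, while the definitions of the two local perspectives (updates emitted by OSPF, updates emitted by BGP) are assumptions. The function names and the example counts are hypothetical.

```python
def responses(counts):
    """Map OO/OB/BO/BB update counts to one response value per perspective."""
    oo, ob, bo, bb = (counts[k] for k in ("OO", "OB", "BO", "BB"))
    return {
        "ospf_local": oo + bo,       # assumption: all updates emitted by OSPF
        "bgp_local": bb + ob,        # assumption: all updates emitted by BGP
        "interactions": ob + bo,     # the OB+BO response used in the slides
        "global": oo + ob + bo + bb, # total updates
    }


def percent_worse(candidate, best):
    """How much larger candidate is than best, as a percentage of best."""
    return 100 * (candidate - best) / best


# Made-up counts, for illustration only.
example = {"OO": 40, "OB": 50, "BO": 1, "BB": 9}
print(responses(example))
print(percent_worse(125, 100))
```

A statement such as "local perspectives are 20-25% worse than the global" then compares the total updates reached when optimizing a local response against the total reached when optimizing the global one.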

Design 2: Hot- vs. Cold-Potato Routing

Q: Can we use this approach to provide guidance for network routing policies?

Performed a full factorial of RRS searches, turning hot- and cold-potato routing ON/OFF

Provides quantitative results from which qualitative statements can be made

Verified AT&T and Sprint measurements

No major impact regardless of search performed

Majority of UPDATEs were generated by LOCAL-Pref and AS Path length

MED was << 1% of UPDATEs

Hot Potato was 0.8%

Larger question: Which steps in the BGP decision-making algorithm are most important?
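A full factorial over the two routing switches means one search per ON/OFF combination, four in total. The sketch below enumerates them with `itertools.product`; the factor names and the helper are hypothetical.

```python
from itertools import product

# Two ON/OFF factors, as in this design.
FACTORS = {
    "hot_potato": (False, True),
    "cold_potato": (False, True),
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in product(*(factors[n] for n in names))]


designs = full_factorial(FACTORS)
for design in designs:
    print(design)  # one search would be run per configuration
```

The same helper shows why a full factorial over timer settings is impractical: k factors with m levels each give m**k runs, so three factors at ten levels already need 1,000.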

Design 3: Network Robustness

Q: Can we use this approach to provide network admins with guidance for network configurations?

Link status varied with uniform random probability over simulation runtime

Link weights varied with uniform random probability over simulation runtime

Response: BO + OB, Global Perspective, and Default network settings

Search consistently provides better results

Response tied to link stability

BGP parameters had greatest impact

UPDATEs were most effectively minimized by maximizing link failure detection times
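The slides do not say how the link perturbations were generated. One plausible reading, sketched below, draws the time of each status change uniformly over the simulation runtime and toggles the link between up and down. The function name, the number of events per link, and the router names are assumptions.

```python
import random


def link_events(links, runtime, events_per_link, seed=0):
    """Return a time-ordered list of (time, link, new_status) events.

    Every link starts up; each event flips its status. Event times are
    drawn uniformly from [0, runtime].
    """
    rng = random.Random(seed)
    events = []
    for link in links:
        up = True
        times = sorted(rng.uniform(0.0, runtime) for _ in range(events_per_link))
        for t in times:
            up = not up
            events.append((t, link, "up" if up else "down"))
    events.sort()
    return events


# Hypothetical two-link example over a 100-second run.
for t, link, status in link_events([("r1", "r2"), ("r2", "r3")], 100.0, 4):
    print(f"{t:7.2f}s {link} -> {status}")
```

Link weight changes could be generated the same way, with a new weight in place of the status.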

Conclusions

– The number of experiments was reduced by many orders of magnitude in comparison to Full Factorial

– Experiment design and statistical analysis enabled rapid elimination of insignificant parameters

– Several qualitative statements and system characterizations could be obtained with few experiments.

– Provided validation of network measurement community results, and called into question the importance of premises

– Search algorithms do not always find desired behavior

– Allowed me to complete my thesis and graduate!