

Page 1: Minimizing Probing Cost for Detecting Interface Failures: Algorithms and Scalability Analysis

Hung Nguyen (Univ. of Adelaide, Australia), Renata Teixeira (UPMC, France), Patrick Thiran (EPFL, Switzerland), Christophe Diot (Thomson, France)

Page 2: The Internet is great, but problems happen

[Figure: the UoA network connected to host 129.130.42.3 through networks Net1, Net2 and Net3]

How can problems be detected and identified automatically?
◦ Is my connection OK?
◦ Is the server up?
◦ Is the problem in one of the networks along the path?

Page 3: Current alarms are not enough

Network equipment already raises many alarms
◦ SNMP traps
◦ Anomaly detection systems

But alarms may not reflect the user's experience
◦ It is hard to map users' complaints to alarms
◦ A problem may not raise any alarm

[Figure: routers A, B, C and D on the paths between hosts 13.110.42.5 and 129.130.42.3. Caption: C wrongly filters packets to 129.130.42.3/24]

Page 4: Active monitoring systems to detect faults

Network admins often resort to active measurements
◦ Active monitoring servers inside their network
◦ Subscriptions to third-party monitoring services, e.g., Keynote or RIPE TTM

Challenge
◦ We cannot continuously overload the network or the end user's machine to detect faults, which are rare events

Page 5: Problem definition

[Figure: monitors M1 and M2 probe target hosts T1, T2 and T3 through routers A, B, C and D of the subscriber network]

Goal: detect failures of any of the interfaces in the subscriber's network with minimum probing overhead

Page 6: Simple solution: a coverage problem

[Figure: monitors M1 and M2, target hosts T1, T2 and T3, routers A, B, C and D]

Instead of probing all paths, select the minimum set of paths that covers all interfaces in the subscriber's network
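A minimal sketch of the coverage formulation, assuming each candidate probe path is known as the set of interfaces it traverses. The topology, the path-to-interface mapping and the helper name `greedy_cover` are hypothetical, and routers A to D stand in for interfaces. Because minimum set cover is NP-hard in general, the sketch uses the classic greedy approximation rather than an exact solver; it is not necessarily the method used in this work.

```python
# Hypothetical toy instance: each probe path (monitor, target) maps to the set
# of subscriber-network interfaces it traverses. Routers A-D stand in for interfaces.
PATHS = {
    ("M1", "T1"): {"A", "C"},
    ("M1", "T3"): {"A", "B", "D"},
    ("M2", "T2"): {"B", "D"},
    ("M2", "T3"): {"B", "C"},
}


def greedy_cover(paths, interfaces):
    """Greedy set-cover approximation (hypothetical helper).

    Repeatedly picks the path that covers the most still-uncovered interfaces.
    The result is within a logarithmic factor of the minimum, not always optimal.
    """
    uncovered = set(interfaces)
    chosen = []
    while uncovered:
        best = max(paths, key=lambda p: len(paths[p] & uncovered))
        gain = paths[best] & uncovered
        if not gain:
            raise ValueError(f"no path covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    # Prints [('M1', 'T3'), ('M1', 'T1')]: two paths cover A, B, C and D.
    print(greedy_cover(PATHS, {"A", "B", "C", "D"}))
```

Ties are broken by dictionary order, so a different path listing can yield a different, equally small cover.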

Page 7: The coverage solution doesn't detect all types of failures

It detects full-stop failures
◦ Failures that affect all packets that traverse the faulty interface, e.g., interface or router crashes, fiber cuts, bugs

But not path-specific failures
◦ Failures that affect only a subset of the paths that cross the faulty interface, e.g., router misconfigurations
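A hedged illustration of why a coverage-based probe set misses path-specific failures. A failure is modeled as an interface plus the set of paths it affects (`None` meaning every path, i.e., a full-stop failure); this model, the toy topology and the function names are assumptions made for the example, loosely mirroring router C filtering only some destinations.

```python
# Hypothetical toy instance: probe path -> interfaces it traverses.
PATHS = {
    ("M1", "T1"): {"A", "C"},
    ("M1", "T3"): {"A", "B", "D"},
    ("M2", "T3"): {"B", "C"},
}


def path_fails(path, failure):
    """A failure is (interface, affected_paths); affected_paths=None models a
    full-stop failure that drops every packet crossing the interface."""
    iface, affected = failure
    return iface in PATHS[path] and (affected is None or path in affected)


def detected(probed_paths, failure):
    """The failure is detected if at least one probed path is affected by it."""
    return any(path_fails(p, failure) for p in probed_paths)


cover = [("M1", "T1"), ("M1", "T3")]     # together these traverse A, B, C and D
full_stop = ("C", None)                  # C drops everything
path_specific = ("C", {("M2", "T3")})    # C drops only traffic on M2 -> T3

print(detected(cover, full_stop))        # True
print(detected(cover, path_specific))    # False: the cover never probes M2 -> T3
```

Both failures sit on an interface the cover traverses, yet only the full-stop one is caught.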

Page 8: New formulation of the failure detection problem

Simultaneously select the frequency at which to probe each path
◦ Probing each path at a lower frequency can still achieve high-frequency probing of each interface

[Figure: monitors M1 and M2, target hosts T1, T2 and T3, routers A, B, C and D; probing rates labeled "1 every 9 mins" and "1 every 3 mins"]
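One way to write the new formulation as a linear program, offered as a sketch under assumptions rather than the authors' exact model. The notation is ours: $P$ is the set of candidate probe paths, $I$ the set of interfaces in the subscriber's network, $x_p$ the probing rate of path $p$, $d_f$ the shortest full-stop failure to detect and $d_s$ the shortest path-specific failure to detect. We assume, for illustration, that every interface must be probed at least once per $d_f$ and that, because a path-specific failure may hit any path, every candidate path must be probed at least once per $d_s$.

$$
\begin{aligned}
\min_{x \ge 0}\quad & \sum_{p \in P} x_p \\
\text{subject to}\quad & \sum_{p \in P:\, i \in p} x_p \;\ge\; \frac{1}{d_f} && \forall i \in I \\
& x_p \;\ge\; \frac{1}{d_s} && \forall p \in P
\end{aligned}
$$

For instance, if three paths crossing interface $i$ are each probed once every 9 minutes ($x_p = 1/9$ per minute), the interface receives $3 \cdot \tfrac{1}{9} = \tfrac{1}{3}$ probes per minute, i.e., one probe every 3 minutes, provided the monitors stagger their probes so that they do not coincide. Since every constraint is linear in $x$, a standard LP solver finds the optimum in polynomial time.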

Page 9: Properties of the solution

Probe minimization for failure detection is no longer NP-hard
◦ The optimal solution can be found using linear programming

It needs synchronization among monitors
◦ Monitors need to collaborate to probe an interface
• An alternative probabilistic solution with Poisson probes avoids the synchronization overhead

[Figure: monitors M1 and M2, target hosts T1, T2 and T3, routers A, B, C and D; probing rates labeled "1 every 9 mins" and "1 every 3 mins"]
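A minimal sketch of the probabilistic alternative, assuming each monitor sends Poisson probes independently on each path. Two standard Poisson facts do the work: independent Poisson streams superpose into a Poisson stream whose rate is the sum of their rates, and the probability of seeing no arrival in a window of length $d$ is $e^{-\lambda d}$. The rates, paths and function names are hypothetical.

```python
import math

# Hypothetical per-path Poisson probing rates (probes per second) and the
# interfaces each path traverses.
RATES = {"p1": 1 / 540, "p2": 1 / 540, "p3": 1 / 540}   # one probe every 9 min
PATHS = {"p1": {"A", "C"}, "p2": {"B", "C"}, "p3": {"C", "D"}}


def interface_rate(iface):
    """Superposing independent Poisson streams gives a Poisson stream whose
    rate is the sum of the rates of the paths crossing the interface."""
    return sum(rate for path, rate in RATES.items() if iface in PATHS[path])


def miss_probability(iface, duration):
    """Probability that no probe crosses `iface` during a failure of
    `duration` seconds: exp(-rate * duration)."""
    return math.exp(-interface_rate(iface) * duration)


def min_rate_for(duration, max_miss):
    """Smallest interface probing rate keeping the miss probability <= max_miss."""
    return math.log(1 / max_miss) / duration


if __name__ == "__main__":
    print(1 / interface_rate("C"))        # mean gap at C: about 180 s
    print(miss_probability("C", 180))     # about 0.368, i.e. exp(-1)
    print(min_rate_for(180, 0.05) * 180)  # about 3.0 probes per 180 s
```

With three paths at one probe every 9 minutes, interface C still sees one probe every 3 minutes on average, but a 3-minute failure now goes unnoticed with probability $e^{-1} \approx 0.37$; holding the miss probability to 5% needs roughly three times that rate ($\ln 20 \approx 3.0$). That extra probing is the price of dropping synchronization.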

Page 10: Scaling law of probing cost

Probing cost (number of probes sent per second) scales almost linearly with the size of the subscriber's network
◦ In our inferred Internet graphs

For a random power-law graph, probing cost is a linear function of the number of nodes (n)

Probing cost is bounded by the isometric path number of the graph, i(G)

For other graphs:

Graph        i(G)
Cycle        2n/(n+1)
Complete     n/2
Hypercube    n/log n
Grid         n/2
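To make i(G) concrete: the isometric path number is the minimum number of shortest (isometric) paths whose vertices together cover every vertex of G. The brute-force sketch below, with hypothetical helper names, enumerates every shortest path and then tries covers of increasing size, so it is exponential and only suitable for tiny connected graphs; it illustrates the definition and is not how costs were computed for the inferred Internet graphs.

```python
from collections import deque
from itertools import combinations


def bfs_dist(adj, src):
    """Hop distances from src in an unweighted graph given as adjacency lists."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def all_geodesics(adj):
    """Vertex sets of all shortest paths between every pair (graph must be connected)."""
    dist = {u: bfs_dist(adj, u) for u in adj}
    geodesics = set()

    def extend(path, target):
        u = path[-1]
        if u == target:
            geodesics.add(frozenset(path))
            return
        for v in adj[u]:
            if dist[v][target] == dist[u][target] - 1:  # v is one hop closer
                extend(path + [v], target)

    for s in adj:
        for t in adj:
            extend([s], t)
    return geodesics


def isometric_path_number(adj):
    """Smallest k such that k shortest paths cover all vertices (brute force)."""
    vertices = set(adj)
    geodesics = list(all_geodesics(adj))
    for k in range(1, len(vertices) + 1):
        for combo in combinations(geodesics, k):
            if set().union(*combo) == vertices:
                return k


k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(isometric_path_number(k4), isometric_path_number(c6))  # 2 2
```

On the complete graph every shortest path is a single edge, so covering n vertices takes at least n/2 paths; on a 5-vertex path graph the whole graph is one geodesic and the function returns 1.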

Page 11: Evaluation

Paths obtained using traceroutes
◦ From 750 PlanetLab nodes to 3,000 DNS servers
◦ From 12 RON nodes to 60,000 targets

Subscriber networks are the probed ASes
◦ IPs mapped to ASes using Mao et al.'s technique
◦ 1,366 ASes in PlanetLab
◦ 6,517 ASes in RON

Probing costs computed while varying parameters
◦ Set of paths, failure durations, subscriber's network

Page 12: Probing costs vs. size of the subscriber network in PlanetLab

[Figure: probing cost vs. subscriber network size; path-specific failure duration = 1000 sec, full-stop failure duration = 1 sec]

Page 13: Summary

Practical formulation of the failure detection problem
◦ Incorporates both full-stop and path-specific failures

Solution minimizes probing cost
◦ Using linear programming

Inferred Internet graphs are among the most expensive to probe
◦ Probing cost scales almost linearly with network size

Next step
◦ Deploy a system based on these probing techniques

Page 14: Probing costs

[Figure: probing costs; path-specific failure duration = 2 sec, full-stop failure duration = 1 sec]

Page 15: Varying failure durations

[Figure: probing cost as failure durations vary; full-stop failure duration = 10 sec. Annotated regions: "Path-specific failures dominate the cost" and "Full-stop failures dominate the cost"]
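A hedged sketch of when each failure type dominates the cost, assuming an LP in which every path is probed at rate at least 1/d_s and the paths crossing each interface have total rate at least 1/d_f (this model and its notation are our assumptions). Probing every path at exactly 1/d_s costs |P|/d_s, a lower bound on any feasible solution; if that assignment already meets every interface constraint it is optimal and path-specific failures set the cost, otherwise some full-stop constraint forces extra probing. The toy paths and the function name are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy instance: probe path -> interfaces it traverses.
PATHS = {
    "p1": {"A", "C"},
    "p2": {"A", "B", "D"},
    "p3": {"B", "C"},
}


def path_specific_dominates(paths, d_full, d_path):
    """True if probing every path at exactly 1/d_path (the path-specific
    minimum) already gives each interface a total rate >= 1/d_full, so the
    full-stop constraints are slack and the optimum is len(paths)/d_path.
    Durations are integer seconds so Fraction keeps the arithmetic exact."""
    base = Fraction(1, d_path)   # per-path rate forced by path-specific failures
    need = Fraction(1, d_full)   # per-interface rate needed for full-stop failures
    interfaces = set().union(*paths.values())
    return all(
        sum(base for ifaces in paths.values() if i in ifaces) >= need
        for i in interfaces
    )


print(path_specific_dominates(PATHS, d_full=10, d_path=2))     # True
print(path_specific_dominates(PATHS, d_full=10, d_path=1000))  # False
```

With d_f = 10 s, a short path-specific duration (d_s = 2 s) leaves the full-stop constraints slack, while a long one (d_s = 1000 s) leaves interface D, crossed by a single path, short of probes.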

Page 16: Probing costs vs. size of the subscriber network in RON

[Figure: probing cost vs. subscriber network size; path-specific failure duration = 1000 sec, full-stop failure duration = 1 sec]