towards unbiased end-to-end diagnosis

Post on 01-Jan-2016

33 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Towards Unbiased End-to-End Diagnosis. Yao Zhao 1 , Yan Chen 1 , David Bindel 2. Lab for Internet & Security Tech, Northwestern Univ EECS department, UC Berkeley. Outline. Background and Motivation MILS in Undirected Graph MILS in Directed Graph Evaluation Conclusions. - PowerPoint PPT Presentation

TRANSCRIPT

Yao Zhao1, Yan Chen1, David Bindel2

Towards Unbiased End-to-End Diagnosis

1. Lab for Internet & Security Tech, Northwestern Univ

2. EECS department, UC Berkeley

2

Outline

• Background and Motivation

• MILS in Undirected Graph

• MILS in Directed Graph

• Evaluation

• Conclusions

3

End-to-End Network Diagnosis

93 hours

?

4

Linear Algebraic Model

Path loss rate pi, link loss rate lj:

)1)(1(1 211 llp

1

3

2

1

011 b

x

x

x

A

D

C

B

1

2

3p1

p2

)1log()1log()1log( 211 llp

)1log(

)1log(

)1log(

011

3

2

1

l

l

l

2

1

3

2

1

111

011

b

b

x

x

x

Usually an underconstrained syste

m

5

Unidentifiable Links

• Vectors That Are Linear Combinations of Row Vectors of G Are Identifiable– The property of a link (or link sequence) can

be computed from the linear system if and only if the corresponding vector is identifiable

• Otherwise, Unidentifiable

111

011G

(1) 121 bxx

(2) 2321 bxxx (1)-(2) 123 bbx

A

D

C

B

1

2

3p1

p2 [ 0 0 1 ]

[ 1 0 0 ] ? ?1 x

6

Virtual Link

Motivation

• Biased statistic assumptions are introduced to infer unidentifiable Links

0.1

0.1

0

Loss rate = 0.1 if linear optimization

Loss = 0 if unicast tomography & RED

Loss rate?

7

Least-biased End-to-end Network Diagnosis (LEND)

• Basic Assumptions– End-to-end measurement can infer the end-to-

end properties accurately– Link level properties are independent

• Problem Formulation– Given end-to-end measurements, what is the

finest granularity of link properties can we achieve under basic assumptions?

Basic assumptions

More and stronger statistic assumptions

Virtual linkDiagnosis granularity?

Better accuracy

8

Least-biased End-to-end Network Diagnosis (LEND)

• Contributions– Define the minimal identifiable unit under basic

assumptions (MILS)– Prove that only E2E paths are MILS with a

directed graph topology (e.g., the Internet) – Propose good path algorithm (incorporating

measurement path properties) for finer MILS

Basic assumptions

More and stronger statistic assumptions

Virtual linkDiagnosis granularity?

Better accuracy

9

Outline

• Background and Motivation

• MILS in Undirected Graph

• MILS in Directed Graph

• Evaluation

• Conclusions

10

Minimal Identifiable Link Sequence

• Definition of MILS– The smallest path segments with loss rates t

hat can be uniquely identified through end-to-end path measurements

– Related to the sparse basis problem• NP-hard Problem

• Properties of MILS– The MILS is a consecutive sequence of link

s– A MILS cannot be split into MILSes (minima

l)– MILSes may be linearly dependent, or some

MILSes may contain other MILSes

11

Examples of MILSes in Undirected Graph

Real links (solid) and all of the overlay paths (dotted) traversing them

1

231’

2’

3’4’

4

5

4

3

2

1

11000

01101

10110

00011

v

v

v

v

G

MILSes

a

b

c

de

3’+2’-1’-4’ → link 3

4132 vvvv

001002

12

Outline

• Background and Motivation

• MILS in Undirected Graph

• MILS in Directed Graph

• Evaluation

• Conclusions

13

Identify MILSes in Undirected Graphs

• Preparation– Active or passive end-to-end path measure

ment– Optimization

• Measure O(nlogn) paths and infer the n(n-1) end-to-end paths [SIGCOMM04]

14

• Preparation• Identify MILSes

– Enumerate each link sequence to see if it is identifiable

– Computational complexity: O(r×k×l2)• r: the number of paths (O(n2))• k: the rank of G (O(nlogn))• l: the length of the paths

– Only takes 4.2 seconds for the network with 135 Planetlab hosts and 18,090 Internet paths

Identify MILSes in Undirected Graphs

15

What about Directed Graphs?

• Directed Graph Are Essentially Different to Undirected Graph

Theorem: In a directed graph, no end-to-end path contains an identifiable subpath if only considering topology information

A

B C

N

A

B C

N

IncomingLinks

OutgoingLinks

1

23

4

6 5

010100

001100

100010

001010

100001

010001654321

G

[1 0 0 0 0 0] ?

Sum=1 Sum=1Sum=1 Sum=1

Sum=1 Sum=0

16

Good Path Algorithm

• Consider Only Topology– Works for undirected graph

• Incorporate Measurement Path Property– Most paths have no loss

• PlanetLab experiments show 50% of paths in the Internet have no loss

– All the links in a path of no loss are good links (Good Path Algorithm)

17

Good Path Algorithm

A

B C

N

A

B C

N

IncomingLinks

OutgoingLinks

1

23

4

6 5

010100

001100

100010

001010

100001

010001654321

G

• Symmetric Property is broken when using good path algorithm

18

Other Features of LEND

• Dynamic Update for Topology and Link Property Changes– End hosts join or leave, routing changes or pa

th property changes– Incremental update algorithms very efficient

• Combine with Statistical Diagnosis– Inference with MILSes is equivalent to inferen

ce with the whole end-to-end paths– Reduce computational complexity because MI

LSes are shorter than paths• Example: applying statistical tomography methods i

n [Infocom03] on MILSes is 5x faster than on paths

19

Outline

• Motivation

• MILS in Undirected Graph

• MILS in Directed Graph

• Evaluation

• Conclusions

20

Evaluation Metrics

• Diagnosis Granularity– Average length of all the lossy MILSes in lossy p

ath

• Accuracy– Simulations

• Absolute error and relative error

– Internet experiments• Cross validation • IP spoof based consistency check

• Speed– Running time for finding all MILSes and loss rat

e inference

21

Methodology

• Planetlab Testbed– 135 end hosts, each from different institute – 18,090 end-to-end paths

• Topology Measured by Traceroute– Avg path length is 15.2

• Path Loss Rate by Active UDP Probing with Small Overhead

Areas and Domains # of hosts

US (77)

.edu 50.org 14.net 2.com 10.us 1

Inter- national (58)

Europe 25Asia 25

Canada 3South America 3

Australia 2

22

Diagnosis Granularity

# of End-to-end Paths 18,090

Avg Path Length 15.2

# of MILSes 1009

Avg length of MILSes2.3 virtual links

(3.9 physical links)

Avg diagnosis granularity2.3 virtual links

(3.8 physical links)

Loss rate

[0, 0.05)

lossy path [0.05, 1.0] (15.8%)

[0.05, 0.1)

[0.1, 0.3) [0.3, 0.5)

[0.5, 1.0)

1.0

% 84.2 17.2 15.6 24.9 15.8 26.5

23

Distribution of Length of MILSes

• Most MILSes are pretty short• Some MILSes are longer than 10 hops

– Some paths do not overlap with any other paths

Most MILSes are short

A few MILSes are very long

24

Other Results

• MILS to AS Mapping– 33.6% lossy MILSes comprise only one phy

sical link • 81.8% of them connect two ASes

• Accuracy– Cross validation (99.0%)– IP spoof based consistency check (93.5%)

• Speed– 4.2 seconds for MILS computations– 109.3 seconds for setup of scalable active

monitoring [SIGCOMM04]

25

Conclusion

• Link-level property inference in directed graphs is completely different from that in undirected graphs

• With the least biased assumptions, LEND uses good path algorithm to infer link level loss rates, achieving– Good inference accuracy– Acceptable diagnosis granularity in practice– Online monitoring and diagnosis

• Continuous monitoring and diagnosis services on PlanetLab under construction

26

Thank You!

For more info:

http://list.cs.northwestern.edu/lend/

Questions?

27

28

Motivation

• End-to-End Network Diagnosis

• Under-constrained Linear System– Unidentifiable Links exist

To simplify presentation, assume

undirected graph model

A R B

29

Linear Algebraic Model (2)

! system dconstraine-underan Usually

)( sGrankk

…=

11 vectorrate losspath vectorrate losslink

matrix path where

,

}1|0{,

rs

sr

bx

GbGx

30

Identifiable and Unidentifiable

• Vectors That Are Linear Combinations of Row Vectors of G Are Identifiable

• Otherwise, Unidentifiable

111

011G

(1) 121 bxx (2) 2321 bxxx

(1)-(2) 123 bbx

A

D

C

B

1

2

3p1

p2

(1,1,0)

Row(path) space(identifiable)

x1

x2

(1,1,1)

(0,0,1)

x3

31

Examples of MILSes in Undirected Graph

1 2

1

2 3

1’

Real links (solid) and all of the overlay paths (dotted) traversing them

1’ 2’

1

231’

2’

Rank(G)=1

Rank(G)=3

Rank(G)=4

3’4’

a

4

11G

1

1

0

1

0

1

0

1

1

Ga

b c3’

5

11000

01101

10110

00011

G

MILSes

a

b

c

de

3’+2’-1’-4’ → link 3

32

Identify MILSes in Undirected Graphs

• Preparation• Identify MILSes

– Compute Q as the orthonormal basis of R(GT) (saved by preparation step)

– For a vector v in R(GT) , ||v|| = ||QTv||

x1

x2

x3

v11

~v

||~|||||| 22 vv v2

33

Flowchart of LEND System

• Step 1– Monitors O(n·logn) paths that can fully describe all the O(n

2) paths (SIGCOMM04)– Or passive monitoring

• Step 2 – Apply good path algorithm before identifying MILSes as in

undirected graph

Measure topology to get G

Active or passive monitoring

Iteratively check all possible MILSes

Compute loss rates of MILSes

Good pathalgorithm on G

Stage 2: online update the measurements and diagnosisStage 1: set up scalablemonitoring system for diagnosis

34

Evaluation with Simulation

• Metrics– Diagnosis granularity

• Average length of all the lossy MILSes in lossy path (in the unit of link or virtual link)

– Accuracy• Absolute error |p – p’ |: • Relative error

)',max()('),,max()(where

)(

)(',

)('

)(max)',(

pppp

p

p

p

pppF

35

Simulation Methodology

• Topology type– Three types of BRITE router-level topologie

s– Mecator topology

• Topology size– 1000 ~ 20000 or 284k nodes

• Number of end hosts on the overlay network– 50 ~ 300

• Link loss rate distribution– LLRD1 and LLRD2 models

• Loss model– Bernoulli and Gilbert

36

Sample of Simulation Results

# of endhost on OL

# ofpaths

AvgPL

# oflinks

# ofLP

# of linksin LP

Avg MILSlength

Avg diagnosisgranularity

50 2450 8.86 3798 1042 903 2.23(3.03) 2.24(3.07)

100 9900 8.80 9802 3551 1993 1.71(2.27) 2.05(2.95)

200 39800 8.80 22352 14706 4335 1.49(1.92) 1.77(2.38)

• Mercator (284k nodes) with Gilbert loss model and LLRD1 loss distribution

37

Related Works

• Pure End-to-End Approaches– Internet Tomography

• Multicast or unicast with loss correlation

– Uncorrelated end-to-end schemes

• Router Response Based Approach– Tulip and Cing

0

A

B C

N0.1

A

B C

N

0.19

0

0.1

0.1

0.19 0.19 0.19 0.19

(a) (b)

38

MILS to AS Mapping

• IP-to-AS mapping constructed from BGP routing tables

• Consider the short MILSes with length 1 or 2– Consist of about 44% of all lossy MILSes.– Most lossy links are connecting two dierent

ASes

1 AS 2 ASes 3 ASes >3 ASes

Len 1 MILSes (33.6%) 6.1% 27.5% 0 0

Len 2 MILSes (9.8%) 2.6% 5.8% 1.3% 0

Len > 2 MILSes (56.6%) 6.8% 17.8% 21.8% 10.2%

39

Accuracy Validation

• Cross Validation (99.0% consistent)• IP Spoof based Consistency Checking.

• UDP: Src: A, Dst: C, TTL=255

A

C

B

• UDP: Src: A, Dst: B, TTL=255• UDP: Src: C, Dst: B, TTL=2• ICMP: Src: R3, Dst: C, TTL=255

R1

R2R3

IP Spoof based Consistency: 93.5%

top related