
Post on 21-Dec-2015


Author: Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion Stoica

Presenter :Yinzhi Cao

Outline

Background

Origin of X-Trace

X-Trace: Vector, Flowing Vector, God (Collector), Overhead

Usage Scenarios

Potential Problems

Background(1)

Network Diagnosis Scenario One (Accessing a Website)

Background(2)

Scenario Two (Distributed File System)

Background(3) Existing Methods

White Box: X-Trace

Black Box: WAP5, Sherlock

Comparison of White Box and Black Box

                          White Box   Black Box
Overhead                  Large       Small
Modification to Program   Yes         No
Notification of Program   No          Yes
Accuracy                  High        Low

Origin of X-Trace: How Do We Diagnose a Person?

1. Radioactive material. Implies: we need a vector flowing through the body.

2. X-ray detector. Implies: we need a collector to monitor activity.

3. Overhead. Implies: there is no free lunch.

X-Trace(Vector)

Vector: X-Trace Metadata

X-Trace(Flowing Vector)

Flowing Vector

Vectors alone are of no use; we have to make them flow to get information. The following is an entity we want to diagnose.

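X-Trace(Flowing Vector) Continued: Let Vectors Flow

Two ways: pushNext(), which passes the metadata to the next operation in the same layer, and pushDown(), which passes it to the layer below.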
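A minimal Python sketch of how the vector might flow, loosely modeled on the X-Trace metadata (a task ID shared by every operation of one request, plus each operation's own ID, its parent's ID, and an edge type). The class name `XTraceMetadata`, the helper names, and the random ID sizes are hypothetical choices for illustration, not the X-Trace wire format or reference implementation.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class XTraceMetadata:
    """Simplified X-Trace-style metadata (illustrative fields only)."""
    task_id: str              # identical for every operation of one task
    op_id: str                # unique per operation
    parent_id: Optional[str]  # op_id of the operation that caused this one
    edge_type: Optional[str]  # "NEXT" (same layer) or "DOWN" (layer below)

def new_op_id() -> str:
    # Hypothetical 4-byte random operation ID, hex-encoded.
    return os.urandom(4).hex()

def start_task() -> XTraceMetadata:
    # The first operation of a task has no parent and no incoming edge.
    return XTraceMetadata(os.urandom(8).hex(), new_op_id(), None, None)

def push_next(md: XTraceMetadata) -> XTraceMetadata:
    # Metadata for the next operation in the same layer (e.g. the next HTTP hop).
    return XTraceMetadata(md.task_id, new_op_id(), md.op_id, "NEXT")

def push_down(md: XTraceMetadata) -> XTraceMetadata:
    # Metadata handed to the layer below (e.g. from HTTP down to TCP).
    return XTraceMetadata(md.task_id, new_op_id(), md.op_id, "DOWN")

# Example: an HTTP request goes through a proxy, which opens a TCP connection.
client = start_task()
proxy = push_next(client)
tcp = push_down(proxy)
print(tcp.task_id == client.task_id, tcp.parent_id == proxy.op_id, tcp.edge_type)
```

Because every derived operation keeps the same task ID and records its parent's operation ID, reports carrying this metadata can later be linked back into a tree.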

X-Trace(Collector)

As with diagnosing a person, we need a "god": a collector that gathers all the reports and reconstructs the task trees offline.

The question is how.
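One possible answer, sketched in Python under the assumption that each report carries the task ID, its own operation ID, its parent's operation ID, and an edge type (the report tuple format and the helpers `build_trees` and `render` are hypothetical): group reports by task, then attach each operation to its parent. An operation whose parent report is missing is treated as a root, which is one simple way to tolerate report loss.

```python
from collections import defaultdict

# Hypothetical report format: (task_id, op_id, parent_id, edge_type, label)
reports = [
    ("t1", "a", None, None, "HTTP client"),
    ("t1", "b", "a", "NEXT", "HTTP proxy"),
    ("t1", "c", "b", "DOWN", "TCP connection"),
    ("t1", "d", "b", "NEXT", "HTTP server"),
    ("t2", "x", None, None, "unrelated task"),
]

def build_trees(reports):
    """Group reports by task and return {task_id: (roots, children)}."""
    by_task = defaultdict(list)
    for r in reports:
        by_task[r[0]].append(r)
    trees = {}
    for task_id, rs in by_task.items():
        known = {op_id for _, op_id, _, _, _ in rs}
        children = defaultdict(list)
        roots = []
        for _, op_id, parent_id, edge, _ in rs:
            if parent_id is None or parent_id not in known:
                # No parent, or the parent's report was lost: treat as a root.
                roots.append(op_id)
            else:
                children[parent_id].append((op_id, edge))
        trees[task_id] = (roots, dict(children))
    return trees

def render(op_id, children, labels, depth=0, edge=None):
    """Return the subtree rooted at op_id as indented text lines."""
    prefix = f"[{edge}] " if edge else ""
    lines = ["  " * depth + prefix + labels[op_id]]
    for child, child_edge in children.get(op_id, []):
        lines += render(child, children, labels, depth + 1, child_edge)
    return lines

labels = {op_id: label for _, op_id, _, _, label in reports}
roots, children = build_trees(reports)["t1"]
print("\n".join(render(roots[0], children, labels)))
```

For task `t1` this prints the client at the top, the proxy beneath it on a NEXT edge, and the TCP connection (DOWN) and server (NEXT) beneath the proxy.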

X-Trace(Overhead)

Modification of Existing Program

X-Trace(Overhead) Continued: Influence on Current Network Flow

1. Metadata is very small, so it adds little extra traffic to the network.

2. Reports are sent over separate channels, so they do not occupy the traced network flow.

Usage Scenarios of X-Trace(1) Web Request and Recursive DNS Queries

Usage Scenarios of X-Trace(2) A Web Hosting Site

Usage Scenarios of X-Trace(3) An Overlay Network

Potential Problems Mentioned by the Authors

Report Loss

Managing Report Traffic

Non-Tree Request Structures

Partial Deployment

Security Considerations

We have examined the white-box approach. Now let’s turn to another approach, which may be less accurate but incurs less overhead. First, we need some models.

Author: Victor Bahl, Ranveer Chandra, Albert Greenberg, Srikanth Kandula, David A. Maltz, Ming Zhang

Presenter: Yinzhi Cao

Outline

Models: Node Model, Network Model, Relationship Model

How to Use Our Model

Algorithm Efficiency

Evaluation

Models

The main idea of this paper is to build a model of the network and use it to diagnose faults.

The model has three levels: node, network, and relationship.

Node Model

A node has three states: up, troubled, and down.

Network Model

Graph

What’s more? Inference Graph.

Relationship Model(1)

Noisy-Max

Backup Slides 1

First, we use the model below. The circle means that with probability x the output equals the input, and with probability 1-x the output is up.

Let’s use an unordered pair {x,y} to represent a node’s state: {1,1} = {1} is up, {0,1} is troubled, and {0,0} = {0} is down.

Backup Slides 2

So the status of Child can be represented as follows.

Status(Child) = |Status(Parent1)•Status(Parent2)|

• means outer product.

And we define |(x,y)| = <x,y> = xy.
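To make the pair encoding concrete, here is a minimal Python sketch of the deterministic core of combining two parents (the case x = 1, before the circle's noise is applied). It rests on one particular reading of the slides, which is an assumption: each state is stored as a sorted pair, and two parents are combined by multiplying the pairs element by element, so the child takes the worse of the two states. This is not the slides' exact outer-product definition, and the name `combine` is hypothetical.

```python
UP, TROUBLED, DOWN = (1, 1), (0, 1), (0, 0)   # sorted pairs {x, y}
NAMES = {UP: "up", TROUBLED: "troubled", DOWN: "down"}

def combine(p1, p2):
    """Element-wise product of two sorted state pairs (one reading of the slide)."""
    return tuple(a * b for a, b in zip(p1, p2))

# Print the full table: the result is always the worse of the two inputs.
for a in (UP, TROUBLED, DOWN):
    for b in (UP, TROUBLED, DOWN):
        print(f"{NAMES[a]:>8}, {NAMES[b]:>8} -> {NAMES[combine(a, b)]}")
```

Under this reading the product never leaves the three valid pairs, and it matches ordering the states by severity (up < troubled < down) and taking the maximum.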

Relationship Model(2)

Selector

Relationship Model(3)

Failover

Backup Slides 3

We use the definitions above. Let Status(Parent1) = {x1,x2} and Status(Parent2) = {y1,y2}. Then

Status(Child) = {(x1+x2)x1 + not(x1+x2)y1, (x1+x2)x2 + not(x1+x2)y2}

Here + means OR, and juxtaposition means AND (the * is omitted).
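The failover formula can be checked exhaustively. The Python sketch below uses the same pair encoding and reads + as OR and juxtaposition as AND, the reading under which the formula behaves as a failover: the child follows Parent1 (the primary) unless it is down, and then follows Parent2 (the secondary). The function name `failover` is hypothetical, and the sketch evaluates the slide's formula rather than reproducing Sherlock's code.

```python
UP, TROUBLED, DOWN = (1, 1), (0, 1), (0, 0)
NAMES = {UP: "up", TROUBLED: "troubled", DOWN: "down"}

def failover(primary, secondary):
    """Evaluate the slide's failover formula on two state pairs."""
    x1, x2 = primary
    y1, y2 = secondary
    alive = x1 | x2      # (x1 + x2) with + read as OR: the primary is not down
    dead = 1 - alive     # not(x1 + x2)
    return ((alive & x1) | (dead & y1), (alive & x2) | (dead & y2))

for p in (UP, TROUBLED, DOWN):
    for s in (UP, TROUBLED, DOWN):
        print(f"primary {NAMES[p]:<8} secondary {NAMES[s]:<8} -> child {NAMES[failover(p, s)]}")
```

Reading + as AND instead would send a troubled primary to the secondary and, combined with juxtaposition as OR, produce outputs that are not failover behavior at all, which is why the OR/AND reading is used here.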

How to Use Model?

Fault Localization on the Inference Graph

Algorithm Efficiency(1)

Calculations inside the Inference Graph (noisy-max relationship)

Reduce time complexity from O(3^n) to O(n)
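The speed-up can be illustrated in Python. The sketch assumes each parent i has an independent probability distribution over (up, troubled, down) and an edge probability x_i: with probability x_i the parent's state passes through, otherwise the edge outputs up, as in the backup slides. The child takes the worst incoming state. Because the child is no worse than a given level only when every noisy input is, P(child ≤ s) is a product of per-parent terms, costing O(n) instead of enumerating all 3^n combinations of parent states. The function names and example numbers are hypothetical.

```python
import itertools
import math

# States are indexed by severity: 0 = up, 1 = troubled, 2 = down.

def noisy(dist, x):
    """Apply the 'circle': pass the state with probability x, else output up."""
    up, troubled, down = dist
    return (1 - x + x * up, x * troubled, x * down)

def noisy_max_fast(parents):
    """O(n): P(child <= s) = product over parents of P(input_i <= s)."""
    qs = [noisy(d, x) for d, x in parents]
    p_le_up = math.prod(q[0] for q in qs)
    p_le_troubled = math.prod(q[0] + q[1] for q in qs)
    return (p_le_up, p_le_troubled - p_le_up, 1 - p_le_troubled)

def noisy_max_brute(parents):
    """O(3^n): enumerate every combination of noisy input states."""
    qs = [noisy(d, x) for d, x in parents]
    result = [0.0, 0.0, 0.0]
    for states in itertools.product(range(3), repeat=len(qs)):
        p = math.prod(q[s] for q, s in zip(qs, states))
        result[max(states)] += p   # the child takes the worst input state
    return tuple(result)

# (distribution over up/troubled/down, edge probability x) for three parents
parents = [((0.9, 0.08, 0.02), 0.95), ((0.7, 0.2, 0.1), 0.5), ((0.99, 0.0, 0.01), 1.0)]
fast, brute = noisy_max_fast(parents), noisy_max_brute(parents)
print([round(v, 6) for v in fast])
print(all(math.isclose(a, b) for a, b in zip(fast, brute)))
```

Both functions give the same distribution; only the fast one stays practical as the number of parents grows.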

Algorithm Efficiency(2)

Comparison of Multiple Inputs and Observations

Two Methods to Use

1. Examine high-probability data sets and ignore low-probability ones

2. Dynamic Programming (Reduce Redundancy)

Algorithm Efficiency(3)

The authors make two observations that underlie these two methods:

1. It is very likely that, at any point in time, only a few root-cause nodes are troubled or down.

2. Since a root cause is assigned to be up in most assignment vectors, evaluating an assignment vector only requires re-evaluating the states of the descendants of root-cause nodes that are not up.
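Observation 1 is what keeps the search tractable. The Python sketch below (hypothetical helpers, not Sherlock's code) enumerates only the assignment vectors in which at most k root-cause nodes are not up, and checks the count against the closed form sum over j = 0..k of C(n, j)·2^j, since each of the j abnormal nodes can be either troubled or down. In line with observation 2, the non-up entries of each vector are the only nodes whose descendants would need re-evaluation.

```python
import itertools
from math import comb

UP, TROUBLED, DOWN = "up", "troubled", "down"

def bounded_assignments(root_causes, k):
    """Yield assignment vectors with at most k root causes that are not up.

    Each vector is a dict {root_cause: state}; only its non-up entries would
    need their descendants re-evaluated (observation 2).
    """
    for j in range(k + 1):
        for abnormal in itertools.combinations(root_causes, j):
            for states in itertools.product((TROUBLED, DOWN), repeat=j):
                vector = dict.fromkeys(root_causes, UP)
                vector.update(zip(abnormal, states))
                yield vector

def count_bounded(n, k):
    """Closed form: choose j abnormal nodes, each troubled or down."""
    return sum(comb(n, j) * 2 ** j for j in range(k + 1))

nodes = [f"r{i}" for i in range(6)]
print(sum(1 for _ in bounded_assignments(nodes, 2)), count_bounded(6, 2), 3 ** 6)  # 73 73 729
print(count_bounded(20, 2), 3 ** 20)  # 801 3486784401
```

With k = n the closed form reduces to 3^n by the binomial theorem, so the bound on k is exactly what turns an exponential search into a polynomial one.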

Evaluation

Inference Graph Established

Accuracy Compared with others

Time to Localize Faults

Impact of Errors in Inference Graph

Open Issues

The node model is very simple, with only three states. Could we use a continuous model instead?

Could we bring stochastic-process concepts such as Markov chains into this model?
