

Florin Dinu, T. S. Eugene Ng
Rice University

Inferring a Network Congestion Map with Zero Traffic Overhead

2

Effects of Congestion

Need to identify, quantify and localize congestion

3

The Vision: Passively Inferred Congestion Map

[Figure: example congestion map spanning AS1 and AS2, routers R0 to R6]

• Without any dedicated measurement (probing) traffic
• At fine time granularities (seconds)
• With good accuracy

How does it work? Why does it work? Where is it applicable?

4

Benefits of Passive Inference

Criteria compared for active reporting (SNMP) and passive inference:
• Has reasonable accuracy
• Does not need access rights to routers
• Does not exacerbate existing congestion
• Detects congestion in a timely manner

Passive inference is complementary to active reporting.

5

Overview – Passively Inferring Congestion Maps

[Figure: example congestion map spanning AS1 and AS2, routers R0 to R6]

Step 1: Use congestion markings from existing traffic
• Get path-level congestion information
• Routers are AQM/ECN capable and can mark existing traffic

6

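Overview – Passively Inferring Congestion Maps

[Figure: routers R0 to R6; the path MPs P06 and P04 are known, P46 is unknown]

Step 1 in detail: obtain path-level congestion from AQM/ECN markings.

Step 2: Use topological information to complete the congestion map. Assuming routers mark independently, a packet crosses R0 to R6 unmarked only if it is unmarked on both R0 to R4 and R4 to R6, so:

$$P_{46} = \mathrm{func}(P_{06}, P_{04}) = 1 - \frac{1 - P_{06}}{1 - P_{04}}$$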
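Step 2 can be written as a small function. This is a minimal sketch assuming independent marking at each router, so that (1 - P06) = (1 - P04)(1 - P46); the name `infer_subpath_mp` is hypothetical and the clamping to [0, 1] is an added guard against noisy estimates, not something the slides specify.

```python
from typing import Optional


def infer_subpath_mp(p_full: float, p_prefix: float) -> Optional[float]:
    """Infer the MP of the remaining sub-path, e.g. P46 from P06 and P04.

    Assumes independent marking, so (1 - P06) = (1 - P04) * (1 - P46).
    Returns None when the prefix marks every packet (P04 = 1), because the
    sub-path is then invisible.
    """
    if p_prefix >= 1.0:
        return None
    p = 1.0 - (1.0 - p_full) / (1.0 - p_prefix)
    # Measurement noise can push the raw value slightly outside [0, 1].
    return min(max(p, 0.0), 1.0)


# Example: P04 = 0.10 and P06 = 0.28 give P46 of about 0.20.
print(infer_subpath_mp(0.28, 0.10))
```

The `None` case is the same situation the deck later calls limited visibility: once a prefix marks everything, the downstream MP cannot be recovered.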

7

AQM Background

AQM = Active Queue Management

Router marks/drops packets probabilistically as a function of congestion severity

Many different definitions of congestion severity

[Figure: marking probability (MP) as a function of congestion severity for RED, PI and REM]

We use marking probability (MP) as the congestion measure
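As one concrete marking function, classic RED (the default AQM in the evaluation) measures congestion severity as the average queue length and marks with a probability that rises linearly between two thresholds. The sketch below is a simplified form of that textbook rule, not the authors' code: it omits RED's count-based correction and the "gentle" variant, and the threshold values are arbitrary assumptions.

```python
def red_marking_probability(avg_queue: float, min_th: float,
                            max_th: float, max_p: float) -> float:
    """Simplified RED: MP grows linearly with the average queue length.

    Omits RED's count-based correction and 'gentle' mode; every packet is
    marked once the average queue reaches max_th.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical thresholds (in packets): min_th = 20, max_th = 80, max_p = 0.1.
for q in (10, 20, 50, 79, 80):
    print(q, red_marking_probability(q, 20, 80, 0.1))
```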

8

ECN Background – Marking Data Packets

[Figure: data packets from S to D crossing an AQM/ECN router]

Data packets are marked probabilistically

ECN = Explicit Congestion Notification

9

Use of the Data Markings

[Figure: routers R0 to R6; data markings give the ingress-path MPs P30, P40 and P60]

Data markings describe congestion on routers’ ingress paths

Data packet marking is probabilistic => Use ratio of marked data packets to obtain MP on the ingress path
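The ratio estimator can be checked with a small simulation. This sketch assumes each router on the ingress path marks independently with its own MP and that a CE mark is never cleared once set; the function names and the per-router MPs are hypothetical.

```python
import random
from typing import List, Sequence


def simulate_data_marks(hop_mps: Sequence[float], n_packets: int,
                        rng: random.Random) -> List[bool]:
    """Return True for each data packet that arrives CE-marked.

    Assumes each hop marks independently with its own MP and that a mark,
    once set, is never cleared further along the path.
    """
    marks = []
    for _ in range(n_packets):
        marks.append(any(rng.random() < p for p in hop_mps))
    return marks


def ingress_path_mp(marks: Sequence[bool]) -> float:
    """Estimate the ingress-path MP as the ratio of marked data packets."""
    return sum(marks) / len(marks) if marks else 0.0


rng = random.Random(1)
hops = [0.05, 0.10]  # hypothetical per-router MPs on the ingress path
marks = simulate_data_marks(hops, 100_000, rng)
print(ingress_path_mp(marks))  # expected to be close to 1 - 0.95 * 0.90 = 0.145
```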

10

ECN Background - Echoing

Echoing the markings from data packets to ACKs:

[Figure: DATA from S to D; D echoes the markings on the ACKs]

The ACK markings are an altered version of the data packet markings

11

ECN Background – Responding to Markings

Responding to marked ACKs:
[Figure: S receives a marked ACK and sets CWR on its next DATA packet]

Stopping the echoing after receiving a CWR packet:
[Figure: D stops marking ACKs once the CWR packet arrives]

The ACK markings are an altered version of the data packet markings
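The alteration is easiest to see with a receiver-side sketch of the echo rule described in RFC 3168: after a CE-marked data packet arrives, every ACK carries ECE until a data packet with CWR arrives. This is a simplified model, assuming one ACK per data packet and no delayed ACKs; `echo_ack_marks` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def echo_ack_marks(data_packets: Iterable[Tuple[bool, bool]]) -> List[bool]:
    """Receiver-side ECN echo, simplified from RFC 3168.

    data_packets: (ce, cwr) flags of each arriving data packet.
    Returns the ECE flag of the ACK sent for each data packet.
    Assumptions: one ACK per data packet (no delayed ACKs); a CWR flag is
    handled before the CE mark carried by the same packet.
    """
    echoing = False
    ece_flags = []
    for ce, cwr in data_packets:
        if cwr:
            echoing = False  # the sender has reacted, so stop echoing
        if ce:
            echoing = True   # a new congestion mark (re)starts the echo
        ece_flags.append(echoing)
    return ece_flags


data = [(False, False), (True, False), (False, False),
        (False, True), (False, False), (True, False)]
print(echo_ack_marks(data))
# Only 2 of the 6 data packets are marked, but 3 of the 6 ACKs carry ECE.
```

Because one data mark can produce several marked ACKs, the fraction of marked ACKs does not equal the data marking probability.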

12

Groups - Effect of ECN Echoing

Groups of marked and unmarked ACKs:
[Figure: DATA, ACK and CWR exchange between S and D]

Groups of unmarked ACKs of “size zero”:
[Figure: DATA, ACK and CWR exchange between S and D, with a group of size zero]

13

Use of the ACK Markings

[Figure: routers R0 to R6 with path MPs P03, P04 and P05]

ACK markings describe congestion on forward paths of the flows

ACK markings describe congestion on routers’ egress paths

Ratio of marked ACKs is an inaccurate measure

ACK markings are very important and more challenging to use

14

Obtaining MP from ACK Markings

p = MP on the forward path

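AVG_SZ_UNMARKED = func(p): since a group has size n with probability $(1-p)^{n}\,p$,

$$\mathrm{AVG\_SZ\_UNMARKED} = \sum_{n=0}^{\infty} n\,(1-p)^{n}\,p = \frac{1-p}{p}$$

[Figure: DATA, ACK and CWR exchange between S and D]

To get the MP, we need to compute the average size of the groups of unmarked ACKs.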
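Inverting the average gives p = 1 / (1 + AVG_SZ_UNMARKED). The sketch below draws group sizes from the geometric distribution P(n) = (1-p)^n p, which assumes each data packet after the CWR is marked independently with probability p, and recovers p from their average. Both function names are hypothetical.

```python
import random


def sample_group_size(p: float, rng: random.Random) -> int:
    """Unmarked ACKs before the next marked one: P(n) = (1 - p)**n * p.

    Assumes each data packet after the CWR is marked independently with
    probability p, so the size is geometric on 0, 1, 2, ...
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    n = 0
    while rng.random() >= p:  # unmarked with probability 1 - p
        n += 1
    return n


def mp_from_avg_group_size(avg_size: float) -> float:
    """Invert AVG_SZ_UNMARKED = (1 - p) / p, i.e. p = 1 / (1 + AVG_SZ_UNMARKED)."""
    if avg_size < 0:
        raise ValueError("average group size cannot be negative")
    return 1.0 / (1.0 + avg_size)


rng = random.Random(7)
sizes = [sample_group_size(0.2, rng) for _ in range(20_000)]
avg = sum(sizes) / len(sizes)
print(avg, mp_from_avg_group_size(avg))  # roughly 4.0 and 0.2
```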

15

Average Size of Groups of Unmarked ACKs

[Figure: timeline of one Estimation Interval (EI), from its start to its end, split into Sampling Intervals (SI) after a training period; Flow1 to Flow5 are shown, one of them not selected]

• Select flows until a limit is reached
• During the training period, only select flows; do not compute samples
• For each following SI: sample = average size of the groups of unmarked ACKs that finish in that SI
• Discard groups that start and end in different EIs
• At the end of the EI, use AVG(SAMPLES) = (1-p)/p to obtain p
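The per-EI procedure can be sketched as follows, assuming each finished group is recorded with its start time, end time and size. Flow selection (at most a fixed number of flows per EI and path) is left out; `Group` and `estimate_mp_for_ei` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Group:
    start: float  # time the group of unmarked ACKs opened (s)
    end: float    # time a marked ACK closed it (s)
    size: int     # number of unmarked ACKs in the group


def estimate_mp_for_ei(groups: List[Group], ei_start: float, ei_len: float,
                       si: float, training: float) -> Optional[float]:
    """Estimate p for one Estimation Interval (EI) from per-SI samples.

    Groups that close after the EI ends never fall inside one of its SIs,
    and groups opened before the EI started are dropped explicitly.
    """
    ei_end = ei_start + ei_len
    samples = []
    t = ei_start + training  # no samples are computed during training
    while t + si <= ei_end + 1e-9:
        finished = [g.size for g in groups
                    if t <= g.end < t + si      # group finishes in this SI
                    and g.start >= ei_start]    # drop groups opened in an earlier EI
        if finished:
            samples.append(mean(finished))      # one sample per SI
        t += si
    if not samples:
        return None
    return 1.0 / (1.0 + mean(samples))          # AVG(SAMPLES) = (1 - p) / p


groups = [Group(1.1, 1.2, 3), Group(1.2, 1.4, 5),  # SI [1.0, 1.5): sample 4
          Group(1.6, 1.9, 4),                      # SI [1.5, 2.0): sample 4
          Group(-0.5, 2.2, 100),                   # opened in the previous EI: dropped
          Group(2.9, 3.4, 50)]                     # closes in the next EI: never sampled
print(estimate_mp_for_ei(groups, ei_start=0.0, ei_len=3.0, si=0.5, training=1.0))  # 0.2
```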

16

Optimization – the Use of Groups of Size Zero

The probability that a group has size zero is $(1-p)^{0}\,p = p$.

If p is high, most groups have size zero.

Using groups of size zero gives better statistical significance.

Routers need to be on both the data and the ACK path of a flow.

[Figure: DATA, ACK and CWR exchange between S and D, with a group of size zero]

Use of groups of size zero increases accuracy
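A quick simulation shows why size-zero groups matter when p is high: each group has size zero with probability p, so ignoring them discards that fraction of the samples. The sketch assumes geometric group sizes, P(n) = (1-p)^n p, and estimates p both ways from the same simulated groups. The estimator without size-zero groups uses E[n | n ≥ 1] = 1/p, a standard property of the geometric distribution rather than something stated on the slides.

```python
import random


def sample_group_size(p: float, rng: random.Random) -> int:
    """Geometric group size on 0, 1, 2, ...: P(n) = (1 - p)**n * p."""
    n = 0
    while rng.random() >= p:
        n += 1
    return n


rng = random.Random(3)
p = 0.8  # high marking probability (hypothetical)
sizes = [sample_group_size(p, rng) for _ in range(10_000)]
nonzero = [s for s in sizes if s > 0]

frac_zero = 1 - len(nonzero) / len(sizes)    # should be close to p
# With size-zero groups: AVG = (1 - p) / p, so p = 1 / (1 + AVG).
est_with_zero = 1 / (1 + sum(sizes) / len(sizes))
# Without them: E[n | n >= 1] = 1 / p, so p = 1 / AVG, from far fewer groups.
est_without_zero = len(nonzero) / sum(nonzero)
print(frac_zero, len(sizes), len(nonzero), est_with_zero, est_without_zero)
```

With p = 0.8, roughly four out of five groups have size zero, so the second estimator works from about a fifth of the samples.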

17

Evaluation – Parameter Settings

ns-2 simulations, 500s simulation time

AQM algorithms (RED, PI, REM) – RED by default

SI = 0.5s (a congestion sample is computed every 0.5s)

Monitor at most 1000 flows per EI/path

Groups of size zero used in all experiments

18

Evaluation – Traffic & Topology

5ms link delay, 500Mbps link bandwidth

Metric: 50th and 90th percentiles of |inferred MP – real MP| for each link

[Figure: chain topology R0 to R10 (10 hops) with UDP sources]
• R0 to Ri: 250·i² TCP flows
• Ri to Ri+2: 100 TCP flows
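The accuracy metric can be computed as below. This sketch assumes the percentiles are taken over the successive per-EI estimates of a single link, and uses linear interpolation between closest ranks; the function names and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 to 100), linear interpolation between closest ranks."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def link_error_percentiles(inferred: Sequence[float],
                           real: Sequence[float]) -> Tuple[float, float]:
    """50th and 90th percentiles of |inferred MP - real MP| for one link."""
    if len(inferred) != len(real):
        raise ValueError("need one real MP per inferred MP")
    errors: List[float] = [abs(a - b) for a, b in zip(inferred, real)]
    return percentile(errors, 50), percentile(errors, 90)


# Hypothetical per-EI estimates for one link against the true MP.
print(link_error_percentiles([0.1, 0.2, 0.3, 0.4, 0.5], [0.1] * 5))
```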

19

Evaluation – vs Baseline Solution

Our group-based solution (GROUP):
[Figure: DATA, ACK and CWR exchange between S and D]

Baseline solution, no alteration (REFERENCE):
[Figure: DATA, ACK and CWR exchange between S and D]

[Plot: GROUP vs. REFERENCE]

20

Sensitivity to the Length of the EI

Accuracy decreases with hop count but is within 0.1 for most cases

[Plot x-axis: value of EI (s), log scale]

21

Sensitivity to Drastic Changes

UDP sources vary their sending rate by 50Mbps between 250Mbps and 750Mbps

Every 10s we start 3000 TCP flows between random nodes, for a random time (0-10s)

How well does our solution track these sudden and large variations?

22

Sensitivity to Drastic Changes

Accuracy decreases with hop count but is within 0.1 to 0.15 for most cases

[Plot: 50th and 90th percentile errors for EI = 3s and EI = 10s]

23

Sensitivity to AQM Marking Function

A linear marking function allows better inference for our solution

Why does REM perform much worse?
• Abrupt variations in marking probability
• Limited visibility

[Figure: marking/drop probability as a function of congestion severity for RED, PI and REM]

24

Limited Visibility

[Figure: path R0 → R1 → R2 with path MPs P10 and P20; P12 = ??]

• R1 marks 100% of packets
• R2 marks 30% of packets
• If P20 = P10 = 100%, P12 is unknown (any value is possible)
• At high MP (below 100%), the problem still exists because very few packets are left unmarked

Limited visibility appears at high MP, so it is more likely with REM.
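A worked version of this example, assuming routers mark independently so that $1 - P_{20} = (1 - P_{10})(1 - P_{12})$. The value $P_{10} = 0.99$ in the second line is a hypothetical illustration of a high but not total MP.

$$P_{10} = 1 \;\Rightarrow\; 1 - P_{20} = 0 \cdot (1 - P_{12}) = 0 \quad \text{for every } P_{12} \in [0, 1]$$

$$P_{10} = 0.99,\ P_{12} = 0.3 \;\Rightarrow\; 1 - P_{20} = 0.01 \times 0.7 = 0.007$$

In the first case $P_{20}$ carries no information about $P_{12}$. In the second, only about 7 in every 1,000 packets reach R2 unmarked, so any estimate of $P_{12} = 1 - (1 - P_{20})/(1 - P_{10})$ rests on very few packets.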

25

Sensitivity to Dropped ACKs - Numerical

Dropped ACKs can change the average size of the groups of unmarked ACKs

Group sizes 4, 5, 1, 5: average size 3.75

Group sizes 8, 1, 4: average size 4.33

ACKs can be dropped by non-AQM/ECN routers. Pure ACKs can be dropped even by AQM/ECN routers.

26

Sensitivity to Dropped ACKs - Numerical

At reasonable drop probabilities the additional error is low

27

Other Advantages of Our Solution

• Incremental deployment
  - On specific paths
  - Around non-AQM/ECN routers
• Useful in heterogeneous environments
  - Different AQM types

28

Related Work

Re-ECN [SIGCOMM 2005], ConEx IETF WG
• Extends ECN with one step: sources re-echo congestion information from ACK markings
• A router on the forward path knows upstream, downstream and whole-path congestion
• Useful for traffic policing or traffic management

Limitations:
• Lower precision, limited by header space bits
• Needs modifications to ECN and headers
• Does not address the challenge posed by ACK markings
• Does not go beyond path-level congestion inference

29

Conclusion

Novel method for inferring congestion with zero network overhead

Does not require changes to hosts, headers or protocols

Incrementally deployable and useful in heterogeneous environments

Good accuracy even in very congested environments

30

Thank you

Credits for the pictures

• http://networkequipment.net/wp-content/uploads/2011/02/voip-telephone.jpg
• http://www.freefoto.com/images/04/28/04_28_50---US-Dollar-Bills_web.jpg
• http://www.ciscorouting.com/routing_engine.jpg
• http://www.rvoice.co.uk/uploads/Image/Green%20Tick.jpg

31

Why not Use Ratio for ACK Markings?

The ratio of marked ACKs is very inaccurate. Need a better solution.

32

Effects of Using Delayed ACK - Numerical

Additional error introduced by the use of delayed ACK

33

Sensitivity to Bandwidth (EI = 3s)

Accuracy increases with bandwidth

34

Sensitivity to Flow Size (EI = 3s)

Good accuracy even with many small flows

35

Severity of False Positives (EI = 3s)

Small false positives inherent in probabilistic approach

36

Granularity of Inference

[Figure: routers R0 to R6 with path MPs P40 and P06; an Estimation Interval (EI) divided into Sampling Intervals (SI)]

estimate(P06) = AVG({samples(P06)})

37

Implementation

• Per-path counters: length and number of all groups of unmarked ACKs
• Per-flow counters: the current group of unmarked ACKs
• Prefix matching for source and destination
• Transport protocol header matching for flow identification
• Sequence numbers for CWR
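These counters suggest a streaming design. A minimal sketch, assuming a monitor that sees both the data and ACK directions of each flow and receives flow and path identifiers from the prefix and header matching listed above; `GroupTracker` and `PathCounters` are hypothetical names. Using the CWR packet's sequence number to decide which ACKs belong to the new group is our reading of "Sequence numbers for CWR"; wraparound and delayed ACKs are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PathCounters:
    total_len: int = 0  # summed length of all finished groups of unmarked ACKs
    n_groups: int = 0   # number of finished groups

    def estimate_mp(self) -> Optional[float]:
        if self.n_groups == 0:
            return None
        avg = self.total_len / self.n_groups
        return 1.0 / (1.0 + avg)  # from AVG = (1 - p) / p


class GroupTracker:
    """Hypothetical monitor: one open group per flow, totals per path."""

    def __init__(self) -> None:
        self.paths: Dict[str, PathCounters] = defaultdict(PathCounters)
        self.open: Dict[str, List[int]] = {}  # flow -> [CWR seq, unmarked ACKs so far]

    def on_cwr(self, flow: str, seq: int) -> None:
        """A data packet with CWR set was seen: a new group starts after it."""
        self.open[flow] = [seq, 0]

    def on_ack(self, flow: str, path: str, ack_no: int, ece: bool) -> None:
        state = self.open.get(flow)
        if state is None:
            return  # no group open for this flow
        cwr_seq, count = state
        if ack_no <= cwr_seq:
            return  # ACK for data sent before the CWR; echoing may not have stopped
        if ece:
            counters = self.paths[path]  # a marked ACK closes the group
            counters.total_len += count
            counters.n_groups += 1
            del self.open[flow]
        else:
            state[1] = count + 1


t = GroupTracker()
t.on_cwr("f1", 1000)
t.on_ack("f1", "P06", 900, True)    # older ACK: ignored
for ack in (1100, 1200, 1300):
    t.on_ack("f1", "P06", ack, False)
t.on_ack("f1", "P06", 1400, True)   # closes a group of size 3
t.on_cwr("f1", 2000)
t.on_ack("f1", "P06", 2100, True)   # group of size zero
print(t.paths["P06"].estimate_mp())  # average size 1.5, so p = 0.4
```

Only two integers per path and one small record per monitored flow are kept, which matches the per-path and per-flow split on the slide.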

38

Coverage of Congestion Maps

• Six real network topologies (Internet2, TEIN2, iLight, GEANT, SUNET, NLR)
• Assume an all-to-all traffic pattern
• Average congestion map coverage:
  - NLR, Internet2, GEANT: ~60%
  - TEIN2: ~91%
  - iLight: ~94%
  - SUNET: ~95%