1
Interesting Links
On the Self-Similar Nature of Ethernet Traffic
Will E. Leland, Walter Willinger and Daniel V. Wilson (Bellcore); Murad S. Taqqu (BU)
Analysis and Prediction of the Dynamic Behavior of Applications, Hosts, and Networks
3
Overview
What is Self Similarity?
Ethernet Traffic is Self-Similar
Source of Self Similarity
Implications of Self Similarity
Section 1:
What is Self-Similarity ?
5
Intuition of Self-Similarity
Something “feels the same” regardless of scale (such objects are also called fractals)
6
7
8
9
What is Self-Similarity?
In the case of stochastic objects such as time series, self-similarity is meant in the distributional sense
10
Pictorial View of Self-Similarity
11
The Famous Data
Leland and Wilson collected hundreds of millions of Ethernet packets without loss and with recorded time-stamps accurate to within 100 µs.
The data were collected from several Ethernet LANs at the Bellcore Morristown Research and Engineering Center at different times over the course of approximately 4 years.
12
13
Why is Self-Similarity Important?
Recently, network packet traffic has been identified as being self-similar.
Current network traffic modeling based on Poisson processes (and similar models) does not take into account the self-similar nature of traffic.
This leads to inaccurate modeling which, when applied to a huge network like the Internet, can lead to huge financial losses.
14
Problems with Current Models
A Poisson process:
  When observed on a fine time scale, it appears bursty
  When aggregated on a coarse time scale, it flattens (smooths) to white noise
A self-similar (fractal) process:
  When aggregated over a wide range of time scales, it maintains its bursty characteristic
15
Consequences of Self-Similarity
Traffic has similar statistical properties at a range of timescales: milliseconds, seconds, minutes, hours, days
Merging of traffic (as in a statistical multiplexer) does not result in smoothing of traffic
[Diagram: bursty data streams, aggregation, bursty aggregate streams]
16
Pictorial View of Current Modeling
17
Side-by-side View
18
Definitions and Properties
Long-range dependence: the autocorrelation decays slowly
Hurst parameter:
  Developed by Harold Hurst (1965)
  H is a measure of “burstiness”, also considered a measure of self-similarity
  0 < H < 1
  H increases as traffic increases
19
Definitions and Properties (cont’d)
Low, medium, and high traffic hours
As traffic increases, the Hurst parameter increases, i.e., traffic becomes more self-similar
21
Properties of Self Similarity
$X = (X_t : t = 0, 1, 2, \ldots)$ is a covariance stationary random process (i.e. $\mathrm{Cov}(X_t, X_{t+k})$ does not depend on $t$ for all $k$), with mean $\mu$ and variance $\sigma^2$
Suppose that the autocorrelation function satisfies $r(k) \sim k^{-\beta}$ as $k \to \infty$, $0 < \beta < 1$
Let $X^{(m)} = \{X_k^{(m)}\}$ denote the new process obtained by averaging the original series $X$ in non-overlapping sub-blocks of size $m$
E.g. $X^{(1)} = 4, 12, 34, 2, -6, 18, 21, 35$
Then $X^{(2)} = 8, 18, 6, 28$
$X^{(4)} = 13, 17$
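The block averaging in this example can be sketched in a few lines of Python. The function name `aggregate` is illustrative and not from the slides; dropping trailing values that do not fill a complete block is an assumption.

```python
def aggregate(x, m):
    """Average x over non-overlapping blocks of size m.

    Assumption: trailing values that do not fill a whole block are dropped.
    """
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


x = [4, 12, 34, 2, -6, 18, 21, 35]
print(aggregate(x, 2))  # [8.0, 18.0, 6.0, 28.0]
print(aggregate(x, 4))  # [13.0, 17.0]
```

The two printed series match X(2) and X(4) from the slide.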
22
Auto-correlation Definition
X is exactly second-order self-similar if the aggregated processes have the same autocorrelation structure as X, i.e.
$r^{(m)}(k) = r(k)$, $k \ge 0$, for all $m = 1, 2, \ldots$
X is [asymptotically] second-order self-similar if the above holds in the limit: $r^{(m)}(k) \to r(k)$ as $m \to \infty$
Most striking feature of self-similarity: correlation structures of the aggregated process do not degenerate as $m \to \infty$
23
Traditional Models
This is in contrast to traditional models
Correlation structures of their aggregated processes degenerate as $m \to \infty$, i.e. $r^{(m)}(k) \to 0$ as $m \to \infty$, for $k = 1, 2, 3, \ldots$
[Figure: example traces, Poisson distribution vs. self-similar distribution]
24
25
Long Range Dependence
Processes with Long Range Dependence are characterized by an autocorrelation function that decays hyperbolically as k increases
Important property: $\sum_k r(k) = \infty$
This is also called non-summability of correlation
26
Intuition
Short-range processes: exponential decay of autocorrelations, i.e. $r(k) \sim p^k$ as $k \to \infty$, $0 < p < 1$
  Summation is finite
The intuition behind long-range dependence:
  While high-lag correlations are all individually small, their cumulative effect is important
  This gives rise to features drastically different from conventional short-range dependent processes
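To make the summability contrast concrete, the sketch below compares partial sums of an exponentially decaying autocorrelation $p^k$ with those of a hyperbolically decaying one $k^{-\beta}$. The values p = 0.5 and β = 0.5 and all function names are illustrative assumptions, not taken from the slides.

```python
def geometric_r(k, p=0.5):
    """Short-range autocorrelation r(k) = p**k (p is an illustrative value)."""
    return p ** k


def hyperbolic_r(k, beta=0.5):
    """Long-range autocorrelation r(k) = k**-beta (beta is an illustrative value)."""
    return k ** -beta


def partial_sum(r, n_terms):
    """Sum r(k) for k = 1 .. n_terms."""
    return sum(r(k) for k in range(1, n_terms + 1))


for n in (10, 1000, 100000):
    print(n, partial_sum(geometric_r, n), partial_sum(hyperbolic_r, n))
```

The geometric column settles near p/(1-p) = 1, while the hyperbolic column keeps growing with the number of terms.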
27
The Measure of Self-Similarity
Hurst parameter H, 0.5 < H < 1
Three approaches to estimate H (based on properties of self-similar processes):
  Variance analysis of aggregated processes
  Analysis of the Rescaled Range (R/S) statistic for different block sizes
  A Whittle estimator
28
Variance Analysis
Variance of aggregated processes decays as: $\mathrm{Var}(X^{(m)}) \sim a\,m^{-\beta}$ as $m \to \infty$, $0 < \beta < 1$
For short range dependent processes (e.g. Poisson process): $\mathrm{Var}(X^{(m)}) \sim a\,m^{-1}$ as $m \to \infty$
Plot $\mathrm{Var}(X^{(m)})$ against $m$ on a log-log plot
A slope > -1 is indicative of self-similarity
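A minimal sketch of the variance-time analysis, assuming a plain least-squares fit on the log-log points and the standard relation H = 1 - β/2 between the slope -β and the Hurst parameter (the relation is not stated on the slide). All function names, the block sizes, and the synthetic test series are illustrative.

```python
import math
import random


def aggregate(x, m):
    """Average x over non-overlapping blocks of size m (incomplete tail dropped)."""
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance(x):
    """Variance with a 1/n normalisation (an arbitrary choice for this sketch)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def variance_time_hurst(x, block_sizes):
    """Fit log Var(X^(m)) against log m; return (slope, H) with H = 1 + slope / 2."""
    pts = [(math.log(m), math.log(variance(aggregate(x, m)))) for m in block_sizes]
    mean_x = sum(a for a, _ in pts) / len(pts)
    mean_y = sum(b for _, b in pts) / len(pts)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in pts)
             / sum((a - mean_x) ** 2 for a, _ in pts))
    return slope, 1 + slope / 2


# Independent Gaussian noise is short-range dependent, so the slope should be near -1.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(16384)]
slope, h = variance_time_hurst(noise, [1, 2, 4, 8, 16, 32, 64, 128])
print(slope, h)
```

On measured traffic, a fitted slope clearly above -1 (H clearly above 0.5) would point to self-similarity.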
29
30
The R/S statistic
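For a given set of observations $(X_k : k = 1, 2, \ldots, n)$ with sample mean $\bar{X}(n)$ and sample variance $S^2(n)$,
the Rescaled Adjusted Range or R/S statistic is given by
$$\frac{R(n)}{S(n)} = \frac{1}{S(n)} \left[ \max(0, W_1, W_2, \ldots, W_n) - \min(0, W_1, W_2, \ldots, W_n) \right]$$
where
$$W_k = (X_1 + X_2 + \cdots + X_k) - k\,\bar{X}(n), \quad k = 1, 2, \ldots, n$$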
31
Example
Xk = 14, 1, 3, 5, 10, 3
Mean = 36/6 = 6
W1 = 14 - (1·6) = 8
W2 = 15 - (2·6) = 3
W3 = 18 - (3·6) = 0
W4 = 23 - (4·6) = -1
W5 = 33 - (5·6) = 3
W6 = 36 - (6·6) = 0
R/S = (1/S)·[8 - (-1)] = 9/S
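The worked example can be reproduced with a short Python sketch. The function name `rescaled_range` is illustrative. The slide leaves S unevaluated, so the 1/n normalisation of the sample variance used here is an assumption.

```python
import math


def rescaled_range(x):
    """Return (W, R, S): adjusted partial sums, adjusted range, and standard deviation."""
    n = len(x)
    mean = sum(x) / n
    w, running = [], 0.0
    for k, value in enumerate(x, start=1):
        running += value
        w.append(running - k * mean)      # W_k = (X_1 + ... + X_k) - k * mean
    r = max(0.0, *w) - min(0.0, *w)       # R(n)
    # Assumption: variance normalised by n; the slide does not say which form is used.
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return w, r, s


w, r, s = rescaled_range([14, 1, 3, 5, 10, 3])
print(w)      # [8.0, 3.0, 0.0, -1.0, 3.0, 0.0]
print(r)      # 9.0
print(r / s)  # the R/S statistic for this series
```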
32
The Hurst Effect
For self-similar data, the rescaled range or R/S statistic grows according to $c\,n^H$, where H is the Hurst parameter, H > 0.5
For short-range processes, the R/S statistic grows as $d\,n^{0.5}$
History: the Nile river
  In the 1940s-50s, Harold Edwin Hurst studied the 800-year record of flooding along the Nile river (yearly minimum water level)
  He found long-range dependence
33
34
Whittle Estimator
Provides a confidence interval
Property: any long range dependent process approaches fractional Gaussian noise (FGN) when aggregated to a certain level
Test the aggregated observations to ensure that they have converged to the normal distribution
35
Recap
Self-similarity manifests itself in several equivalent fashions:
  Non-degenerate autocorrelations
  Slowly decaying variance
  Long range dependence
  Hurst effect
Section 2:
Ethernet Traffic is Self-Similar
37
Plots Showing Self-Similarity (Ⅰ)
[Plot annotations: reference lines H = 0.5 and H = 1; estimate H ≈ 0.8]
38
Plots Showing Self-Similarity (Ⅱ)
Higher Traffic, Higher H
[Plot panels: High Traffic, Mid Traffic, Low Traffic; utilization ranges shown on the plots: 1.3%-10.4%, 3.4%-18.4%, 5.0%-30.7%]
39
H : A Function of Network Utilization
Observation shows “contrary to Poisson”
Network utilization ↑, H ↑
As we shall see shortly, H measures traffic burstiness
As the number of Ethernet users increases, the resulting aggregate traffic becomes burstier instead of smoother
40
Difference in low traffic H values
  Pre-1990: host-to-host workgroup traffic
  Post-1990: router-to-router traffic
Low-period router-to-router traffic consists mostly of machine-generated packets
  These tend to form a smoother arrival stream than low-period host-to-host traffic
41
H : Measuring “Burstiness”
Intuitive explanation using the M/G/∞ model
As α → 1, the service time is more variable, so it is easier to generate bursts
  Increasing H!
42
Summary
Ethernet LAN traffic is statistically self-similar
H : the degree of self-similarity
H : a function of utilization
H : a measure of “burstiness”
Models like Poisson are not able to capture self-similarity
43
Discussions
How to explain self-similarity?
  Heavy tailed file sizes
How would this impact existing performance?
  Limited effectiveness of buffering
  Effectiveness of FEC
How to adapt to self-similarity?
  Prediction
  Adaptive FEC
Section 3:
Explaining Self - Similarity
45
Introduction
46
Introduction
The superposition of many ON/OFF sources whose ON-periods and OFF-periods exhibit the Noah Effect produces aggregate network traffic that features the Joseph Effect.
  Noah Effect: high variability or infinite variance
  Joseph Effect: self-similar or long-range dependent
Also known as packet train models
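A rough simulation sketch of this superposition, under several assumptions that are not from the slides: time is divided into integer slots, ON and OFF durations are drawn from a Pareto distribution with `random.paretovariate` and truncated to whole slots, each source starts in a random state, and the tail indices 1.2 and 1.5 are only illustrative defaults. All names are hypothetical.

```python
import random


def on_off_source(n_slots, alpha_on, alpha_off, rng):
    """Return a list of 0/1 values: 1 while the source is ON, 0 while it is OFF."""
    is_on = rng.random() < 0.5  # assumption: random initial state
    slots = []
    while len(slots) < n_slots:
        alpha = alpha_on if is_on else alpha_off
        # paretovariate(alpha) >= 1 with P(U > u) = u**-alpha; cap at the remaining slots.
        duration = min(int(rng.paretovariate(alpha)), n_slots - len(slots))
        slots.extend([1 if is_on else 0] * duration)
        is_on = not is_on
    return slots


def aggregate_traffic(n_sources, n_slots, alpha_on=1.2, alpha_off=1.5, seed=0):
    """Count how many of the independent ON/OFF sources are ON in each slot."""
    rng = random.Random(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        for t, on in enumerate(on_off_source(n_slots, alpha_on, alpha_off, rng)):
            total[t] += on
    return total


traffic = aggregate_traffic(n_sources=50, n_slots=2000)
print(traffic[:20])
```

The resulting count series can be fed to a variance-time or R/S analysis to check whether the aggregate looks long-range dependent.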
47
The Noah Effect
Noah Effect is the essential point of departure from traditional to self-similar traffic modeling
Results in highly variable ON-OFF periods: train lengths and inter-train distances can be very large with non-negligible probabilities
Infinite Variance Syndrome: many naturally occurring phenomena can be well described with infinite variance distributions
Heavy-tail distributions, parameter α
48
Existing Models
Traditional traffic models: finite variance ON/OFF source models
Superposition of such sources behaves like white noise, with only short range correlations
49
Idealized ON/OFF Model
Lengths of ON- and OFF-periods are iid positive random variables, $U_k$
Suppose that $U$ has a hyperbolic tail distribution,
$$P(U > u) \sim c\,u^{-\alpha}, \quad \text{as } u \to \infty, \quad 1 < \alpha < 2 \qquad (1)$$
Property (1) is the infinite variance syndrome or the Noah Effect.
  $\alpha < 2$ implies $E(U^2) = \infty$
  $\alpha > 1$ ensures that $E(U) < \infty$, and that $S_0$ is not infinite
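One simple distribution with the hyperbolic tail of property (1) is the Pareto distribution. The sketch below draws samples by inverse-transform sampling and compares the empirical tail with $u^{-\alpha}$. The choice of Pareto, the scale of 1, α = 1.5, and the function names are all assumptions made for illustration.

```python
import random


def sample_hyperbolic(alpha, n, rng, scale=1.0):
    """Inverse-transform sampling with P(U > u) = (u / scale)**-alpha for u >= scale."""
    # 1 - rng.random() lies in (0, 1], which avoids raising zero to a negative power.
    return [scale * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def empirical_tail(samples, u):
    """Fraction of samples strictly greater than u."""
    return sum(1 for s in samples if s > u) / len(samples)


rng = random.Random(1)
u_samples = sample_hyperbolic(alpha=1.5, n=100000, rng=rng)
for u in (2, 10, 50):
    print(u, empirical_tail(u_samples, u), u ** -1.5)
```

With 1 < α < 2 the sample mean stabilises as n grows, while the sample variance keeps being dominated by a few very large values, which is the infinite variance syndrome in practice.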
50
http://statistik.wu-wien.ac.at/cgi-bin/anuran.pl
51
52
Explaining Self-Similarity
Consider a set of processes which are either ON or OFF
  The distributions of ON and OFF times are heavy tailed, with parameters $\alpha_1$, $\alpha_2$
  The aggregation of these processes leads to a self-similar process with $H = (3 - \min(\alpha_1, \alpha_2))/2$
So, how do we get heavy tailed ON or OFF times?
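As a quick numerical check of $H = (3 - \min(\alpha_1, \alpha_2))/2$, here is a small helper. The function name, the input values 1.2 and 1.5, and the decision to reject inputs whose smaller tail index lies outside (1, 2) are assumptions of this sketch.

```python
def hurst_from_tail_indices(alpha_on, alpha_off):
    """H = (3 - min(alpha_on, alpha_off)) / 2 for heavy-tailed ON/OFF periods."""
    alpha = min(alpha_on, alpha_off)
    # Assumption: only the range 1 < alpha < 2 (finite mean, infinite variance) is handled.
    if not 1 < alpha < 2:
        raise ValueError("expected 1 < min(alpha_on, alpha_off) < 2")
    return (3 - alpha) / 2


print(hurst_from_tail_indices(1.2, 1.5))
```

For these inputs the heavier tail (α = 1.2) dominates and H comes out at about 0.9.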
53
Heavy Tailed ON Times and File Sizes
Analysis of client logs showed that ON times were, in fact, heavy tailed
  $\alpha \approx 1.2$
  Over about 3 orders of magnitude
This led to the analysis of underlying file sizes
  $\alpha \approx 1.1$
  Over about 4 orders of magnitude
  Similar to FTP traffic
Files available from UNIX file systems are typically heavy tailed
54
Heavy Tailed OFF times
Analysis of OFF times showed that they are also heavy tailed
  $\alpha \approx 1.5$
55
Ethernet LAN Traffic Measurements at the Source Level
Location: Bellcore Morristown Research and Engineering Center
The first set
  The busy hour of the August 1989 Ethernet LAN measurements
  About 105 sources, 748 active source-destination pairs
  95% of the traffic was internal
The second set
  9-day-long measurement period in December 1994
  About 3,500 sources, 10,000 active pairs
  Measurements are made up entirely of remote traffic
56
Textured Plots of Packet Arrival Times
57
Textured Plots of Packet Arrival Times
58
Checking for the Noah Effect
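Complementary distribution plots
$$\log P(U > u) \sim \log(c) - \alpha \log(u), \quad \text{as } u \to \infty$$
Hill’s estimate
  Let $U_1, U_2, \ldots, U_n$ denote the observed ON- (or OFF-) periods and write $U_{(1)} \le U_{(2)} \le \cdots \le U_{(n)}$ for the corresponding order statistics
$$\hat{\alpha}_n = \left( \frac{1}{k} \sum_{i=0}^{k-1} \left( \log U_{(n-i)} - \log U_{(n-k)} \right) \right)^{-1} \qquad (3)$$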
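Hill's estimate can be sketched directly from the order statistics: take the k largest observations, average their log-excess over $\log U_{(n-k)}$, and invert. The function name, the choice k = 2000, and the synthetic Pareto test data (drawn with `random.paretovariate`) are assumptions for illustration. In practice the estimate is usually examined over a range of k.

```python
import math
import random


def hill_estimate(samples, k):
    """Hill's estimate of the tail index alpha from the k largest order statistics."""
    u = sorted(samples)  # u[0] <= ... <= u[n-1], so U_(j) is u[j - 1]
    n = len(u)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    log_threshold = math.log(u[n - k - 1])  # log U_(n-k)
    mean_log_excess = sum(math.log(u[n - 1 - i]) - log_threshold
                          for i in range(k)) / k
    return 1.0 / mean_log_excess


# Synthetic check: Pareto samples with a known tail index of 1.5.
rng = random.Random(42)
periods = [rng.paretovariate(1.5) for _ in range(20000)]
alpha_hat = hill_estimate(periods, k=2000)
print(alpha_hat)
```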
59
60
61
Traffic Modeling and Generation
Although network traffic is intrinsically complex, parsimonious modeling is still possible.
Estimating a single parameter (the intensity of the Noah Effect) is enough.
62
Performance and Protocol Analysis
The queue length distribution
  Traditional (Markovian) traffic: decreases exponentially fast
  Self-similar traffic: decreases much more slowly
Protocol design should take into account knowledge about network traffic, such as the presence or absence of the Noah Effect.
63
Conclusion
The presence of the Noah Effect in measured Ethernet LAN traffic is confirmed.
The superposition of many ON/OFF models with the Noah Effect results in aggregate packet streams that are consistent with measured network traffic and exhibit self-similar or fractal properties.
64
Major Results of CB97
Established that WWW traffic was self-similar
Modeled a number of different WWW characteristics (focus on the tail)
Provided an explanation for the self-similarity of WWW traffic based on the underlying file size distribution
65
An example File size Distribution on a Win2000 machine
Section 4:
Impact of Self Similarity
67
Comparison
68
Impact on Network Engineering
Queuing delays are much higher in the presence of long range dependence than for Poisson traffic
To avoid dropping packets, buffers have to be huge
You have to be very careful when predicting future traffic based on past measurements
You cannot look at a little bit of video and decide how much buffer it is going to require
Thanks !