Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices
Qi Zhao*, Abhishek Kumar*, Jia Wang+ and Jun (Jim) Xu*
*College of Computing, Georgia Tech; +AT&T Labs - Research
Traffic and flow matrices
Traffic matrix TM
  TM[i, j] = traffic volume from node i to node j
  Useful in:
    Capacity planning and forecasting
    Routing configuration
    Network fault/reliability diagnoses
    Provisioning for SLA
Flow matrix FM
  FM[i, j, f] = the size of the flow f flowing from node i to node j
  Useful in:
    Computing usage patterns of ISPs
    Detecting flapping routes
    Detecting DDoS attacks
Relation between the two: $\sum_f FM[i, j, f] = TM[i, j]$
Existing approaches
Traffic matrix
  Indirect inference (holistic)
    Link counts from SNMP
    Routing matrix
    Network model
  Direct measurement
    Sampling
    Our approach
Flow matrix
  Not well studied yet
  Straightforward approach: sampling
Data streaming algorithms
Data streaming: processing a long stream of data items in one pass using a small working memory in order to answer a class of queries regarding the stream.
Our context
  Packet arrival rate is high (e.g., 10-40 Gbps).
  Small but fast memory will be used: SRAM (10 ns per access).
  Challenge: how to fully use SRAM to remember as much information pertinent to the traffic/flow matrix as possible?
Two data streaming schemes
The bitmap-based scheme: traffic matrix
The counter array-based scheme: flow matrix and traffic matrix
Online streaming module
The data digest data-structure is a bit array (bitmap) initially set to all 0’s.
It is updated upon each packet arrival. Measurement proceeds in epochs.
Example
[Figure: a packet is hashed by H(.) to position i of a bitmap with positions 0, 1, 2, ..., b-1; the bit at position i changes from 0 to 1]
Hash input: the invariant packet header + the first 8 bytes of the payload.
[Snoeren et al. SIGCOMM’01] shows that these 28 bytes are sufficient to differentiate almost all non-identical packets.
U := U-1
If U/b < Threshold
  save the bitmap
  start a new epoch
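The update rule in the example above can be sketched in Python. This is only an illustrative sketch, not the authors' implementation: the class name `BitmapDigest`, the use of SHA-1 as H(.), and storing one byte per bit are assumptions made for readability. The sketch also assumes that U counts the 0 bits and is decremented only when a bit actually flips from 0 to 1.

```python
import hashlib


class BitmapDigest:
    """Hypothetical sketch of the bitmap online streaming module."""

    def __init__(self, b, threshold):
        self.b = b                    # bitmap size in bits
        self.threshold = threshold    # epoch ends when U/b drops below this
        self.saved = []               # bitmaps of completed epochs
        self._reset()

    def _reset(self):
        self.bits = bytearray(self.b)  # one byte per bit, all 0 initially
        self.zeros = self.b            # U: number of 0 bits

    def update(self, invariant_bytes):
        # Assumption: SHA-1 stands in for the hash function H(.).
        digest = hashlib.sha1(invariant_bytes).digest()
        i = int.from_bytes(digest[:8], "big") % self.b
        if self.bits[i] == 0:
            self.bits[i] = 1
            self.zeros -= 1            # U := U-1
        if self.zeros / self.b < self.threshold:
            self.saved.append(bytes(self.bits))  # save the bitmap
            self._reset()                        # start a new epoch
```

Here `invariant_bytes` would be the invariant packet header plus the first 8 bytes of the payload. A hardware implementation would use a much cheaper hash than SHA-1.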
Complexities
Computational complexity
  One hash function computation
  One write to the memory
Storage complexity
  Each packet only produces a little more than one bit as its digest.
  This can be further reduced using sampling.
Data analysis module
What do we have so far (for TM[i, j])?
  BMi, generated by the traffic at node i (Ti), and BMj, generated by the traffic at node j (Tj)
What do we want to estimate?
  $TM[i, j] = |T_i \cap T_j| = |T_i| + |T_j| - |T_i \cup T_j|$
Estimation based on BMi and BMj
[Whang et al. 1990] proposed a method to infer |T| from BM, i.e.,
  $\widehat{|T|} = b \ln(b/U)$,
where b is the size of BM in bits and U is the number of “0”s in BM.
|Ti U Tj| can be inferred from the bitwise-OR of BMi and BMj.
An estimator of TM[i, j] is given by
  $\widehat{TM}[i, j] = \widehat{|T_i|} + \widehat{|T_j|} - \widehat{|T_i \cup T_j|}$
We derive the variance of the estimator:
  $\mathrm{Var}(\widehat{TM}[i, j]) = b\,(2e^{|T_i \cap T_j|/b} + e^{|T_i \cup T_j|/b} - e^{|T_i|/b} - e^{|T_j|/b} - |T_i \cap T_j|/b - 1)$
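As a rough illustration of the estimation step, the following Python sketch applies the b ln(b/U) estimate to BMi, BMj, and their bitwise-OR, and combines the three values into an estimate of TM[i, j]. The function names and the representation of a bitmap as a list of 0/1 integers are assumptions made for this sketch.

```python
import math


def estimate_size(bitmap):
    """Estimate |T| from a bitmap as b * ln(b / U), U = number of 0 bits."""
    b = len(bitmap)
    u = bitmap.count(0)
    if u == 0:
        raise ValueError("bitmap is full; the estimate is undefined")
    return b * math.log(b / u)


def estimate_tm(bm_i, bm_j):
    """Estimate TM[i, j] = |Ti| + |Tj| - |Ti U Tj| from two bitmaps."""
    if len(bm_i) != len(bm_j):
        raise ValueError("bitmaps must have the same size")
    bm_union = [x | y for x, y in zip(bm_i, bm_j)]  # bitwise-OR
    return estimate_size(bm_i) + estimate_size(bm_j) - estimate_size(bm_union)
```

Both bitmaps must come from the same hash function and have the same size b; otherwise the bitwise-OR does not correspond to the union of the two packet sets.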
Eliminating the effects of clock offset and packets in transit
[Figure: epoch timelines at Node i and Node j, with numbered epochs and a marked time interval t]
T1: a tight upper bound of clock offset (e.g., 50 ms in an NTP-enabled network)
  If t < T1, then overlap(1,2) = 1
Combining with packets in transit
T2: a tight upper bound of packet traversal time
  If t < T1+T2, then overlap(1,2) = 1
Online streaming module
The data digest data-structure is a counter array.
It is updated upon each packet arrival. Measurement proceeds in epochs.
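A minimal Python sketch of such a counter array follows. The slide does not spell out the update rule, so the details here are assumptions: the counter index is chosen by hashing a flow label, each packet increments that counter by one, and SHA-1 stands in for the hash function. The class and method names are hypothetical.

```python
import hashlib


class CounterArrayDigest:
    """Hypothetical sketch of a counter-array data digest."""

    def __init__(self, b):
        self.counters = [0] * b  # b counters, all 0 at the start of an epoch

    def index(self, flow_label):
        # Assumption: the flow label (bytes) is hashed to pick a counter.
        digest = hashlib.sha1(flow_label).digest()
        return int.from_bytes(digest[:8], "big") % len(self.counters)

    def update(self, flow_label):
        # Assumption: one increment per arriving packet of the flow.
        self.counters[self.index(flow_label)] += 1
```

Because every node uses the same hash function, packets of the same flow land at the same counter index at its ingress and egress nodes, which is what makes counter-value matching possible.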
Data analysis module
Principle: find a good counter-value matching between ingress nodes and egress nodes.
Challenge: hashing collisions make one-to-one matching fail.
Method: iterative elephant-first matching.
Accuracy: works well for the medium-to-large flow matrix elements due to the Zipfian nature of Internet traffic.
Elephant-first matching
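[Figure: the counter at index K holds a1 at Node i and a2 at Node j]
If a1 > a2: FM[i, j, f] = a2; the counter at Node i becomes a1-a2 and the counter at Node j becomes 0.
If a1 <= a2: FM[i, j, f] = a1; the counter at Node i becomes 0 and the counter at Node j becomes a2-a1.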
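The matching step can be sketched in Python for a single counter index K shared by several ingress and egress nodes. The selection rule is an assumption made for this sketch: in each iteration the largest remaining ingress counter is paired with the largest remaining egress counter, and the smaller of the two values is taken as the matched flow size, as in the two cases above. The function name and the dictionary representation are hypothetical.

```python
def elephant_first_match(ingress, egress):
    """Match counter values at one index K, largest first.

    ingress, egress: dicts mapping node name -> counter value at index K.
    Returns a dict mapping (ingress node, egress node) -> matched size.
    """
    ingress = dict(ingress)  # work on copies; counters are decremented
    egress = dict(egress)
    matched = {}
    while True:
        i = max(ingress, key=ingress.get, default=None)
        j = max(egress, key=egress.get, default=None)
        if i is None or j is None or ingress[i] == 0 or egress[j] == 0:
            break
        size = min(ingress[i], egress[j])  # a2 if a1 > a2, else a1
        matched[(i, j)] = matched.get((i, j), 0) + size
        ingress[i] -= size                 # a1 - a2, or 0
        egress[j] -= size                  # 0, or a2 - a1
    return matched
```

For example, with ingress counters A = 100 and B = 7 and egress counters X = 90 and Y = 17, this sketch matches 90 packets from A to X, 10 from A to Y, and 7 from B to Y.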
Evaluation
Ideally, the evaluation would require packet-level traces collected simultaneously at hundreds of ingress and egress routers in an ISP during a certain period of time.
We construct synthetic experiments based on 16 publicly available packet-level traces from NLANR.
Evaluation: traffic matrix
[Figure: two scatter plots of estimated traffic matrix elements versus original traffic matrix elements (axis ticks at 100000 and 1e+06), one for the bitmap scheme and one for the counter array scheme]
Metric
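  $\mathrm{RMSRE} = \sqrt{\frac{1}{N_T} \sum_{i=1,\; x_i > T}^{N} \left(\frac{x_i - \hat{x}_i}{x_i}\right)^2}$
where $N_T$ refers to the number of matrix elements greater than $T$, $x_i$ is an original matrix element, and $\hat{x}_i$ is its estimate.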
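The metric can be computed with a few lines of Python. This is a sketch under the reading that only matrix elements whose original value exceeds the threshold T enter the sum; the function and parameter names are chosen for illustration.

```python
import math


def rmsre(original, estimated, threshold):
    """Root mean square relative error over elements greater than threshold."""
    pairs = [(x, x_hat) for x, x_hat in zip(original, estimated)
             if x > threshold]
    if not pairs:
        raise ValueError("no matrix element exceeds the threshold")
    total = sum(((x - x_hat) / x) ** 2 for x, x_hat in pairs)
    return math.sqrt(total / len(pairs))  # len(pairs) plays the role of N_T
```

For instance, with original elements 100, 200, 5, estimates 110, 180, 50, and T = 10, the third element is excluded and the two remaining relative errors are both 10% in magnitude.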
RMSRE: traffic matrix
[Figure: RMSRE for the top % of traffic (y-axis, 0.02 to 0.16) versus percentage of traffic above T (x-axis, 0 to 100), for the bitmap scheme and the counter array scheme]
RMSRE: flow matrix
[Figure: RMSRE (y-axis, 0 to 0.5) versus threshold in packets (x-axis, 10 to 100)]
Conclusion
A novel data streaming algorithm that produces traffic matrix estimates much more accurate than existing approaches.
Another data streaming algorithm that very accurately estimates the flow matrix, a finer-grained characterization than the traffic matrix.
Both algorithms are designed to operate in very high-speed networks.