Transcript
Page 1: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

1

Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen

Computer Science Department, Northwestern University

Page 2: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

2

Online Change Detection

• Network anomalies are common

– Flash crowds, failures, DoS, worms, …

Online Detection over Data Streams

• Data Stream: key/update pairs (k,u)

– Heavy hitters (lots of prior work)

– Heavy changes

Page 3: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

3

k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]

– First to detect flow-level heavy changes in massive data streams at network traffic speeds

[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets indexed 0, 1, …, K-1]

Page 4: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

4

k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]

[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets indexed 0, 1, …, K-1; key k is mapped to bucket h1(k) in table 1, hj(k) in table j, and hH(k) in table H]

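Update (k, u): T_j[h_j(k)] += u (for all j)

Estimate v(S, k): sum of updates for key k

$$v(S, k) = \operatorname{median}_{j} \left\{ \frac{T_j[h_j(k)] - \mathrm{sum}/K}{1 - 1/K} \right\}$$

where sum is the sum of all updates (equivalently, the total of all buckets in any one table).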
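The update and estimate rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `KArySketch`, the default sizes, and the hash family `((a*k + b) mod p) mod K` are all assumptions chosen for the example.

```python
import random
from statistics import median


class KArySketch:
    """H hash tables with K buckets each (illustrative sketch only)."""

    def __init__(self, H=5, K=4096, seed=0):
        self.H, self.K = H, K
        self.tables = [[0] * K for _ in range(H)]
        self.total = 0  # running sum of all updates
        rng = random.Random(seed)
        # Assumed hash family: h_j(k) = ((a_j * k + b_j) mod p) mod K
        self.p = (1 << 61) - 1
        self.params = [
            (rng.randrange(1, self.p), rng.randrange(self.p)) for _ in range(H)
        ]

    def _h(self, j, key):
        a, b = self.params[j]
        return ((a * key + b) % self.p) % self.K

    def update(self, key, u):
        # Update (k, u): T_j[h_j(k)] += u for all j
        for j in range(self.H):
            self.tables[j][self._h(j, key)] += u
        self.total += u

    def estimate(self, key):
        # median over j of (T_j[h_j(k)] - sum/K) / (1 - 1/K)
        K = self.K
        return median(
            (self.tables[j][self._h(j, key)] - self.total / K) / (1 - 1 / K)
            for j in range(self.H)
        )


sketch = KArySketch()
sketch.update(42, 100)
print(sketch.estimate(42))
```

Note that `estimate` needs the key as input: the sketch can answer "how large is key k?" but cannot by itself list which keys are large.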

Page 5: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

5

??

Page 6: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

6

??

• Main problem

– Cannot efficiently report keys with heavy change

• Our Contribution

– Determine set of keys that have “large” estimates in sketch

• Requires very little space:

– E.g. 5 hash tables with 16 K buckets = 80 KB

– Fits in high speed memory

Page 7: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

7

Reverse Sketch Problem

Input:

– Sketch

– Threshold

Output: Set of keys that hash to heavy buckets in majority (or all) hash tables

[Figure: hash tables 1-5 with the “heavy” bucket marked in each]
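To make the problem statement concrete, here is a brute-force baseline that simply tests every key in a toy key space. It is only a sketch under stated assumptions: the function names, the three hash functions, and the 8-bit key space are hypothetical, and the approach does not scale to 32-bit keys, which is exactly the difficulty the later slides address.

```python
def heavy_buckets(tables, threshold):
    """Per table, the set of bucket indices whose value exceeds the threshold."""
    return [{i for i, v in enumerate(t) if v > threshold} for t in tables]


def reverse_sketch_naive(hash_fns, heavy, key_space, min_tables=None):
    """Return keys that hash to a heavy bucket in at least min_tables tables."""
    if min_tables is None:
        min_tables = len(hash_fns)  # default: require all tables
    result = set()
    for key in key_space:
        hits = sum(1 for h, hb in zip(hash_fns, heavy) if h(key) in hb)
        if hits >= min_tables:
            result.add(key)
    return result


# Toy instance: 3 tables, 8 buckets, 8-bit keys, hypothetical hash functions.
K = 8
hash_fns = [lambda k, a=a: ((a * k + 1) % 257) % K for a in (3, 5, 7)]
tables = [[0] * K for _ in hash_fns]
for key, u in [(200, 50), (17, 1), (99, 2)]:
    for t, h in zip(tables, hash_fns):
        t[h(key)] += u

heavy = heavy_buckets(tables, threshold=40)
found = reverse_sketch_naive(hash_fns, heavy, range(256))
print(sorted(found))
```

Passing a smaller `min_tables` gives the "majority" variant of the output definition instead of the "all tables" variant.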

Page 8: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

8

Outline

[Diagram: Streaming data recording (key, value → k-ary sketch); Heavy change detection (k-ary sketch, change threshold → heavy change keys); path labels: fast, slow]

Modular hashing

IP mangling

Reverse Hashing

Algorithms

Improve Heavy Change Detection

Page 9: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

9

Taking Intersections

• Intersect A1, A2, A3, A4, A5

H = 5, K = 2^12, #keys = 2^32 (IP addresses)

E[false positives] << 1
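The claim that the expected number of false positives is far below 1 can be checked with a short calculation. The sketch below assumes one heavy bucket per table and independent, uniform hash functions (neither is stated on the slide), so a given non-heavy key lands in the heavy bucket of all H tables with probability (1/K)^H.

```python
from fractions import Fraction

H = 5            # number of hash tables
K = 2**12        # buckets per table
num_keys = 2**32  # size of the key space (IPv4 addresses)

# Assumption: one heavy bucket per table, independent uniform hashing.
p_all_tables = Fraction(1, K) ** H

# Expected number of keys that survive the intersection by chance.
expected_false_positives = num_keys * p_all_tables

print(expected_false_positives == Fraction(1, 2**28))
```

Under these assumptions the expectation is 2^32 / 2^60 = 2^-28, which is consistent with the slide's "<< 1".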

Page 10: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

10

The problem with simple intersection

• Why is this difficult?

• Each set Ai can be very large!

H = 5, K = 2^12, #keys = 2^32 (IP addresses)

|A1| = 2^32 / 2^12 = 2^20

Page 11: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

11

The problem with simple intersection

• Why is this difficult?

• Each set Ai can be very large!

• Solution: Modular hashing

Page 12: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

12

Modular hashing reduces the set size

[Figure: the 32-bit key 10010100 10101011 10010101 10100011 (four 8-bit words) is hashed by h() to the 12-bit value 010 110 001 101]

Page 13: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

13

Modular hashing reduces the set size

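[Figure: each 8-bit word of the 32-bit key 10010100 10101011 10010101 10100011 is hashed separately by h1(), h2(), h3(), h4() to the 3-bit values 010, 110, 001, 101, which are concatenated into the 12-bit value 010 110 001 101]

Greatly reduces size of reverse mapped sets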

Page 14: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

14

Modular hashing reduces the set size

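[Figure: each 8-bit word of the 32-bit key 10010100 10101011 10010101 10100011 is hashed separately by h1(), h2(), h3(), h4() to the 3-bit values 010, 110, 001, 101, which are concatenated into the 12-bit value 010 110 001 101]

Greatly reduces size of reverse mapped sets

2^8 / 2^3 = 2^5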
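A minimal sketch of modular hashing as described on these slides: the 32-bit key is split into four 8-bit words, each word is hashed to 3 bits, and the results are concatenated into a 12-bit bucket index. The balanced lookup tables built by `make_word_hashes` are an assumption made so that every 3-bit value has exactly 2^5 preimages; the function names are hypothetical.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3


def make_word_hashes(seed=0):
    """One lookup table per word: 8-bit word -> 3-bit value.

    Assumed construction: each 3-bit value is used exactly 2^5 times.
    """
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        values = [
            v
            for v in range(1 << OUT_BITS)
            for _ in range(1 << (WORD_BITS - OUT_BITS))
        ]
        rng.shuffle(values)
        tables.append(values)
    return tables


def modular_hash(key, word_hashes):
    """Hash each 8-bit word separately and concatenate the 3-bit results."""
    out = 0
    for i in range(WORDS):
        shift = WORD_BITS * (WORDS - 1 - i)  # word 0 is the most significant
        word = (key >> shift) & 0xFF
        out = (out << OUT_BITS) | word_hashes[i][word]
    return out


def reverse_word(word_hashes, i, value):
    """All 8-bit words in position i that hash to the given 3-bit value."""
    return [w for w in range(1 << WORD_BITS) if word_hashes[i][w] == value]


word_hashes = make_word_hashes()
key = 0b10010100_10101011_10010101_10100011  # the key shown on the slide
bucket = modular_hash(key, word_hashes)
print(format(bucket, "012b"), len(reverse_word(word_hashes, 0, bucket >> 9)))
```

The point of the design is that reverse mapping works per word: a bucket is described by four lists of 2^5 words each rather than one list of 2^20 keys.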

Page 15: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

15

Modular hashing reduces the set size

[Figure: hash tables 1-5 with heavy buckets b1, b2, b3, b4, b5]

A1: 2^5 * 2^5 * 2^5 * 2^5

Intersection:

Only 32 elements per partition

Page 16: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

16

Modular hashing reduces the set size

[Figure: hash tables 1-5 with heavy buckets b1, b2, b3, b4, b5]

A1: 2^5 * 2^5 * 2^5 * 2^5

A2: 2^5 * 2^5 * 2^5 * 2^5

Intersection:

Only 32 elements per partition
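The per-partition intersection can be sketched as follows for the simple case of one heavy bucket per table. Instead of intersecting five sets of 2^20 keys, each of the four word positions is intersected separately over sets of 32 words, and the surviving words are then combined. The helper names and the balanced random word hashes are assumptions for illustration; this is not the reverse hashing algorithm from the tech report.

```python
import random
from itertools import product

H, WORDS, WORD_BITS, OUT_BITS = 5, 4, 8, 3


def make_table_hashes(seed):
    """Assumed per-word hashes for one table: balanced 8-bit -> 3-bit maps."""
    rng = random.Random(seed)
    hashes = []
    for _ in range(WORDS):
        values = [v for v in range(1 << OUT_BITS) for _ in range(32)]
        rng.shuffle(values)
        hashes.append(values)
    return hashes


def words_of(key):
    return [(key >> (WORD_BITS * (WORDS - 1 - i))) & 0xFF for i in range(WORDS)]


def modular_hash(key, wh):
    """Bucket of a key, kept as a list of four 3-bit chunks."""
    return [wh[i][w] for i, w in enumerate(words_of(key))]


def reverse_single_heavy(hashes, heavy_chunks):
    """hashes[j]: word hashes of table j; heavy_chunks[j]: chunks of bucket b_j."""
    candidates = []
    for i in range(WORDS):
        c = set(range(1 << WORD_BITS))
        for wh, chunks in zip(hashes, heavy_chunks):
            # Each preimage set has exactly 32 words; intersect across tables.
            c &= {w for w in range(1 << WORD_BITS) if wh[i][w] == chunks[i]}
        candidates.append(sorted(c))
    keys = []
    for ws in product(*candidates):
        key = 0
        for w in ws:
            key = (key << WORD_BITS) | w
        keys.append(key)
    return keys


hashes = [make_table_hashes(j) for j in range(H)]
heavy_key = 0x94AB95A3  # 10010100 10101011 10010101 10100011
heavy_chunks = [modular_hash(heavy_key, wh) for wh in hashes]
recovered = reverse_single_heavy(hashes, heavy_chunks)
print(heavy_key in recovered, len(recovered))
```

The recovered set always contains the heavy key; any additional keys are chance collisions that agree with it in every table.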

Page 17: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

17

Handling Multiple Intersections…

[Figure: hash tables 1-5 with two sets of heavy buckets b1, b2, b3, b4, b5]

2^H different intersections

Much more difficult - need sophisticated reverse hashing algorithms (see tech report)

Page 18: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

18

Problem: Too many collisions

129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... (32 bits)

7.4.0.* (12 bits)

Page 19: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

19

Problem: Too many collisions

129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... (32 bits)

7.4.0.* (12 bits)

Solution: IP Mangling

Page 20: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

20

IP-mangling

Page 21: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

21

Invertible Modular Linear Equation

f(x) ≡ a·x mod n

To be invertible: a and n must be relatively prime

• a is odd, chosen randomly
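A minimal sketch of such an invertible mapping, assuming n = 2^32 (the size of the IPv4 key space; the slide does not state n). Any odd a is relatively prime to a power of two, so a modular inverse exists and the mangling can be undone after reverse hashing. The function names are hypothetical.

```python
import random

N = 2**32  # assumption: n is the size of the 32-bit key space


def make_mangler(seed=0):
    rng = random.Random(seed)
    a = rng.randrange(1, N, 2)  # random odd multiplier, so gcd(a, N) = 1
    a_inv = pow(a, -1, N)       # modular inverse (Python 3.8+)
    return a, a_inv


def mangle(x, a):
    return (a * x) % N


def unmangle(y, a_inv):
    return (a_inv * y) % N


a, a_inv = make_mangler()
ip = (129 << 24) | (105 << 16) | (56 << 8) | 23  # 129.105.56.23
mangled = mangle(ip, a)
print(unmangle(mangled, a_inv) == ip)
```

Because the mapping is a bijection on the key space, it introduces no new collisions of its own, and keys found by reverse hashing can be mapped back to the original addresses.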

Page 22: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

22

Modular Hashing

Optimal Hashing

Page 23: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

23

Modular Hashing

Modular Hashing with IP Mangling

Optimal Hashing

Page 24: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

24

Recap:

[Diagram: Streaming data recording: key → IP mangling → modular hashing → reversible k-ary sketch, with the value stored in the sketch. Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys.]

Complexity annotations in the diagram: $(n^{1/\log\log n})$ and $(\frac{\log n}{\log\log n})$

Page 25: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

25

Evaluation

• Traffic traces from Northwestern University edge router

– 5-minute intervals, averaging 7.5 GB of traffic per interval

• Compared with Ground Truth

• 6 hash tables, 4K buckets each, 192 KB memory in total

• Up to 140 true heavy change keys in 1.5 seconds

– Over 95% TPP

– Less than 2% FPP

• All missing changes are due to boundary effects

Page 26: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

26

Conclusions / Future Work

• Sketches: efficient summary structures

• Our contribution: Reversible Sketches

– efficient online detection of keys with heavy changes

Work in Progress (see tech report)

• Improved reverse hashing

• Statistical guarantee on detection accuracy

• More advanced applications:

– Hierarchical change detection

• E.g. 129.105.100.* shows a big change!

Page 27: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

27

See tech report for more!

http://list.cs.northwestern.edu

Thank you !

