Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen. Computer Science Department, Northwestern University


Page 1: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen

Computer Science Department, Northwestern University

Page 2: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Online Change Detection

• Network anomalies are common
  – Flash crowds, failures, DoS, worms, …

Online Detection over Data Streams

• Data stream: key/update pairs (k, u)
  – Heavy hitters (lots of prior work)
  – Heavy changes

Page 3: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]

– First to detect flow-level heavy changes in massive data streams at network traffic speeds.

[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets indexed 0, 1, …, K−1]

Page 4: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

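k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]

[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets indexed 0, 1, …, K−1; key k selects bucket h_1(k), …, h_j(k), …, h_H(k) in the corresponding tables]

Update (k, u): T_j[h_j(k)] += u (for all j)

Estimate v(S, k): sum of updates for key k

$$v^{est}(S, k) = \operatorname{median}_{j} \left\{ \frac{T_j[h_j(k)] - \mathit{sum}/K}{1 - 1/K} \right\}$$

where sum is the total of all updates recorded in the sketch.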
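The update and estimate rules on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `KarySketch`, the integer keys, and the hash family `((a*k + b) mod p) mod K` are hypothetical choices, since the slides do not say which hash functions are used.

```python
import random
from statistics import median

class KarySketch:
    """Minimal k-ary sketch: H hash tables with K buckets each."""

    P = (1 << 61) - 1  # a Mersenne prime used by the stand-in hash family

    def __init__(self, H=5, K=4096, seed=0):
        self.H, self.K = H, K
        rng = random.Random(seed)
        # Assumption: one random (a, b) pair per table; keys are non-negative ints.
        self.params = [(rng.randrange(1, self.P), rng.randrange(self.P))
                       for _ in range(H)]
        self.tables = [[0] * K for _ in range(H)]
        self.total = 0  # "sum" in the estimate: total of all updates

    def _bucket(self, j, key):
        a, b = self.params[j]
        return ((a * key + b) % self.P) % self.K

    def update(self, key, u):
        """Update (k, u): T_j[h_j(k)] += u for all j."""
        for j in range(self.H):
            self.tables[j][self._bucket(j, key)] += u
        self.total += u

    def estimate(self, key):
        """Median over tables of (T_j[h_j(k)] - sum/K) / (1 - 1/K)."""
        K = self.K
        return median(
            (self.tables[j][self._bucket(j, key)] - self.total / K) / (1 - 1 / K)
            for j in range(self.H)
        )
```

Subtracting `sum/K` removes the expected contribution of other keys that collide in the same bucket, and taking the median over the H tables limits the effect of any single bad collision.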


Page 6: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

• Main problem
  – Cannot efficiently report keys with heavy change

• Our contribution
  – Determine set of keys that have “large” estimates in sketch

• Requires very little space:
  – E.g. 5 hash tables with 16 K buckets = 80 KB
  – Fits in high speed memory

Page 7: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Reverse Sketch Problem

Input:
  – Sketch
  – Threshold

Output: Set of keys that hash to heavy buckets in majority (or all) hash tables

[Figure: hash tables 1–5 with “Heavy” buckets marked]

Page 8: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Outline

[Figure: system overview with two stages. Streaming data recording (key, value, k-ary sketch) and heavy change detection (k-ary sketch, change threshold, heavy change keys); labels: fast, slow]

• Modular hashing
• IP mangling
• Reverse hashing
• Algorithms
• Improve heavy change detection

Page 9: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Taking Intersections

• Intersect A1, A2, A3, A4, A5

H = 5, K = 2^12, #keys = 2^32 (IP addresses)

E[false positives] << 1
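The bound on false positives follows from a back-of-the-envelope calculation. As a sketch, assume (none of this is stated on the slide) that each table has exactly one heavy bucket, that A_j is the set of keys hashing to the heavy bucket in table j, and that the H hash functions are independent and uniform. A key unrelated to the heavy change then falls into all H heavy buckets with probability K^{-H}, so:

```latex
\mathbb{E}[\text{false positives}]
  \approx \#\text{keys} \cdot K^{-H}
  = 2^{32} \cdot \left(2^{-12}\right)^{5}
  = 2^{32-60}
  = 2^{-28} \ll 1
```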

Page 10: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

The problem with simple intersection

• Why is this difficult?

• Each set Ai can be very large!

H = 5, K = 2^12, #keys = 2^32 (IP addresses)

|A1| = 2^32 / 2^12 = 2^20

Page 11: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

The problem with simple intersection

• Why is this difficult?

• Each set Ai can be very large!

• Solution: modular hashing

Page 12: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Modular hashing reduces the set size

[Figure: a single hash function h() maps the 32-bit key 10010100 10101011 10010101 10100011 (four 8-bit words) to the 12-bit value 010 110 001 101]

Page 13: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

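Modular hashing reduces the set size

[Figure: the 32-bit key 10010100 10101011 10010101 10100011 is split into four 8-bit words; h1(), h2(), h3(), h4() each hash one word to 3 bits, and the outputs 010 110 001 101 are concatenated into the 12-bit value]

Greatly reduces size of reverse mapped sets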

Page 14: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Modular hashing reduces the set size

[Figure: the 32-bit key 10010100 10101011 10010101 10100011 is split into four 8-bit words; h1(), h2(), h3(), h4() each hash one word to 3 bits, giving 010 110 001 101]

Greatly reduces size of reverse mapped sets

2^8 / 2^3 = 2^5
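The word-by-word scheme in the figure can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, and the per-word hash functions are modelled as balanced random lookup tables (an assumption) so that every 3-bit output has exactly 2^8 / 2^3 = 32 preimages.

```python
import random

WORD_BITS = 8   # each key word is 8 bits (from the slide)
OUT_BITS = 3    # each word hashes to 3 bits (from the slide)
NUM_WORDS = 4   # a 32-bit key has four words

def make_modular_hash(seed):
    """Build h1..h4 as balanced byte -> 3-bit lookup tables (assumed form)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(NUM_WORDS):
        # Each 3-bit value appears exactly 32 times, then positions are shuffled.
        table = [v for v in range(1 << OUT_BITS)
                 for _ in range(1 << (WORD_BITS - OUT_BITS))]
        rng.shuffle(table)
        tables.append(table)
    return tables

def split_words(key):
    """Split a 32-bit key into four 8-bit words, most significant first."""
    return [(key >> (WORD_BITS * (NUM_WORDS - 1 - i))) & 0xFF
            for i in range(NUM_WORDS)]

def modular_hash(tables, key):
    """Hash each word separately and concatenate the 3-bit results (12 bits)."""
    bucket = 0
    for table, word in zip(tables, split_words(key)):
        bucket = (bucket << OUT_BITS) | table[word]
    return bucket

def reverse_words(tables, bucket):
    """For each word position, return the set of words mapping to the bucket's chunk."""
    result = []
    for i, table in enumerate(tables):
        chunk = (bucket >> (OUT_BITS * (NUM_WORDS - 1 - i))) & ((1 << OUT_BITS) - 1)
        result.append({w for w in range(1 << WORD_BITS) if table[w] == chunk})
    return result
```

For the key shown on the slide (`0x94AB95A3`), `reverse_words` returns four sets of 32 words each, so the reverse-mapped set is described by 4 × 32 words instead of an explicit list of 2^20 keys.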

Page 15: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Modular hashing reduces the set size

[Figure: hash tables 1–5 with heavy buckets b1, b2, b3, b4, b5]

A1: 2^5 × 2^5 × 2^5 × 2^5

Intersection: only 32 elements per partition

Page 16: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Modular hashing reduces the set size

[Figure: hash tables 1–5 with heavy buckets b1, b2, b3, b4, b5]

A1: 2^5 × 2^5 × 2^5 × 2^5
A2: 2^5 × 2^5 × 2^5 × 2^5

Intersection: only 32 elements per partition
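Because each reverse-mapped set is a product of four 32-element word sets, the intersection A1 ∩ … ∩ AH can be taken one word position at a time. The sketch below illustrates this for the simple case of one heavy bucket per table. It is not the authors' algorithm: the balanced lookup tables and all function names are assumptions made for the example.

```python
import random
from itertools import product

WORDS, IN_BITS, OUT_BITS = 4, 8, 3  # 32-bit key, 8-bit words, 3-bit outputs

def make_tables(seed):
    """One modular hash = four balanced byte -> 3-bit lookup tables (assumed)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        table = [v for v in range(1 << OUT_BITS)
                 for _ in range(1 << (IN_BITS - OUT_BITS))]
        rng.shuffle(table)
        tables.append(table)
    return tables

def words_of(key):
    """Split a 32-bit key into four 8-bit words, most significant first."""
    return [(key >> (IN_BITS * (WORDS - 1 - i))) & 0xFF for i in range(WORDS)]

def bucket_of(tables, key):
    """Concatenate the per-word 3-bit hashes into a 12-bit bucket index."""
    bucket = 0
    for table, word in zip(tables, words_of(key)):
        bucket = (bucket << OUT_BITS) | table[word]
    return bucket

def reverse_sets(tables, bucket):
    """Per word position, the 32 words that hash to the bucket's 3-bit chunk."""
    sets = []
    for i, table in enumerate(tables):
        chunk = (bucket >> (OUT_BITS * (WORDS - 1 - i))) & ((1 << OUT_BITS) - 1)
        sets.append({w for w in range(1 << IN_BITS) if table[w] == chunk})
    return sets

def intersect_heavy(all_tables, heavy_buckets):
    """Intersect word by word, never materialising the 2^20-key sets A_j."""
    per_word = [set(range(1 << IN_BITS)) for _ in range(WORDS)]
    for tables, bucket in zip(all_tables, heavy_buckets):
        for i, candidates in enumerate(reverse_sets(tables, bucket)):
            per_word[i] &= candidates
    # Candidate keys are the Cartesian product of the surviving words.
    keys = []
    for combo in product(*(sorted(s) for s in per_word)):
        key = 0
        for word in combo:
            key = (key << IN_BITS) | word
        keys.append(key)
    return keys
```

This works because the intersection of Cartesian products equals the Cartesian product of the per-position intersections, so each step only touches sets of at most 32 words.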

Page 17: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Handling Multiple Intersections…

[Figure: hash tables 1–5 with two sets of heavy buckets, each labelled b1, b2, b3, b4, b5]

2^H different intersections

Much more difficult: need sophisticated reverse hashing algorithms (see tech report)

Page 18: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Problem: Too many collisions

129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ...

32 bits → 12 bits: 7 . 4 . 0 . *

Page 19: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Problem: Too many collisions

129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ...

32 bits → 12 bits: 7 . 4 . 0 . *

Solution: IP mangling

Page 20: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams


IP-mangling

Page 21: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Invertible Modular Linear Equation

f(x) ≡ a·x mod n

To be invertible: a and n must be relatively prime

• a is odd, chosen randomly
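The invertible mapping above can be sketched as follows. The modulus n = 2^32 is an assumption (the slide only writes n; a 32-bit address space is the natural reading, and any odd a is relatively prime to a power of two), and the function names are hypothetical.

```python
import random

N = 1 << 32  # assumed modulus: the 32-bit IP address space

def random_odd_multiplier(rng):
    """Pick a random odd a in [1, N); odd implies gcd(a, 2**32) == 1."""
    return rng.randrange(1, N, 2)

def mangle(x, a):
    """f(x) = a * x mod n."""
    return (a * x) % N

def unmangle(y, a):
    """Invert f using the modular inverse of a (pow with -1 needs Python 3.8+)."""
    return (pow(a, -1, N) * y) % N
```

Since f is a bijection on 32-bit values, mangling changes how keys are spread over buckets without losing the ability to recover the original address after reverse hashing.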

Page 22: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

[Figure: comparison of modular hashing and optimal hashing]

Page 23: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

[Figure: comparison of modular hashing, modular hashing with IP mangling, and optimal hashing]

Page 24: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

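Recap:

[Figure: system overview with two stages]
  – Streaming data recording: key, IP mangling, modular hashing, value, reversible k-ary sketch (value stored)
  – Heavy change detection: reversible k-ary sketch, change threshold, reverse hashing, reverse IP mangling, heavy change keys

[Expressions on slide: $(n^{1/\log\log n})$ and $\left(\frac{\log n}{\log\log n}\right)$]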

Page 25: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Evaluation

• Traffic traces from Northwestern University edge router
  – 5-minute intervals, averaging 7.5 GB of traffic per interval
• Compared with ground truth
• 6 hash tables, 4K buckets each, 192 KB of memory in total
• Up to 140 true heavy change keys in 1.5 seconds
  – Over 95% TPP
  – Less than 2% FPP
• All missing changes are due to boundary effects

Page 26: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Conclusions / Future Work

• Sketches: efficient summary structures
• Our contribution: reversible sketches
  – Efficient online detection of keys with heavy changes

Work in Progress (see tech report)

• Improved reverse hashing
• Statistical guarantee on detection accuracy
• More advanced applications:
  – Hierarchical change detection
    • E.g. 129.105.100.* shows a big change!

Page 27: Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

See tech report for more!

http://list.cs.northwestern.edu

Thank you!