Reverse Hashing for Sketch Based Change Detection in High Speed Networks
Ashish Gupta, Elliot Parsons
with Robert Schweller, Theory Group
Advisor: Yan Chen
Class Presentation, June 2004, Network Security
Computer Science Department, Northwestern University
Overview
• Anomaly Detection
• Sketch Based Approaches and their problems
• Reverse Hashing algorithms
• Dealing with Multiple Anomalies
• Evaluation
• Conclusions
• Future Work
Anomaly Detection
• Goes beyond signature detection
• Two popular types:
— Heavy hitter detection
— Change detection: very broad, from simple change to statistical methods
• Online, real-time detection is difficult
— Heavy hitter: some solutions proposed
— Heavy change: ?
• Scalability with high-speed traffic
— Large number of flows: large memory required
— Performance penalty
• Scalable change detection: sketch to the rescue!
What is a sketch?
• Probabilistic summary of data streams
— Widely used in database research to handle massive data streams

| | Space | Accuracy |
|---|---|---|
| Hash table | Per-key state | 100% |
| Sketch | Compact | With probabilistic guarantees (better for larger values) |

• Array of H hash tables with K buckets each: Tj[K] (j = 1, …, H)
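What is a sketch?
[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets (columns 0, 1, …, K-1). Key k maps to bucket hj(k) in table j.]
Update (k, u): Tj[hj(k)] += u (for all j)
Estimate v(S, k): sum of updates for key k
$$v^{est}(S, k) = \operatorname{median}_{j} \left\{ \frac{T_j[h_j(k)] - \mathrm{sum}/K}{1 - 1/K} \right\}$$
where sum is the total of all updates (the sum of the buckets in any one table).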
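The update and estimate operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `Sketch`, the hash family `((a*k + b) mod p) mod K`, and the seed are assumptions. The estimate subtracts sum/K from each table's bucket, divides by 1 − 1/K, and takes the median over the H tables, which is one reading of the slide's formula.

```python
import random
from statistics import median


class Sketch:
    """H hash tables with K buckets each (hash functions are illustrative)."""

    def __init__(self, H, K, seed=0):
        rng = random.Random(seed)
        self.H, self.K = H, K
        self.tables = [[0] * K for _ in range(H)]
        # Assumed hash family: h_j(k) = ((a*k + b) mod p) mod K, one (a, b) per table.
        self.p = (1 << 61) - 1  # a prime larger than any 32-bit key
        self.params = [(rng.randrange(1, self.p), rng.randrange(self.p))
                       for _ in range(H)]
        self.total = 0  # "sum" in the estimate: total of all updates

    def bucket(self, j, key):
        a, b = self.params[j]
        return ((a * key + b) % self.p) % self.K

    def update(self, key, u):
        # Update (k, u): T_j[h_j(k)] += u for every table j.
        for j in range(self.H):
            self.tables[j][self.bucket(j, key)] += u
        self.total += u

    def estimate(self, key):
        # Median over tables of (T_j[h_j(k)] - sum/K) / (1 - 1/K).
        K = self.K
        return median(
            (self.tables[j][self.bucket(j, key)] - self.total / K) / (1 - 1 / K)
            for j in range(self.H)
        )


sketch = Sketch(H=5, K=1024)
sketch.update(42, 60)
sketch.update(42, 40)
estimate_for_42 = sketch.estimate(42)
```

With a single key in the stream the estimate recovers its total; with many keys, collisions add noise that the median over tables is meant to suppress.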
Using Sketch for anomaly detection
• Requires very little space:
— E.g. 5 hash tables with 16 K buckets = 360 K
— Fits in high-speed memory
— Still able to reconstruct the values with high accuracy
• Its main problem:
— To look up the value of a key, you must already know the key.
— The sketch reveals that anomalies exist, but not which keys caused them!
Our contribution
How can we figure out the keys without storing them explicitly?
Step 1: Taking Intersections
• Each hash table uses an independent hash function
• Each key maps to a different bucket in each table
— Each bucket maps back to a large set of keys
• Example: a key maps to buckets b1, b2, b3, b4, b5, whose reverse-mapped key sets are A1, A2, A3, A4, A5
• Intersect A1, A2, A3, A4, A5 → a really small set!
• E[x] << 1 for 5 hash tables (ref. our paper)
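The intersection step can be tried on a toy key space. The sketch below is only an illustration under stated assumptions: a 12-bit key space, 16 buckets per table, a brute-force `reverse_map`, and a hypothetical hash family built by `make_hash`. None of these come from the slides.

```python
import random


def make_hash(rng, K, p=(1 << 31) - 1):
    # Assumed hash family: ((a*k + b) mod p) mod K with random a, b.
    a, b = rng.randrange(1, p), rng.randrange(p)
    return lambda key: ((a * key + b) % p) % K


def reverse_map(hash_fn, bucket, key_space):
    # Brute force: every key in the key space that lands in this bucket.
    return {k for k in key_space if hash_fn(k) == bucket}


def candidate_keys(hash_fns, buckets, key_space):
    # Intersect the reverse-mapped sets A1, ..., AH of the anomalous buckets.
    sets = [reverse_map(h, b, key_space) for h, b in zip(hash_fns, buckets)]
    return set.intersection(*sets)


rng = random.Random(1)
key_space = range(1 << 12)          # toy key space of 4096 keys
hash_fns = [make_hash(rng, K=16) for _ in range(5)]
culprit = 1234
buckets = [h(culprit) for h in hash_fns]   # b1..b5 for the culprit key
candidates = candidate_keys(hash_fns, buckets, key_space)
```

The culprit is always in `candidates`. Each extra table shrinks the set by roughly a factor of K, which is why five tables leave very few other keys.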
The problem with simple intersection
• Why is this difficult?
— One-to-many mapping
• Each set Ai can be very large!
— E.g. for IP addresses the key space is 2^32. With 2^12 buckets, that is 2^20 keys per bucket!
Problem with Intersections
• How do we store these huge mappings?
• How do we take intersections of these huge sets?
Modular hashing
• Partition the key into separate words
• Hash each word separately
[Figure: a 32-bit key split into four 8-bit words: 10010100 10101011 10010101 10100011]
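Modular hashing reduces the set size
[Figure: a 32-bit key split into four 8-bit words (10010100 10101011 10010101 10100011). Each word is hashed separately by h1(), h2(), h3(), h4() to a 3-bit value (010 110 001 101), and the four values together form the bucket index 010 110 001 101.]
• Greatly reduces the size of the reverse-mapped sets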
Modular hashing
Only 32 elements per partition
• For 8-bit to 3-bit hashing: each bucket maps to 2^8/2^3 = 2^5 = 32 keys, which is small!
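A minimal sketch of modular hashing for a 32-bit key, assuming four 8-bit words that are each hashed to 3 bits through a random lookup table. The balanced tables built by `make_word_tables` (exactly 32 words per 3-bit value) and all function names are assumptions made for illustration. The slides do not say how the per-word hash functions are built.

```python
import random

WORDS, WORD_BITS, HASH_BITS = 4, 8, 3
WORD_MASK = (1 << WORD_BITS) - 1
HASH_MASK = (1 << HASH_BITS) - 1


def make_word_tables(seed=0):
    # One lookup table per word: 256 entries, each 3-bit value used 32 times.
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        values = [v for v in range(1 << HASH_BITS)
                  for _ in range(1 << (WORD_BITS - HASH_BITS))]
        rng.shuffle(values)
        tables.append(values)
    return tables


def modular_hash(key, tables):
    # Hash each 8-bit word separately and concatenate the 3-bit results.
    bucket = 0
    for i in range(WORDS):
        word = (key >> (WORD_BITS * (WORDS - 1 - i))) & WORD_MASK
        bucket = (bucket << HASH_BITS) | tables[i][word]
    return bucket


def reverse_word_sets(bucket, tables):
    # For each word position, the 8-bit words that hash to that 3-bit value.
    sets = []
    for i in range(WORDS):
        value = (bucket >> (HASH_BITS * (WORDS - 1 - i))) & HASH_MASK
        sets.append({w for w in range(1 << WORD_BITS) if tables[i][w] == value})
    return sets


tables = make_word_tables()
key = 0b10010100_10101011_10010101_10100011  # the key from the slide
bucket = modular_hash(key, tables)
word_sets = reverse_word_sets(bucket, tables)
```

The reverse mapping of a bucket is then stored as four sets of 32 words each, rather than as one explicit set of 2^20 full keys.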
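Modular Hashing is Efficient
• Very efficient in space and time:
— If n is the size of the key space, m is the size of the hash space, and q is the number of words:
— Space = $O\left(q \left(\frac{n}{m}\right)^{1/q}\right)$
— Run time (intersections) = $O\left(H \cdot q \left(\frac{n}{m}\right)^{1/q}\right)$
• Setting q = O(log n) makes the space logarithmic in the key space and the run time poly-logarithmic in the key space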
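As a worked instance, assume the space bound reads $O(q\,(n/m)^{1/q})$ and plug in the numbers from the earlier slides (32-bit keys, 12-bit bucket index, four words):

$$n = 2^{32},\quad m = 2^{12},\quad q = 4 \;\Rightarrow\; \left(\frac{n}{m}\right)^{1/q} = \left(2^{20}\right)^{1/4} = 2^{5} = 32, \qquad q\left(\frac{n}{m}\right)^{1/q} = 4 \cdot 32 = 128$$

Under that reading, 128 stored words describe a reverse-mapped set of 2^20 keys, which matches the 32 elements per partition quoted earlier.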
An important problem: spatial locality
• This hashing scheme is biased, not uniform
• Network streams show strong spatial locality in IP addresses
• E.g. many addresses fall into 120.105.56.*
• These would be mapped into very few buckets → large number of collisions → low sketch accuracy
Solution: IP mangling
Without IP mangling: skewed !
IP mangling removes correlations
• Key idea: randomize the input data to destroy correlations
• The transformation must also be reversible!
Theory of Modular Linear Equations
f(x) ≡ a·x mod n
• To be invertible: a and n must be relatively prime
• a is chosen randomly
• Can be easily reversed: replace a by a^-1 (the multiplicative inverse of a modulo n)!
• This function is highly effective in resolving the skewed distribution
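A minimal sketch of mangling and unmangling with f(x) = a·x mod n. It assumes n = 2^32 for IPv4 keys and picks a random odd multiplier, since any odd number is relatively prime to a power of two. The function names and the seed are illustrative, and the three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
import ipaddress
import random

N = 1 << 32  # size of the IPv4 key space (assumed modulus n)


def pick_multiplier(rng):
    # Any odd number is relatively prime to 2^32, so it has an inverse mod N.
    return rng.randrange(1, N, 2)


def mangle(x, a):
    # f(x) = a * x mod n
    return (a * x) % N


def unmangle(y, a):
    # Reverse by multiplying with the modular inverse of a (Python 3.8+).
    return (pow(a, -1, N) * y) % N


rng = random.Random(7)
a = pick_multiplier(rng)
base = int(ipaddress.IPv4Address("120.105.56.0"))
addresses = [base + i for i in range(256)]   # the 120.105.56.* block
mangled = [mangle(x, a) for x in addresses]
restored = [unmangle(y, a) for y in mangled]
```

Because the map is a bijection on the key space, no two addresses collide after mangling, and every mangled key can be mapped back to the original address once reverse hashing has recovered it.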
With IP mangling: uniform !
Recap
• Intersections of reverse-mapped sets: converge to the culprit key
• Modular hashing: makes intersections time- and space-efficient
• IP mangling: removes the non-uniformity of modular hashing
Handling Multiple Intersections…
• A more complex problem
[Illustration: two anomalous buckets in each hash table]
How do we take intersections now?
• Each hash table now contains two anomalies → two culprit keys…
Handling Multiple Intersections…
• Multiple possibilities…
• Take the union of the keys from each hash table, and then the intersection of those unions → false positives
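The false positives produced by the union-then-intersect approach show up even in a tiny hand-built example. The 16-key space and the two hash functions (`k % 4` and `k // 4`) are assumptions chosen so the result is easy to check by hand. They are not the hash functions used in the talk.

```python
def reverse_map(hash_fn, bucket, key_space):
    # Every key in the key space that lands in this bucket.
    return {k for k in key_space if hash_fn(k) == bucket}


def union_then_intersect(hash_fns, anomalous_buckets, key_space):
    # Per table: union the reverse-mapped sets of all anomalous buckets.
    # Across tables: intersect those unions.
    candidates = None
    for hash_fn, buckets in zip(hash_fns, anomalous_buckets):
        union = set()
        for b in buckets:
            union |= reverse_map(hash_fn, b, key_space)
        candidates = union if candidates is None else candidates & union
    return candidates


key_space = range(16)
hash_fns = [lambda k: k % 4, lambda k: k // 4]   # two toy hash tables
culprits = {1, 14}
# Each table sees two anomalous buckets, one per culprit key.
anomalous_buckets = [{h(k) for k in culprits} for h in hash_fns]
candidates = union_then_intersect(hash_fns, anomalous_buckets, key_space)
false_positives = candidates - culprits
```

Here `candidates` is {1, 2, 13, 14}. Keys 2 and 13 are false positives because each combines a bucket of one culprit in the first table with a bucket of the other culprit in the second.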
Handling Multiple Intersections…
• Multiple possibilities….
• Try all possible combinations of intersections….
• Expensive and inaccurate(?)
Handling Multiple Intersections…
• Bucket Vector Algorithm: a new algorithm
— Efficient
— Similar to trying all possible intersections, but takes polynomial time
• Documented in our technical report
Evaluation
• Obtained traffic traces from a large ISP
— Each 5-minute interval contains 7.5 GB of traces
• Used the change detection method described earlier
Evaluation
• Efficacy depends on the number of heavy changers
— This depends on the change threshold
— Lower threshold → larger number of heavy changes
• To verify our results, we used a naïve multi-pass algorithm as the ground truth
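A ground-truth check of this kind can be sketched with exact per-key counters. The slides do not spell out the naïve algorithm, so everything here is an assumption: change is taken as the difference in per-key totals between two consecutive intervals, and the function name and sample data are made up for illustration.

```python
from collections import Counter


def heavy_changes(prev_interval, curr_interval, threshold):
    """Exact per-key change between two intervals of (key, value) updates."""
    prev_totals = Counter()
    for key, value in prev_interval:
        prev_totals[key] += value
    curr_totals = Counter()
    for key, value in curr_interval:
        curr_totals[key] += value
    changes = {}
    # A missing key counts as 0 in the interval where it did not appear.
    for key in prev_totals.keys() | curr_totals.keys():
        delta = curr_totals[key] - prev_totals[key]
        if abs(delta) >= threshold:
            changes[key] = delta
    return changes


prev = [("10.0.0.1", 100), ("10.0.0.2", 50)]
curr = [("10.0.0.1", 110), ("10.0.0.2", 900), ("10.0.0.3", 400)]
high_threshold = heavy_changes(prev, curr, threshold=300)
low_threshold = heavy_changes(prev, curr, threshold=5)
```

Lowering the threshold from 300 to 5 raises the number of reported keys from two to three, which mirrors the bullet about thresholds. Unlike the sketch, this approach keeps state for every key.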
Our methods are quite effective
• Detection is quite accurate, even with up to 20 heavy changes
• Very few false positives and false negatives
The bucket vector algorithm is important
• For multiple changes, the method of intersection is quite important
• E.g. without the bucket vector algorithm:
[Figure: results without the bucket vector algorithm]
We can make the sketch more accurate
• Use 6 hash tables instead of 5
— Makes intersections very accurate, with fewer false negatives
Conclusions
• Sketches are a powerful method for scalable change detection
• Our main contribution: we can reverse them
— Greatly enhances their applicability in online systems
• We can extract heavy changes from the sketches without storing any key information
• Methods are accurate
— Low number of false positives and false negatives
• Methods are efficient
— Run time: only poly-logarithmic in the key space
— Space: logarithmic in the key space
Future Work: Three areas
• Application to online real-time systems
— Performance evaluation
— Hardware design of our methods
• More advanced applications:
— Hierarchical change detection
— Output the prefix changes, not just the key changes!
– E.g. 129.105.100.* shows a big change!
• Advanced change detection methods:
— Statistical methods