![Page 1: George Varghese (based on Cristi Estan’s work) University of California, San Diego May 2011 Internet traffic measurement: from packets to insight](https://reader036.vdocument.in/reader036/viewer/2022081511/56649dc85503460f94abd51b/html5/thumbnails/1.jpg)
Internet traffic measurement: from packets to insight
George Varghese (based on Cristi Estan’s work)
University of California, San Diego
May 2011
Research motivation

| | The Internet in 1969 | The Internet today |
|---|---|---|
| Problems | Flexibility, speed, scalability | Overloads, attacks, failures |
| Measurement & control | Ad-hoc solutions suffice | Engineered solutions needed |

Research direction: towards a theoretical foundation for systems doing engineered measurement of the Internet
Current solutions
[Figure: a router on a fast link, with local memory, sends raw data across the network to an analysis server, which produces traffic reports for the network operator]
State of the art: simple counters (SNMP), time series plots of traffic (MRTG), sampled packet headers (NetFlow), top k reports
Concise? Accurate?
Measurement challenges
• Data reduction: performance constraints
  – Memory (terabytes of data each hour)
  – Link speeds (40 Gbps links)
  – Processing (8 ns to process a packet)
• Data analysis: unpredictability
  – Unconstrained service model (e.g. Napster, Kazaa)
  – Unscrupulous agents (e.g. Slammer worm)
  – Uncontrolled growth (e.g. user growth)
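The per-packet budget and the hourly volume follow directly from the link speed. A minimal Python check, assuming minimum-size 40-byte packets arriving back to back on a fully loaded 40 Gbps link (the 40-byte packet size is our assumption, not stated on the slide):

```python
def per_packet_budget_ns(link_bps: int, packet_bytes: int = 40) -> float:
    """Nanoseconds available per packet at full line rate.

    Assumes back-to-back packets of `packet_bytes` bytes; 40 bytes (a bare
    TCP/IP header) is an illustrative assumption.
    """
    return packet_bytes * 8 * 10**9 / link_bps


def bytes_per_hour(link_bps: int) -> int:
    """Bytes crossing a fully loaded link in one hour."""
    return link_bps * 3600 // 8


LINK_BPS = 40 * 10**9  # 40 Gbps

print(per_packet_budget_ns(LINK_BPS))     # 8.0
print(bytes_per_hour(LINK_BPS) / 10**12)  # 18.0 (terabytes per hour)
```

Slower links or larger packets relax the budget proportionally; the minimum packet size is the worst case a line-rate design has to meet.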
Main contributions
• Data reduction: algorithmic solutions for measurement building blocks
  – Identifying heavy hitters (part 1 of talk)
  – Counting flows or distinct addresses
• Data analysis: traffic cluster analysis automatically finds the dominant modes of network usage (part 2 of talk)
  – AutoFocus traffic analysis system used by hundreds of network administrators
Identifying heavy hitters
[Figure: the measurement setup (router on a fast link with memory, network, analysis server producing traffic reports for the network operator), highlighting the step "Identifying heavy hitters with multistage filters"]
Why are heavy hitters important?
• Network monitoring: current tools report top applications and top senders/receivers of traffic
• Security: malicious activities such as worms and flooding DoS attacks generate a lot of traffic
• Capacity planning: the largest elements of the traffic matrix determine network growth trends
• Accounting: usage-based billing matters most for the most active customers
Problem definition
• Identify and measure all streams whose traffic exceeds a threshold (0.1% of link capacity) over a certain time interval (1 minute)
  – Streams defined by fields (e.g. destination IP)
  – Single pass over packets
  – Small worst-case per-packet processing
  – Small memory usage
  – Few false positives / false negatives
Measuring the heavy hitters
• Unscalable solution: keep a hash table with a counter for each stream and report the largest entries
• Inaccurate solution: count only sampled packets and compensate in analysis
• Ideal solution: count all packets, but only for the heavy hitters
• Our solution: identify heavy hitters on the fly
  – Fundamental advantage over sampling: relative error proportional to $1/M$ instead of $1/\sqrt{M}$ (M is available memory)
Why is sample & hold better?
[Figure: uncertainty in the measured size of a stream, sample and hold versus ordinary sampling]
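Sample and hold can be sketched in a few lines. This is a minimal Python sketch under our own assumptions: each byte is sampled with probability p, so a packet of s bytes is sampled with probability 1 − (1 − p)^s, and once a stream is sampled every later packet of that stream is counted exactly. The function and parameter names are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable, Tuple


def sample_and_hold(packets: Iterable[Tuple[Hashable, int]], p: float,
                    rng: random.Random) -> Dict[Hashable, int]:
    """Estimate per-stream byte counts with sample and hold.

    packets: (stream_id, size_in_bytes) pairs in arrival order.
    p: per-byte sampling probability (controls how much stream memory is used).
    """
    held: Dict[Hashable, int] = {}
    for stream, size in packets:
        if stream in held:
            # Already held: count this packet exactly, no sampling decision.
            held[stream] += size
        elif rng.random() < 1 - (1 - p) ** size:
            # At least one byte of this packet was sampled: start holding the stream.
            held[stream] = size
    return held


packets = [("a", 100), ("b", 50), ("a", 100)]
print(sample_and_hold(packets, 1.0, random.Random(0)))  # {'a': 200, 'b': 50}
```

Because counting is exact after the first sample, the only error for a held stream is the traffic it sent before being sampled (on the order of 1/p bytes), which is the intuition behind its advantage over ordinary sampling.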
How do multistage filters work?
[Figure: each packet's stream identifier is hashed, e.g. Hash(Pink), to select one counter in an array of counters]
Collisions are OK.
[Figure: a stream whose counter reaches the threshold is inserted into stream memory (stream1, stream2), where it gets its own counter]
[Figure: with multiple stages (Stage 1, Stage 2), each with its own array of counters, a stream is inserted into stream memory only when it reaches the threshold in every stage]
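A parallel multistage filter can be sketched as below. The assumptions are ours, for illustration: counters count bytes; independent per-stage hash functions are simulated by hashing the stage index together with the stream ID using BLAKE2b from Python's `hashlib`; a stream enters stream memory once all of its counters reach the threshold and is counted there from that packet on. The class and method names are hypothetical.

```python
import hashlib
from typing import Dict, List


class MultistageFilter:
    """Parallel multistage filter: d stages of b byte counters and a threshold T."""

    def __init__(self, d: int, b: int, threshold: int):
        self.d, self.b, self.threshold = d, b, threshold
        self.stages: List[List[int]] = [[0] * b for _ in range(d)]
        self.stream_memory: Dict[str, int] = {}  # streams that passed the filter

    def _bucket(self, stage: int, stream: str) -> int:
        # One hash function per stage, simulated by mixing the stage index into the key.
        digest = hashlib.blake2b(f"{stage}:{stream}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.b

    def update(self, stream: str, size: int) -> None:
        buckets = [self._bucket(i, stream) for i in range(self.d)]
        for i, j in enumerate(buckets):
            # Collisions are OK: a shared counter can only overestimate a stream.
            self.stages[i][j] += size
        if stream in self.stream_memory:
            self.stream_memory[stream] += size  # already identified: count exactly
        elif all(self.stages[i][j] >= self.threshold for i, j in enumerate(buckets)):
            self.stream_memory[stream] = size   # passed every stage: start tracking


mf = MultistageFilter(d=4, b=1000, threshold=1000)
for _ in range(20):
    mf.update("big", 100)
print(mf.stream_memory)  # {'big': 1100}
```

The 900 bytes that `big` sent before its counters reached the threshold are not in its stream-memory count; this basic sketch simply accepts that underestimate.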
Conservative update
[Figure: filter counters as a packet arrives (gray = all prior packets), with some of the counter increments marked as redundant]
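Conservative update changes only how the stage counters are incremented. A minimal Python sketch, assuming a stream's counters are given as one bucket index per stage (the function name and data layout are hypothetical): every counter of a stream is at least that stream's traffic, so after a new packet the stream's traffic is at most the smallest of its counters plus the packet size. No counter needs to be raised above that value, and increments beyond it are the redundant ones.

```python
from typing import List


def conservative_update(counters: List[List[int]], buckets: List[int], size: int) -> None:
    """Add a packet of `size` bytes to a multistage filter using conservative update.

    counters[i] is stage i's counter array; buckets[i] is the packet's bucket in stage i.
    """
    # Upper bound on the stream's traffic after this packet.
    target = min(counters[i][j] for i, j in enumerate(buckets)) + size
    for i, j in enumerate(buckets):
        # Raise only counters below the bound; larger counters stay unchanged.
        if counters[i][j] < target:
            counters[i][j] = target


stages = [[5, 0], [2, 0], [9, 0]]
conservative_update(stages, [0, 0, 0], 3)
print(stages)  # [[5, 0], [5, 0], [9, 0]]; a plain update would give 8, 5 and 12
```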
Multistage filter analysis
• Question: find the probability that a small stream (0.1% of traffic) passes a filter with d = 4 stages × b = 1,000 counters and threshold T = 1%
• Analysis (for any stream distribution and packet order):
  – A small stream can pass a stage only if the other streams in its bucket carry ≥ 0.9% of traffic
  – At most 111 such buckets exist in a stage, so the probability of passing one stage is ≤ 11.1%
  – The probability of passing all 4 stages is ≤ $0.111^4 \approx 0.015\%$
  – The result is tight
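The worst-case arithmetic can be reproduced directly: the buckets whose other streams carry at least T − s can hold at most C/(T − s) of the traffic's worth, here 100% / 0.9% ≈ 111.1, hence 111 buckets. A small Python sketch, assuming traffic is expressed as a fraction of capacity C = 1 and that the stages use independent hash functions, so per-stage bounds multiply; the function names are ours:

```python
import math


def stage_pass_bound(b: int, threshold: float, s: float, capacity: float = 1.0) -> float:
    """Worst-case probability that a stream of size s passes one stage of b buckets.

    The stream passes only if the other streams in its bucket carry at least
    threshold - s. At most capacity / (threshold - s) buckets can hold that much.
    """
    dangerous_buckets = math.floor(capacity / (threshold - s))
    return dangerous_buckets / b


def filter_pass_bound(d: int, b: int, threshold: float, s: float,
                      capacity: float = 1.0) -> float:
    """Bound for all d stages, assuming independent hash functions per stage."""
    return stage_pass_bound(b, threshold, s, capacity) ** d


# Slide example: d = 4, b = 1,000, T = 1% of traffic, small stream s = 0.1%.
print(stage_pass_bound(1000, 0.01, 0.001))               # 0.111
print(f"{filter_pass_bound(4, 1000, 0.01, 0.001):.3%}")  # 0.015%
```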
Multistage filter analysis results
• d: filter stages
• T: threshold
• h = C/T (C: capacity)
• k = b/h (b: buckets)
• n: number of streams
• M: total memory

| Quantity | Result |
|---|---|
| Probability to pass filter | |
| Streams passing | |
| Relative error | |

Bounds versus actual filtering
[Figure: average probability of passing the filter for small streams (log scale, 0.00001 to 1) versus number of stages (1 to 4), comparing the worst case bound, the Zipf bound, actual filtering, and conservative update]
Comparing to current solution
• Trace: 2.6 Gbps link, 43,000 streams in 5 seconds
• Multistage filters: 1 Mbit of SRAM (4096 entries)
• Sampling: p=1/16, unlimited DRAM
Average absolute error / average stream size:

| Stream size | Multistage filters | Sampling |
|---|---|---|
| s > 0.1% | 0.01% | 5.72% |
| 0.1% ≥ s > 0.01% | 0.95% | 20.8% |
| 0.01% ≥ s > 0.001% | 39.9% | 46.6% |

Summary for heavy hitters
• Heavy hitters are important for measurement processes
• More accurate results than random sampling: relative error proportional to $1/M$ instead of $1/\sqrt{M}$
• Multistage filters with conservative update perform better than the theoretical bounds
• Prototype implemented at 10 Gbps?
Building block 2: counting streams
• Core idea
  – Hash streams to a bitmap and count the bits set
  – Sample the bitmap to save memory and scale
  – Use multiple scaling factors to cover wide ranges
• Result
  – Can count up to 100 million streams with an average error of 1% using 2 Kbytes of memory
[Figure: multiple bitmaps, accurate for 0-7, 8-15, and 16-32 streams respectively]
Bitmap counting
• Hash based on the flow identifier
• Estimate based on the number of bits set
• Problem: does not work if there are too many flows
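The slide does not spell out the estimator. A common choice, and the one assumed here, is linear counting: with a b-bit bitmap and z bits still zero, estimate b·ln(b/z), since after n distinct streams roughly b·e^(−n/b) bits remain zero. A minimal Python sketch (class name hypothetical; BLAKE2b from `hashlib` stands in for the hash function):

```python
import hashlib
import math


class DirectBitmap:
    """Count distinct streams by hashing each stream ID to one bit of a b-bit bitmap."""

    def __init__(self, b: int):
        self.b = b
        self.bits = bytearray(b)  # one byte per bit, for readability rather than compactness

    def add(self, stream: str) -> None:
        h = int.from_bytes(hashlib.blake2b(stream.encode(), digest_size=8).digest(), "big")
        self.bits[h % self.b] = 1  # every packet of a stream sets the same bit

    def estimate(self) -> float:
        zeros = self.b - sum(self.bits)
        if zeros == 0:
            raise ValueError("bitmap saturated: too many streams for this bitmap size")
        # Linear counting: after n streams, about b * exp(-n / b) bits are still zero.
        return self.b * math.log(self.b / zeros)


bm = DirectBitmap(4096)
for i in range(1000):
    for _ in range(3):  # several packets per stream
        bm.add(f"flow-{i}")
print(round(bm.estimate()))  # roughly 1000
```

Repeated packets of a stream do not inflate the count, but once most bits are set the estimate degrades and a full bitmap gives no answer at all, which is the problem the slide points out.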
Bitmap counting
• Increase the bitmap size
• Problem: the bitmap takes too much memory
Bitmap counting
• Store only a sample of the bitmap and extrapolate
• Problem: too inaccurate if there are few flows
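Sampling the bitmap can be sketched as storing only the bits for a fixed fraction of the hash space and scaling the estimate by the inverse of that fraction. In this Python sketch the assumptions are ours: the top 32 hash bits decide whether a stream falls in the sampled fraction, the low 32 bits pick its bit, and the same linear-counting estimator is used. The class name and parameters are hypothetical.

```python
import hashlib
import math


class VirtualBitmap:
    """Store b bits that represent only a `fraction` of a larger, virtual bitmap."""

    def __init__(self, b: int, fraction: float):
        self.b, self.fraction = b, fraction
        self.bits = bytearray(b)  # one byte per bit, for readability

    def add(self, stream: str) -> None:
        h = int.from_bytes(hashlib.blake2b(stream.encode(), digest_size=8).digest(), "big")
        # Top 32 bits: is this stream in the sampled part of the hash space?
        if (h >> 32) < self.fraction * 2**32:
            # Low 32 bits: which stored bit it sets.
            self.bits[(h & 0xFFFFFFFF) % self.b] = 1

    def estimate(self) -> float:
        zeros = self.b - sum(self.bits)
        if zeros == 0:
            raise ValueError("sampled bitmap saturated")
        # Estimate the streams in the sampled fraction, then extrapolate to all streams.
        return self.b * math.log(self.b / zeros) / self.fraction


vb = VirtualBitmap(b=4096, fraction=1 / 64)
for i in range(100_000):
    vb.add(f"flow-{i}")
print(round(vb.estimate()))  # roughly 100000
```

With few flows, only a handful (or none) land in the sampled fraction, so the extrapolated count jumps in steps of about 1/fraction; that is the inaccuracy the slide notes.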
Bitmap counting
• Use multiple bitmaps, each accurate over a different range of flow counts (0-7, 8-15, 16-32)
• Problem: must update multiple bitmaps for each packet
Bitmap counting
[Figure: the three bitmaps, covering 0-7, 8-15, and 16-32 flows]
Bitmap counting
[Figure: a single multiresolution bitmap covering 0-32 flows]
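A multiresolution bitmap can be sketched as a set of small components, each receiving a geometrically smaller share of the hash space, so that one hash per packet updates exactly one bit. The estimator below is a simplification under our own assumptions: it picks the first component (in order of shrinking share) from which all later components are at most 75% full, applies linear counting to those components, and scales up by the share of the hash space they cover. The parameters (b = 256, c = 12, setmax = 0.75) are arbitrary and do not reproduce the slide's 2-Kbyte, 100-million-stream configuration; the class name is hypothetical.

```python
import hashlib
import math


class MultiresolutionBitmap:
    """Simplified multiresolution bitmap with c components of b bits each.

    Component i < c - 1 receives a 2**-(i + 1) share of the hash space and the last
    component receives the remaining 2**-(c - 1), so each packet sets one bit.
    """

    def __init__(self, b: int = 256, c: int = 12, setmax: float = 0.75):
        self.b, self.c, self.setmax = b, c, setmax
        self.components = [bytearray(b) for _ in range(c)]

    def add(self, stream: str) -> None:
        h = int.from_bytes(hashlib.blake2b(stream.encode(), digest_size=8).digest(), "big")
        low = h & 0xFFFFFFFF
        # Trailing zeros of the low 32 bits equal i with probability 2**-(i + 1).
        tz = (low & -low).bit_length() - 1 if low else 32
        component = min(tz, self.c - 1)
        self.components[component][(h >> 32) % self.b] = 1

    def _fill(self, i: int) -> float:
        return sum(self.components[i]) / self.b

    def estimate(self) -> float:
        # Base: first component from which every later component is at most setmax full.
        for base in range(self.c):
            if all(self._fill(i) <= self.setmax for i in range(base, self.c)):
                break
        else:
            raise ValueError("every component is too full: use more components")
        # Linear counting on components base..c-1, which together cover a
        # 2**-base share of the hash space; scale up by 2**base.
        total = 0.0
        for i in range(base, self.c):
            zeros = self.b - sum(self.components[i])
            total += self.b * math.log(self.b / zeros)
        return total * 2**base


mrb = MultiresolutionBitmap()
for i in range(10_000):
    mrb.add(f"flow-{i}")
print(round(mrb.estimate()))  # roughly 10000
```

Small counts are handled by the components with large shares (base 0), large counts by the components with small shares, and each packet still touches only one bit.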
Future work
Traffic cluster analysis
[Figure: the measurement setup (router on a fast link with memory, network, analysis server producing traffic reports for the network operator), labeled Part 1: identifying heavy hitters, counting streams; Part 2: describing traffic with traffic cluster analysis]
Finding heavy hitters is not enough

| Rank | Destination IP | Traffic |
|---|---|---|
| 1 | jeff.dorm.bigU.edu | 11.9% |
| 2 | lisa.dorm.bigU.edu | 3.12% |
| 3 | risc.cs.bigU.edu | 2.83% |

Most traffic goes to the dorms …

| Rank | Dest. network | Traffic |
|---|---|---|
| 1 | library.bigU.edu | 27.5% |
| 2 | cs.bigU.edu | 18.1% |
| 3 | dorm.bigU.edu | 17.8% |

Where does the traffic come from? ……

| Rank | Source IP | Traffic |
|---|---|---|
| 1 | forms.irs.gov | 13.4% |
| 2 | ftp.debian.org | 5.78% |
| 3 | www.cnn.com | 3.25% |

| Rank | Source network | Traffic |
|---|---|---|
| 1 | att.com | 25.4% |
| 2 | yahoo.com | 15.8% |
| 3 | badU.edu | 12.2% |

What apps are used? Which network uses web and which one kazaa?

| Rank | Application | Traffic |
|---|---|---|
| 1 | web | 42.1% |
| 2 | ICMP | 12.5% |
| 3 | kazaa | 11.5% |

• Aggregating on individual fields is useful, but
  – traffic reports are often not at the right granularity
  – they cannot show aggregates over multiple fields
• A traffic analysis tool should automatically find aggregates over the right fields at the right granularity
Ideal traffic report

| Traffic aggregate | Traffic | Interpretation |
|---|---|---|
| Web traffic | 42.1% | Web is the dominant application |
| Web traffic to library.bigU.edu | 26.7% | The library is a heavy user of web |
| Web traffic from forms.irs.gov | 13.4% | That’s a big flash crowd! |
| ICMP from sloppynet.badU.edu to jeff.dorm.bigU.edu | 11.9% | This is a Denial of Service attack!! |

Traffic cluster reports try to give insights into the structure of the traffic mix
Definition
• A traffic report gives the size of all traffic clusters above a threshold T and is:
  – Multidimensional: clusters are defined by ranges from a natural hierarchy for each field
  – Compressed: it omits clusters whose traffic is within error T of more specific clusters in the report
  – Prioritized: clusters have unexpectedness labels
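Unidimensional report example
Threshold = 100. Traffic per source IP and per prefix in the hierarchy:

| Level | Clusters (traffic) |
|---|---|
| Individual IPs | 10.0.0.2 (15), 10.0.0.3 (35), 10.0.0.4 (30), 10.0.0.5 (40), 10.0.0.8 (160), 10.0.0.9 (110), 10.0.0.10 (35), 10.0.0.14 (75) |
| /31 | 10.0.0.2/31 (50), 10.0.0.4/31 (70), 10.0.0.8/31 (270), 10.0.0.10/31 (35), 10.0.0.14/31 (75) |
| /30 | 10.0.0.0/30 (50), 10.0.0.4/30 (70), 10.0.0.8/30 (305), 10.0.0.12/30 (75) |
| /29 | 10.0.0.0/29 (120), 10.0.0.8/29 (380) |
| /28 | 10.0.0.0/28 (500) |

Clusters above the threshold: 10.0.0.8 (160), 10.0.0.9 (110), 10.0.0.8/31 (270), 10.0.0.8/30 (305), 10.0.0.0/29 (120), 10.0.0.8/29 (380), 10.0.0.0/28 (500)
[Figure annotations on the hierarchy: AI Lab, 2nd floor, CS Dept]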
Unidimensional report example
Compression (threshold = 100):
• 10.0.0.8/30: 305 − 270 < 100, so it is omitted
• 10.0.0.8/29: 380 − 270 ≥ 100, so it is reported

| Source IP | Traffic |
|---|---|
| 10.0.0.0/29 | 120 |
| 10.0.0.8/29 | 380 |
| 10.0.0.8 | 160 |
| 10.0.0.9 | 110 |

Rule: omit clusters with traffic within error T of more specific clusters in the report
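The compression rule can be sketched bottom-up over the prefix hierarchy: each prefix carries the traffic not yet explained by more specific reported prefixes, and it is reported, with its full traffic, when that unexplained traffic reaches the threshold. This Python sketch, using the standard `ipaddress` module, is our own formulation of the rule for a binary prefix hierarchy; the function name and the fixed /28 root are assumptions for this example. On the slide's data it reports the same four clusters.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List, Tuple


def compressed_report(traffic: Dict[str, int], threshold: int,
                      root_len: int = 28) -> List[Tuple[str, int]]:
    """Compressed unidimensional report over IPv4 source prefixes.

    traffic maps dotted-quad addresses to volumes; all addresses are assumed to
    lie under one /root_len prefix.
    """
    # Current level: network address (int) -> (total traffic, unexplained traffic).
    level = {int(ipaddress.IPv4Address(a)): (t, t) for a, t in traffic.items()}
    report: List[Tuple[str, int]] = []
    for length in range(32, root_len - 1, -1):
        parent: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
        for net, (total, unexplained) in level.items():
            if unexplained >= threshold:
                # Not within error T of more specific reported clusters: report it.
                report.append((f"{ipaddress.IPv4Address(net)}/{length}", total))
                unexplained = 0  # its traffic is now explained by this cluster
            if length > root_len:
                # Aggregate into the enclosing prefix one bit shorter.
                mask = (0xFFFFFFFF << (33 - length)) & 0xFFFFFFFF
                acc = parent[net & mask]
                acc[0] += total
                acc[1] += unexplained
        level = {net: (acc[0], acc[1]) for net, acc in parent.items()}
    return report


traffic = {"10.0.0.2": 15, "10.0.0.3": 35, "10.0.0.4": 30, "10.0.0.5": 40,
           "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75}
for cluster, volume in compressed_report(traffic, threshold=100):
    print(cluster, volume)
# 10.0.0.8/32 160
# 10.0.0.9/32 110
# 10.0.0.0/29 120
# 10.0.0.8/29 380
```

10.0.0.8/31 and 10.0.0.8/30 carry only 0 and 35 bytes of unexplained traffic, and the /28 root carries none, so all three are omitted.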
Multidimensional structure
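[Figure: two field hierarchies, Source net (All traffic → US, EU; US → CA, NY; EU → FR, RU) and Application (All traffic → Web, Mail), and the clusters they induce, e.g. All traffic → EU → RU → RU Mail, RU Web]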
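Multidimensional clusters can be enumerated as every combination of one node from each field's hierarchy that contains a given record. This Python sketch hard-codes the two hierarchies from the slide (source net and application); the traffic records are made-up illustration data and the names are hypothetical. It computes cluster sizes only, without compression.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# Ancestor chains for each leaf, from the slide's hierarchies.
SOURCE_NET = {"CA": ["CA", "US", "All"], "NY": ["NY", "US", "All"],
              "FR": ["FR", "EU", "All"], "RU": ["RU", "EU", "All"]}
APPLICATION = {"Web": ["Web", "All"], "Mail": ["Mail", "All"]}


def clusters_above(records: List[Tuple[str, str, int]],
                   threshold: int) -> Dict[Tuple[str, str], int]:
    """Sizes of all (source net, application) clusters with traffic >= threshold."""
    sizes: Counter = Counter()
    for src, app, volume in records:
        # A record belongs to every cluster built from an ancestor of each field.
        for cluster in product(SOURCE_NET[src], APPLICATION[app]):
            sizes[cluster] += volume
    return {c: v for c, v in sizes.items() if v >= threshold}


# Hypothetical traffic: (source country, application, volume).
records = [("RU", "Mail", 40), ("RU", "Web", 30), ("CA", "Web", 20), ("FR", "Mail", 10)]
report = clusters_above(records, threshold=30)
print(report[("RU", "Mail")], report[("RU", "All")], report[("All", "All")])  # 40 70 100
```

The number of candidate clusters grows with the product of the hierarchy depths, which is why a full tool also needs the compression rule across all dimensions; that step is omitted here.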
AutoFocus: system structure
[Figure: components are the traffic parser, cluster miner, grapher, and web-based GUI; inputs are packet header traces / NetFlow data, plus categories and names]
Traffic reports for weeks, days, three-hour intervals, and half-hour intervals
Colors: user-defined traffic categories
Separate reports for each category
Analysis of unusual events
• Sapphire/SQL Slammer worm
  – Found the worm's port and protocol automatically
  – Identified infected hosts
Related work
• Databases [FS+98]: iceberg queries. Limited analysis, no conservative update
• Theory [GM98, CCF02]: synopses, sketches. Less accurate than multistage filters
• Data mining [AIS93]: association rules. No or limited hierarchy, no compression
• Databases [GCB+97]: data cube. No automatic generation of “interesting” clusters