SIGCOMM 2002
New Directions in Traffic Measurement and Accounting
Focusing on the Elephants, Ignoring the Mice
Cristian Estan and George Varghese
University of California, San Diego
Talk outline
• Problem definition
• Sample and hold
• Multistage filters
• Validation, measurements
• Conclusions
Traffic analysis today
[Figure: a measurement module in a router on a fast link sends sampled packets to a workstation, where collection and analysis software performs offline analysis of the large raw data to produce concise analysis results.]
Our research agenda
[Figure: the measurement module in a router on a fast link performs real-time analysis and produces concise analysis results directly.]
• Is it doable?
• Is it better?
What is traffic analysis used for?
• Network planning: need to know traffic between pairs of networks (traffic matrix)
• Accounting: usage based billing
• Detecting DoS attacks: flood attacks
• Application characterization: breaking up the traffic based on port numbers
• …
Common abstractions
• Packets are grouped together into streams based on header fields
  - Traffic matrix: by source and destination AS
  - DoS attacks: by destination IP address
• Measuring large streams (this paper)
• Estimating the number of active streams (poster)
• …
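As a minimal sketch of this abstraction, the Python below groups packets into streams with a key function over header fields and totals the bytes per stream. The `Packet` record, the key functions, and the `IP_TO_AS` table are hypothetical stand-ins chosen for illustration; a real traffic matrix would map addresses to ASes using routing data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record holding only the header fields used here."""
    src_ip: str
    dst_ip: str
    size: int  # bytes


def bytes_per_stream(packets: Iterable[Packet],
                     key: Callable[[Packet], Hashable]) -> Dict[Hashable, int]:
    """Group packets into streams by key(packet) and total each stream's bytes."""
    totals: Counter = Counter()
    for pkt in packets:
        totals[key(pkt)] += pkt.size
    return dict(totals)


# Hypothetical address-to-AS table; real ones are derived from routing data.
IP_TO_AS = {"10.0.0.1": 64500, "10.0.0.2": 64500, "192.0.2.7": 64501}


def by_destination(pkt: Packet) -> str:
    """DoS detection: one stream per destination IP address."""
    return pkt.dst_ip


def by_as_pair(pkt: Packet) -> Tuple[int, int]:
    """Traffic matrix: one stream per (source AS, destination AS) pair."""
    return IP_TO_AS[pkt.src_ip], IP_TO_AS[pkt.dst_ip]


packets = [
    Packet("10.0.0.1", "192.0.2.7", 1500),
    Packet("10.0.0.2", "192.0.2.7", 40),
    Packet("192.0.2.7", "10.0.0.1", 576),
]
print(bytes_per_stream(packets, by_destination))
print(bytes_per_stream(packets, by_as_pair))
```

The same packets form different streams depending on the key, which is why one large-stream measurement primitive can serve several applications.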
Why is measuring streams hard?
• Cheap memories (DRAM) are too slow to count all packets
• Fast memories (SRAM) are too small to keep counters for all streams
• Opportunity: elephants matter, mice don’t
• Problem: usually we don’t know in advance which streams are large
Problem definition
• Given a fixed definition for streams, measure large streams accurately
  - Large = above 1% of link capacity over a 1-minute interval
• Assumptions
  - Mice don't matter
  - Accuracy of results is important
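To make the definition of "large" concrete, the sketch below converts a share of link capacity over an interval into a byte threshold. The function name is hypothetical, and the OC-48 line rate (2,488.32 Mbit/s) is only an assumed example link.

```python
from fractions import Fraction


def large_stream_threshold_bytes(link_bps: int, interval_s: int,
                                 share: Fraction) -> Fraction:
    """Bytes a stream must exceed in one interval to count as large.

    share is the fraction of link capacity, e.g. Fraction(1, 100) for 1%.
    Fraction arithmetic keeps the result exact (bits / 8 = bytes).
    """
    return link_bps * interval_s * share / 8


OC48_BPS = 2_488_320_000  # assumed example link: OC-48 line rate in bit/s
print(large_stream_threshold_bytes(OC48_BPS, 60, Fraction(1, 100)))  # 186624000
```

On such a link, a stream is an elephant only if it sends more than about 187 MB in the minute; everything smaller is a mouse.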
Talk outline
• Problem definition
• Sample and hold
• Multistage filters
• Validation, measurements
• Conclusions
How does sample and hold work?
[Animation: a packet of stream1 is sampled and a new entry is inserted. Stream memory: stream1: 1]
[Animation: a later packet of stream1 updates its entry; once a stream has an entry, every one of its packets is counted. Stream memory: stream1: 2]
[Animation: a packet of stream2 is sampled and a new entry is inserted. Stream memory: stream1: 2, stream2: 1]
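The animation can be sketched in Python as below. This is an illustrative sketch, not the authors' implementation: it assumes byte-level sampling with probability `p` per byte, so a packet of `size` bytes is sampled with probability 1 - (1 - p)^size; with every size set to 1 it reduces to the per-packet counting shown in the animation. The function name and signature are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable, Tuple


def sample_and_hold(packets: Iterable[Tuple[Hashable, int]],
                    p: float,
                    rng: random.Random) -> Dict[Hashable, int]:
    """Return estimated bytes per stream for the streams that got an entry."""
    memory: Dict[Hashable, int] = {}  # the "stream memory"
    for stream, size in packets:
        if stream in memory:
            # Hold: once a stream has an entry, every packet updates it.
            memory[stream] += size
        elif rng.random() < 1 - (1 - p) ** size:
            # Sample: insert a new entry starting with this packet's bytes.
            memory[stream] = size
    return memory


trace = [("stream1", 1)] * 50 + [("stream2", 1)] * 3
print(sample_and_hold(trace, p=0.1, rng=random.Random(1)))
```

Because counting starts only at the first sampled packet, an entry can undercount its stream but never overcount it; the bytes before the entry was created are the only source of uncertainty.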
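Why is sample & hold better?
[Figure: ordinary sampling vs. sample and hold. Under ordinary sampling, uncertainty about a stream's size accumulates throughout the stream (several "uncertainty" segments); under sample and hold, only the part of the stream before its entry is created is uncertain (a single segment).]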
How much better is it?
• Comparing the relative error of the estimate for a stream at 1/F of the link bandwidth
• Memory limited to M entries

Measure            Ordinary sampling    Sample and hold
Error              √(F/M)               F/M
Memory accesses    1/S                  1

(Memory accesses are per packet; ordinary sampling looks at 1 in S packets.)
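Plugging numbers into these scalings shows the size of the gap. The sketch below treats the table entries as proportionalities (the slide omits constant factors) and uses assumed example values: F = 100 (a stream carrying 1% of the link) and M = 10,000 entries. The function names are hypothetical.

```python
import math


def ordinary_sampling_error(F: float, M: float) -> float:
    # Relative error scales as sqrt(F/M) for ordinary sampling.
    return math.sqrt(F / M)


def sample_and_hold_error(F: float, M: float) -> float:
    # Relative error scales as F/M for sample and hold.
    return F / M


F, M = 100, 10_000  # assumed example values
print(ordinary_sampling_error(F, M), sample_and_hold_error(F, M))
```

With these values the error is on the order of 10% for ordinary sampling versus 1% for sample and hold. The ratio between the two is √(M/F), so the advantage grows as more memory becomes available.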
Talk outline
• Problem definition
• Sample and hold
• Multistage filters
• Validation, measurements
• Conclusions
Multistage filters
Characteristics:
• No large stream is ever omitted
• Very few entries are used by small streams
• Better performance, but implementation and tuning are more complex
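How do multistage filters work?
[Animation: Hash(Pink) maps a packet's stream ID to one counter in an array of counters, which is incremented by the packet's size; large streams are recorded separately in the stream memory.]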
[Animation: Hash(Green) maps a packet of another stream to its own counter.]
[Animation: another Green packet arrives; Hash(Green) selects the same counter, which keeps growing.]
[Animation: collisions are OK; packets of different streams may hash to the same counter.]
[Animation: a counter reached the threshold, so stream1 is inserted into the stream memory. Stream memory: stream1: 1]
[Animation: stream memory: stream1: 1]
[Animation: a second stream reaches the threshold and is inserted. Stream memory: stream1: 1, stream2: 1]
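[Animation: the filter uses several stages (Stage 1, Stage 2), each with its own array of counters and hash function; a stream is inserted into the stream memory only if its counters reach the threshold in every stage. Stream memory: stream1: 1]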
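A parallel multistage filter like the one animated above can be sketched as follows. The sketch assumes byte counters, one array per stage, and per-stage hashing built from `zlib.crc32` over a stage-prefixed stream ID; these choices and the class name are illustrative assumptions, not the paper's hardware design.

```python
import zlib
from typing import Dict, List


class MultistageFilter:
    """Parallel multistage filter: one counter array per stage (illustrative)."""

    def __init__(self, stages: int, buckets: int, threshold: int) -> None:
        self.buckets = buckets
        self.threshold = threshold
        self.counters: List[List[int]] = [[0] * buckets for _ in range(stages)]
        self.memory: Dict[str, int] = {}  # stream memory for large streams

    def index(self, stage: int, stream: str) -> int:
        # A different prefix per stage acts as an independent hash per stage.
        return zlib.crc32(f"{stage}:{stream}".encode()) % self.buckets

    def process(self, stream: str, size: int) -> None:
        idx = [self.index(s, stream) for s in range(len(self.counters))]
        for s, i in enumerate(idx):
            self.counters[s][i] += size  # collisions are OK: counters are shared
        if stream in self.memory:
            self.memory[stream] += size
        elif all(self.counters[s][i] >= self.threshold for s, i in enumerate(idx)):
            # Passed every stage: start counting this stream exactly.
            self.memory[stream] = size


f = MultistageFilter(stages=2, buckets=1000, threshold=100)
f.process("mouse", 10)
for _ in range(10):
    f.process("elephant", 20)
print(f.memory)
```

Every counter a stream touches is at least that stream's own byte count, so a stream sending `threshold` bytes or more always passes all stages; a small stream gets an entry only if it collides with heavy traffic in every stage.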
Conservative update
[Figure: a stream's counters in each stage; gray = all prior packets]
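[Animation: the incoming packet would raise each of the stream's counters, but increments that push a counter above the stream's smallest counter plus the packet size are redundant (two are marked "Redundant"); conservative update does not make them.]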
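Conservative update changes only the counter-update step of the filter. The sketch below makes the same illustrative assumptions as a byte-counting parallel filter with `zlib.crc32`-based per-stage hashing; the class and method names are hypothetical. The stream's smallest counter is the tightest upper bound on its earlier traffic, so after a packet of `size` bytes no counter needs to exceed that minimum plus `size`, and each counter is raised only up to that value.

```python
import zlib
from typing import Dict, List


class ConservativeFilter:
    """Parallel multistage filter with conservative update (illustrative)."""

    def __init__(self, stages: int, buckets: int, threshold: int) -> None:
        self.buckets = buckets
        self.threshold = threshold
        self.counters: List[List[int]] = [[0] * buckets for _ in range(stages)]
        self.memory: Dict[str, int] = {}  # stream memory

    def index(self, stage: int, stream: str) -> int:
        # A different prefix per stage acts as an independent hash per stage.
        return zlib.crc32(f"{stage}:{stream}".encode()) % self.buckets

    def process(self, stream: str, size: int) -> None:
        idx = [self.index(s, stream) for s in range(len(self.counters))]
        # The smallest counter bounds this stream's earlier traffic from above,
        # so no counter needs to exceed that bound plus this packet's size.
        target = min(self.counters[s][i] for s, i in enumerate(idx)) + size
        for s, i in enumerate(idx):
            # Raise a counter only up to target; anything beyond is redundant.
            self.counters[s][i] = max(self.counters[s][i], target)
        if stream in self.memory:
            self.memory[stream] += size
        elif target >= self.threshold:  # every counter is now >= target
            self.memory[stream] = size


f = ConservativeFilter(stages=2, buckets=1000, threshold=100)
i0, i1 = f.index(0, "x"), f.index(1, "x")
f.counters[0][i0] = 50  # pretend colliding traffic already raised stage 0
f.process("x", 10)
print(f.counters[0][i0], f.counters[1][i1])  # 50 10 (normal update: 60 10)
```

Skipping the redundant increments keeps counters shared with small streams lower, so fewer small streams pass the filter.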
Talk outline
• Problem definition
• Sample and hold
• Multistage filters
• Validation, measurements
• Conclusions
Validation
• Analytical evaluation
• Comparison of analytical results to measured performance
• Comparison of full measurement devices using different algorithms
On traces, algorithms much better than analysis predicts
[Figure: percentage of small streams passing the filter (log scale) vs. number of stages, with curves for Theory, Zipf, Actual, and Conservative update.]
Measurement results
• Setup:
  - OC-48 trace, 100,000 TCP flows, 5-second intervals
  - Ordinary sampling: unlimited memory, sampling 1 in 16
  - Our algorithms: 1 Mbit of memory, adapting parameters to keep it around 90% full
• Large streams (above 0.1%): error of 9% for ordinary sampling, 0.075% for sample and hold, and 0.037% for multistage filters
Talk outline
• Problem definition
• Sample and hold
• Multistage filters
• Validation, measurements
• Conclusions
Our contributions
• Abstraction:
  - Real-time packet analysis abstractions can help systematize router implementations.
  - While the notion of elephants and mice is inherent in earlier work, we abstracted the measurement of large streams, which can be used by many applications.
Our contributions (2)
• Algorithms:
  - Sample and hold is a simple and efficient algorithm for identifying and measuring large streams.
  - Multistage filters with conservative update perform better but are more complex.
  - Both can be used for real-time as well as offline analysis.
Our contributions (3)
• Validation:
  - Theoretical results that make no assumptions about the traffic distribution
  - Simulations on traces show results orders of magnitude better than the theory predicts
  - Preliminary hardware design (John Huber) indicates feasibility at OC-192 speeds
Thank you!
Optimizations to sample and hold
• Preserving entries: keep large entries from one measurement interval to the next
  - Reduces error by a factor of 6
• Early removal: quickly remove entries that do not accumulate much traffic
  - Reduces memory usage by 25%
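Preserving entries can be sketched as an end-of-interval step: report the interval's counts, then keep only the entries of streams at or above the threshold for the next interval. Resetting preserved counters to zero, so that the next interval counts these streams from their first packet, is an assumption of this sketch, as is the function name; early removal is not shown.

```python
from typing import Dict, Hashable, Tuple


def end_of_interval(memory: Dict[Hashable, int],
                    threshold: int) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Return (report for this interval, stream memory for the next interval)."""
    report = dict(memory)
    # Preserved entries start the next interval at 0 but already exist, so
    # every byte they send is counted (no wait for a sampled packet).
    preserved = {s: 0 for s, count in memory.items() if count >= threshold}
    return report, preserved


report, next_memory = end_of_interval({"big": 5_000, "small": 40}, threshold=1_000)
print(report, next_memory)
```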
Optimizations to multistage filters
• Preserving entries: keep large entries from one measurement interval to the next
  - Reduces error by a factor of 5
• Shielding: large streams identified in previous intervals don't pass through the filter
  - Reduces memory usage by up to 70%
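Shielding can be sketched as a check in front of the filter. In the sketch below, streams preserved from the previous interval start with entries in stream memory; packets of any stream that has an entry update it directly and skip the counters, so large streams no longer inflate counters that small streams share. The `zlib.crc32`-based per-stage hashing and all names are illustrative assumptions.

```python
import zlib
from typing import Dict, Iterable, List


class ShieldedFilter:
    """Multistage filter with shielding (illustrative sketch)."""

    def __init__(self, stages: int, buckets: int, threshold: int,
                 preserved: Iterable[str] = ()) -> None:
        self.buckets = buckets
        self.threshold = threshold
        self.counters: List[List[int]] = [[0] * buckets for _ in range(stages)]
        # Entries preserved from the previous interval start at 0.
        self.memory: Dict[str, int] = {s: 0 for s in preserved}

    def index(self, stage: int, stream: str) -> int:
        # A different prefix per stage acts as an independent hash per stage.
        return zlib.crc32(f"{stage}:{stream}".encode()) % self.buckets

    def process(self, stream: str, size: int) -> None:
        if stream in self.memory:
            # Shielding: known streams are counted directly and bypass the filter.
            self.memory[stream] += size
            return
        idx = [self.index(s, stream) for s in range(len(self.counters))]
        for s, i in enumerate(idx):
            self.counters[s][i] += size
        if all(self.counters[s][i] >= self.threshold for s, i in enumerate(idx)):
            self.memory[stream] = size


f = ShieldedFilter(stages=2, buckets=1000, threshold=100, preserved=["elephant"])
for _ in range(10):
    f.process("elephant", 20)
print(f.memory, sum(map(sum, f.counters)))
```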