Building a Time Machine for
Efficient Recording and Retrieval of High-Volume Network Traffic
Stefan Kornexl¹, Vern Paxson², Holger Dreger¹,
Anja Feldmann¹, Robin Sommer¹
¹TU München, ²ICSI/LBNL
Internet Measurement Conference (IMC) 2005
2007/12/21 Speaker: Li-Ming Chen 2
References
Stefan Kornexl, Vern Paxson, Holger Dreger, Anja Feldmann, Robin Sommer, "Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic," 5th ACM IMC, 2005.
Stefan Kornexl, "High-Performance Packet Recording for Network Intrusion Detection," Master's thesis, 2005.
Time Machine web page: http://www.net.t-labs.tu-berlin.de/research/tm/
Outline
Motivation and Goals
Feasibility Study (trace-driven simulation)
System Architecture
Performance Evaluation
Conclusion and Comments
Motivation
The availability of packet recordings is a major benefit for network security monitoring:
• Security forensics: determining how an attacker compromised a given host
• Network troubleshooting: inspecting the precursors of a fault after the fault has occurred
• Event correlation: a NIDS could analyze past events that were not considered "interesting" until more recently seen traffic hinted at their relevance
Problems
Looking at raw packets (not only headers, but full contents) raises several problems:
• Storage constraints: in many operational environments it is infeasible to capture the entire traffic stream because of its enormous volume
• Data filtering: it is hard to decide beforehand which context will turn out to be relevant when incidents are investigated retrospectively, and filtering itself becomes technically problematic in high-speed networks (many TB per day)
• Data retrieval is like finding a needle in a haystack: time-consuming and cumbersome
Related Work
• Brute-force bulk recording: feasible only in low-volume environments
• Recording only the packets that trigger alerts: does not support retrospective analysis of a problematic host's earlier activity
• Sampling: might lose important evidence
• Data abstraction: provides less information
Objective
Design and implement a packet recording system, the "Time Machine", that uses dynamic packet filtering and buffering to enable effective recording of large traffic streams
• Nearly complete historic data for several days
• Allows one to conveniently "travel back in time"
Application: e.g., a forensic tool to extract detailed past information about unusual activities once they are detected
The Approach
Observation (key insight): network traffic has a "heavy-tailed" distribution
• Most network connections are quite short
• A small number of large connections accounts for the bulk of the total volume
• Most attacks compromise their target at the beginning of a connection; for forensics and troubleshooting, the beginning of a large connection contains the most significant information
The Approach (cont’d)
Exploit the "heavy-tailed" nature of traffic to partition the stream into a small subset of high interest and a large remainder of low interest; record the small subset and discard the rest
Cutoff limit N: for every connection, buffer only the first N bytes of traffic
• Greatly reduces the volume that must be buffered
• Retains the full context of small connections and the beginning of large ones
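The per-connection cutoff can be sketched in a few lines of Python. This is an illustrative sketch, not the Time Machine's implementation: the `CutoffRecorder` name, the 5-tuple connection keys, and the choice to keep the whole packet that crosses N are our own assumptions.

```python
from collections import defaultdict


class CutoffRecorder:
    """Buffer each connection's packets only until it has sent `cutoff` bytes.

    Illustrative sketch: connection keys are hypothetical 5-tuples, and whole
    packets are kept, so the packet that crosses the cutoff is still stored.
    """

    def __init__(self, cutoff: int):
        self.cutoff = cutoff                 # N, in bytes
        self.bytes_seen = defaultdict(int)   # per-connection byte counter
        self.buffer = []                     # (key, payload) pairs retained

    def offer(self, key, payload: bytes) -> bool:
        """Return True if the packet is buffered, False if it is discarded."""
        already = self.bytes_seen[key]
        self.bytes_seen[key] = already + len(payload)
        if already >= self.cutoff:
            return False                     # connection is past its cutoff
        self.buffer.append((key, payload))
        return True

    def stored_bytes(self) -> int:
        return sum(len(p) for _, p in self.buffer)


if __name__ == "__main__":
    rec = CutoffRecorder(cutoff=20_000)      # 20 KB
    short = ("10.0.0.1", 40000, "10.0.0.2", 23, "tcp")
    bulk = ("10.0.0.3", 40001, "10.0.0.4", 21, "tcp")
    rec.offer(short, b"a" * 1_500)           # short connection: kept entirely
    for _ in range(1_000):                   # 1.5 MB bulk transfer
        rec.offer(bulk, b"b" * 1_500)
    print(rec.stored_bytes())                # 22500 of 1,501,500 bytes offered
```

With N = 20 KB, the short connection is kept entirely, while the 1.5 MB transfer contributes only its first 14 packets (21,000 bytes).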
Design Goals for the Time Machine
Provide raw packet data
Buffer traffic comprehensively
Prioritize traffic
Automated resource management
Efficient and flexible retrieval
Suitable for high-volume environments using commodity hardware
Outline
Motivation and Goals
Feasibility Study (trace-driven simulation)
System Architecture
Performance Evaluation
Conclusion and Comments
Environments
MWN (Munich Scientific Research Network, Munich, Germany): about 50,000 hosts, 2 TB/day, 15-20% FTP traffic, 350 Mbps (68 Kpps) at busy hour
LBNL (Lawrence Berkeley National Laboratory, California, USA): about 9,000 hosts and 4,000 users, 320 Mbps (37 Kpps) at busy hour
NERSC (National Energy Research Scientific Computing Center): about 600 hosts and 2,000 users, dominated by large transfers, 260 Mbps (43 Kpps) at busy hour
Datasets
Connection-level summaries (1 week each) collected by the Bro NIDS:
• MWN: 355 million connections (from 2004/10/18)
• LBNL: 22 million connections (from 2005/2/7)
• NERSC: 4 million connections (from 2005/4/29)
These logs capture the character of each environment at a much lower volume than full packet-level data
A packet-buffer model is used to simulate packet-level traffic and evaluate the memory requirements of a Time Machine
Heavy-tailed Distribution and the Cutoff
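[Figure: connection-size distribution at each site, log-log scaled]
With a cutoff of 20 KB, ≈ 90% of connections fall below the cutoff and are recorded in full; the remaining ≈ 10% exceed it, and their bytes beyond the cutoff are discarded. These larger connections carry most of the bytes:
• LBNL: 12% of connections, 96% of bytes
• NERSC: 14% of connections, 99.86% of bytes
• MWN: 15% of connections, 87% of bytes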
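The statistics above can be computed from per-connection byte counts such as those in connection logs. A minimal sketch; `cutoff_stats` is a hypothetical helper, and the toy input below is invented for illustration, not taken from the study's datasets.

```python
def cutoff_stats(conn_sizes, cutoff):
    """Summarize how a per-connection cutoff splits the traffic.

    conn_sizes: total bytes of each connection, e.g. taken from connection logs.
    Returns a tuple of fractions:
      (connections larger than the cutoff,
       bytes carried by those connections,
       bytes beyond the cutoff, i.e. not recorded).
    """
    total_bytes = sum(conn_sizes)
    large = [s for s in conn_sizes if s > cutoff]
    frac_conns = len(large) / len(conn_sizes)
    frac_bytes = sum(large) / total_bytes
    frac_skipped = sum(s - cutoff for s in large) / total_bytes
    return frac_conns, frac_bytes, frac_skipped


if __name__ == "__main__":
    # Toy data (not from the paper): nine 5 KB connections and one 10 MB transfer.
    sizes = [5_000] * 9 + [10_000_000]
    conns, share, skipped = cutoff_stats(sizes, cutoff=20_000)
    print(f"{conns:.0%} of connections carry {share:.1%} of bytes; "
          f"{skipped:.1%} of bytes lie beyond the cutoff")
```

The third value is what the cutoff actually saves: the bytes a large connection sends after its first N bytes.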
Evaluate the Memory Requirements
Eviction time Te: how long the buffer stores each connection's data
• Goal: a value of Te on the order of days rather than minutes
Vary the cutoff N and the eviction time Te to evaluate the feasibility of a Time Machine
• Result: with a cutoff of 10-20 KB, buffering several days of traffic is practical
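The slides do not spell out the packet-buffer model, so the following is a minimal sketch under our own assumptions: each connection summary (start, duration, bytes) becomes a constant-rate byte stream, only its first N bytes enter the buffer, and each byte is evicted Te seconds after it arrived. The peak of the returned curve approximates the memory needed for a given (N, Te) pair; `buffer_volume` is a hypothetical name.

```python
import math


def buffer_volume(conns, cutoff, evict_after, horizon, step=1.0):
    """Estimate buffered bytes over time for a given cutoff N and eviction time Te.

    conns:       iterable of (start, duration, total_bytes) connection summaries
    cutoff:      N in bytes, or None for no cutoff
    evict_after: Te in seconds; each byte leaves the buffer Te after arriving
    Assumption (ours): every connection sends at a constant rate over its
    duration, so its first N bytes arrive in the first N/total of that time.
    Returns the buffered volume at the end of each time step.
    """
    nbins = math.ceil(horizon / step)
    arrivals = [0.0] * nbins
    for start, duration, total in conns:
        if total <= 0:
            continue
        kept = total if cutoff is None else min(total, cutoff)
        span = duration * kept / total            # time to send the kept prefix
        end = start + span
        if span == 0:                             # instantaneous connection
            i = int(start // step)
            if 0 <= i < nbins:
                arrivals[i] += kept
            continue
        first = max(int(start // step), 0)
        last = min(int(end // step), nbins - 1)
        for i in range(first, last + 1):          # spread bytes over the bins
            lo, hi = max(start, i * step), min(end, (i + 1) * step)
            if hi > lo:
                arrivals[i] += kept * (hi - lo) / span
    window = max(1, round(evict_after / step))    # bins a byte stays buffered
    volume, running = [], 0.0
    for i, a in enumerate(arrivals):
        running += a
        if i >= window:
            running -= arrivals[i - window]       # evict bytes older than Te
        volume.append(running)
    return volume


if __name__ == "__main__":
    conn = [(0, 10, 1000)]                        # one 1000-byte, 10 s connection
    print(max(buffer_volume(conn, None, evict_after=5, horizon=20)))  # 500.0
    print(max(buffer_volume(conn, 200, evict_after=5, horizon=20)))   # 200.0
```

Sweeping `cutoff` and `evict_after` over a week of connection summaries yields memory-over-time curves of the kind shown for each site, though the study's own model may differ in detail.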
Required Memory for LBNL
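[Figure: required memory over time for LBNL; labeled levels 64 GB and 68 GB; "5th day" marked on the time axis]
The cutoff increases the duration of data availability by a factor of 32 (3 h vs. 4 days)
The required memory stops increasing after 4 days, because data older than the eviction time Te is evicted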
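The factor of 32 follows directly from the two availability durations:

```latex
\frac{4\ \text{days}}{3\ \text{h}} = \frac{96\ \text{h}}{3\ \text{h}} = 32
```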
Required Memory for NERSC
[Figure: required memory over time for NERSC; labeled levels 344 GB and 14.9 GB]
NERSC has a large proportion of high-volume traffic (14% of connections carry 99.86% of the bytes)
• Without a cutoff, the volume is spiky
• The eviction time Te alone cannot contain the volume, because of intermittent bursts of traffic
Required Memory for MWN
MWN has a lower fraction of bytes in the larger connections (15% of connections carry 87% of the bytes)
The gain from the cutoff is not quite as large, likely due to the larger fraction of HTTP traffic
Outline
Motivation and Goals
Feasibility Study (via trace-driven simulation)
System Architecture
Performance Evaluation
Conclusion and Comments
Time Machine System Architecture
4 main functions:
1. Buffering traffic using a cutoff
2. Migrating the buffered packets to disk and managing the associated storage
3. Providing flexible retrieval of subsets of the packets
4. Enabling customization
Two-thread Architecture
Separates user interaction from recording to ensure that packet capture has higher priority than packet retrieval
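A minimal Python sketch of this separation, using a hypothetical `TwoThreadRecorder`: the capture thread only appends packets under a short lock, and the query thread filters a snapshot outside the lock. It illustrates the split of responsibilities, not the prioritization itself: CPython offers no thread priorities, so a real implementation would need another mechanism, such as OS-level scheduling priorities.

```python
import queue
import threading


class TwoThreadRecorder:
    """Recording and retrieval in separate threads (illustrative sketch).

    The capture thread only appends to a shared list under a lock. The query
    thread copies a snapshot under the same lock and filters it outside the
    lock, so even a slow query holds up capture only briefly.
    """

    def __init__(self):
        self.lock = threading.Lock()
        self.packets = []               # (timestamp, payload) pairs
        self.queries = queue.Queue()    # predicates submitted by the user
        self.results = queue.Queue()    # one result list per query

    def capture_loop(self, source):
        for pkt in source:              # `source` stands in for libpcap
            with self.lock:
                self.packets.append(pkt)

    def query_loop(self):
        while True:
            pred = self.queries.get()
            if pred is None:            # shutdown sentinel
                break
            with self.lock:
                snapshot = list(self.packets)   # short critical section
            self.results.put([p for p in snapshot if pred(p)])


if __name__ == "__main__":
    tm = TwoThreadRecorder()
    pkts = [(float(t), b"x") for t in range(1000)]
    capture = threading.Thread(target=tm.capture_loop, args=(pkts,))
    retrieval = threading.Thread(target=tm.query_loop)
    capture.start()
    retrieval.start()
    capture.join()                              # finish recording for the demo
    tm.queries.put(lambda p: p[0] >= 990.0)     # "the last 10 seconds"
    tm.queries.put(None)
    retrieval.join()
    print(len(tm.results.get()))                # 10
```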
Packet Capture
The capture unit receives packets from the network tap and passes them on to the classification unit
It uses the libpcap packet capture library to collect and store each packet's full content and capture timestamp
libpcap can install a kernel-level BPF (BSD Packet Filter) capture filter to discard "uninteresting" traffic as early as possible
Classification
The classification unit divides the incoming packet stream into user-defined classes and assigns packets to different storage containers based on their class
It monitors the cutoff with the help of the connection tracking unit, which keeps per-connection statistics and checks whether the connection a packet belongs to has exceeded its cutoff threshold
An example class definition ("telnet") specifies: name, BPF filter (rule), priority, cutoff, and memory and disk buffer sizes
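The class attributes listed above map naturally onto a record type. A sketch with hypothetical names and values: Python predicates stand in for BPF filters (the real telnet rule would be a filter expression such as `tcp port 23`), and we assume that the priority decides which class wins when a packet matches several.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TrafficClass:
    name: str
    matches: Callable[[dict], bool]   # stand-in for the class's BPF filter
    priority: int                     # assumed: higher wins if several classes match
    cutoff: int                       # per-connection cutoff, in bytes
    mem_bytes: int                    # RAM buffer size
    disk_bytes: int                   # disk buffer size


def classify(pkt: dict, classes: List[TrafficClass]) -> Optional[TrafficClass]:
    """Return the highest-priority class whose filter matches, or None."""
    matching = [c for c in classes if c.matches(pkt)]
    return max(matching, key=lambda c: c.priority, default=None)


def is_telnet(p: dict) -> bool:
    # Equivalent in spirit to the BPF expression "tcp port 23".
    return p["proto"] == "tcp" and 23 in (p["sport"], p["dport"])


# Hypothetical values, chosen only for illustration.
classes = [
    TrafficClass("telnet", is_telnet, priority=10,
                 cutoff=1_000_000, mem_bytes=10**7, disk_bytes=10**9),
    TrafficClass("tcp", lambda p: p["proto"] == "tcp", priority=1,
                 cutoff=20_000, mem_bytes=10**8, disk_bytes=9 * 10**10),
]

if __name__ == "__main__":
    pkt = {"proto": "tcp", "sport": 51000, "dport": 23}
    print(classify(pkt, classes).name)   # telnet (both match; telnet has higher priority)
```

Giving an interactive protocol such as telnet a larger cutoff than bulk TCP is one way the per-class settings let an operator keep more of the traffic deemed most valuable.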
Storage Containers
The architecture supports customization by splitting the overall storage into several storage containers
Each storage container stores the subset of packets belonging to its user-defined class, within its own memory and disk resources
RAM and disk buffers are implemented as two ring buffers: packets evicted from the RAM buffer are migrated to the disk buffer and are eventually deleted from there
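The RAM-to-disk migration can be sketched with two bounded FIFO queues. Assumptions: the "disk" tier is simulated with a second in-memory `deque`, capacities are counted in payload bytes, and `StorageContainer` is a hypothetical name.

```python
from collections import deque


class StorageContainer:
    """RAM ring buffer that spills evicted packets into a disk ring buffer.

    Sketch only: the "disk" tier is another in-memory deque here, and sizes
    are counted in payload bytes.
    """

    def __init__(self, mem_bytes: int, disk_bytes: int):
        self.mem, self.disk = deque(), deque()
        self.mem_cap, self.disk_cap = mem_bytes, disk_bytes
        self.mem_used = self.disk_used = 0
        self.deleted = 0                         # packets dropped off the disk tier

    def add(self, ts: float, payload: bytes):
        self.mem.append((ts, payload))
        self.mem_used += len(payload)
        while self.mem_used > self.mem_cap:      # migrate oldest packets to disk
            old = self.mem.popleft()
            self.mem_used -= len(old[1])
            self._to_disk(old)

    def _to_disk(self, pkt):
        self.disk.append(pkt)
        self.disk_used += len(pkt[1])
        while self.disk_used > self.disk_cap:    # oldest data is finally deleted
            _, payload = self.disk.popleft()
            self.disk_used -= len(payload)
            self.deleted += 1


if __name__ == "__main__":
    sc = StorageContainer(mem_bytes=3, disk_bytes=4)
    for t in range(10):
        sc.add(float(t), b"x")                  # ten 1-byte packets
    print([ts for ts, _ in sc.mem], [ts for ts, _ in sc.disk], sc.deleted)
    # [7.0, 8.0, 9.0] [3.0, 4.0, 5.0, 6.0] 3
```

Because both tiers evict in arrival order, the newest packets are always in RAM and the oldest surviving ones on disk, which keeps each container's contents sorted by time.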
Indexing
For efficient retrieval, indexes cover the packets stored in all storage containers
Each index manages a list of time intervals for every unique key value; each incoming packet updates the [Tstart, Tend] interval of its key
The time intervals indicate whether packets with that key value are available in a given storage container, and from which starting timestamp; retrieval then only scans linearly through the intervals obtained from the index
Multiple indexes: any number of indexes over an arbitrary set of protocol header fields is supported
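A minimal sketch of one such index. The `IntervalIndex` name and the merge rule are our assumptions: a packet that arrives within `delta_t` of its key's current interval end extends that interval, otherwise a new interval starts (the user interface exposes an "indexing Δt" parameter, but the slides do not define the rule). Packets are assumed to arrive in timestamp order, and per-container bookkeeping is omitted.

```python
from collections import defaultdict


class IntervalIndex:
    """Per-key list of [t_start, t_end] intervals (sketch).

    Assumed rule: a packet within `delta_t` of the key's last interval end
    extends that interval; otherwise a new interval is started.
    """

    def __init__(self, key_fn, delta_t: float):
        self.key_fn = key_fn                    # which header field(s) form the key
        self.delta_t = delta_t
        self.intervals = defaultdict(list)      # key -> [[t_start, t_end], ...]

    def add(self, ts: float, pkt: dict):
        ivs = self.intervals[self.key_fn(pkt)]
        if ivs and ts - ivs[-1][1] <= self.delta_t:
            ivs[-1][1] = ts                     # extend the current interval
        else:
            ivs.append([ts, ts])                # gap too large: new interval

    def lookup(self, key):
        return self.intervals.get(key, [])      # does not create empty entries


if __name__ == "__main__":
    by_src = IntervalIndex(lambda p: p["src"], delta_t=5.0)
    for ts in (0.0, 1.0, 3.0, 20.0, 22.0):
        by_src.add(ts, {"src": "10.0.0.1"})
    print(by_src.lookup("10.0.0.1"))            # [[0.0, 3.0], [20.0, 22.0]]
```

Several indexes (e.g., one keyed by source address, one by connection 4-tuple) can be fed the same packets by constructing one `IntervalIndex` per `key_fn`; a larger Δt yields fewer, coarser intervals at the cost of scanning more packets per query.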
Query Processing
Provides a flexible language to express queries for subsets of the packets
Each query consists of a logical combination of time ranges, keys, and an optional BPF filter
1. Check the index to obtain the time ranges relevant to the query
2. Locate these time ranges in the storage containers using binary search
3. Scan all packets in the identified time ranges and check whether they match the query
4. Write the results to a tcpdump trace file on disk
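Steps 2-4 can be sketched as follows. `run_query` and `write_pcap` are hypothetical helpers: the query's key/BPF check is a Python predicate, a container is modeled as a time-sorted list, and the output uses the classic libpcap file format (24-byte global header, 16-byte per-packet header) that tcpdump reads, with link type 1 (Ethernet) assumed.

```python
import bisect
import io
import struct


def run_query(intervals, timestamps, packets, match):
    """Steps 2-3: binary-search each indexed interval, then scan it.

    intervals:  [(t_start, t_end), ...] obtained from the index (step 1)
    timestamps: sorted capture timestamps of the packets in one container
    packets:    (timestamp, frame) pairs aligned with `timestamps`
    match:      predicate standing in for the query's key/BPF check
    """
    hits = []
    for t_start, t_end in intervals:
        lo = bisect.bisect_left(timestamps, t_start)    # first packet >= t_start
        hi = bisect.bisect_right(timestamps, t_end)     # one past last <= t_end
        hits.extend(p for p in packets[lo:hi] if match(p))
    return hits


def write_pcap(fileobj, packets, linktype=1, snaplen=65535):
    """Step 4: write (timestamp, frame) pairs as a classic libpcap trace."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, link type.
    fileobj.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, snaplen, linktype))
    for ts, frame in packets:
        sec = int(ts)
        usec = int(round((ts - sec) * 1_000_000))
        if usec == 1_000_000:                   # rounding carried into the next second
            sec, usec = sec + 1, 0
        captured = frame[:snaplen]
        # Record header: seconds, microseconds, captured length, original length.
        fileobj.write(struct.pack("<IIII", sec, usec, len(captured), len(frame)))
        fileobj.write(captured)


if __name__ == "__main__":
    pkts = [(1.0, b"a"), (2.0, b"bb"), (3.0, b"c"), (10.0, b"dd"), (11.0, b"e")]
    stamps = [ts for ts, _ in pkts]
    found = run_query([(2.0, 3.0), (10.0, 10.5)], stamps, pkts,
                      lambda p: len(p[1]) == 2)
    buf = io.BytesIO()                          # a real query would open a file
    write_pcap(buf, found)
    print(found, len(buf.getvalue()))           # [(2.0, b'bb'), (10.0, b'dd')] 60
```

To produce a file that tcpdump can read, pass a handle from `open("result.pcap", "wb")` instead of the in-memory buffer.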
User Interface
Allows the user to configure the recording parameters: classification rules, cutoff, storage management, indexing Δt, etc.
Issues queries to the query processing unit to retrieve subsets of the recorded packets
Outline
Motivation and Goals
Feasibility Study (via trace-driven simulation)
System Architecture
Performance Evaluation
Conclusion and Comments
Evaluation in LBNL
Configuration: 3 classes, each with a 20 KB cutoff:
• TCP: 90 GB
• UDP: 30 GB
• Others: 10 GB
Retention: the distance back in time to which we can travel at any particular moment
• Increases after the Time Machine starts, until the disk buffers have filled
• Correlates with the incoming bandwidth of each class and its variations due to diurnal and weekly effects
Evaluation
In LBNL
• 98% of the traffic is discarded; the remainder arrives at an average (maximum) rate of 300 KB/s (2.6 MB/s)
• Over 2 weeks of operation, libpcap reported only 0.016% of all packets dropped
In MWN
• 85% of the traffic is discarded; the remainder arrives at an average (maximum) rate of 3.5 MB/s (13.9 MB/s), owing to the larger volume of HTTP traffic
• Issue: the classification and cutoff mechanisms need to be exploited more aggressively to manage the large fraction of HTTP traffic appropriately
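To first order (our approximation, assuming a steady incoming rate and full ring buffers), retention is the buffer size divided by the average rate of traffic that survives the cutoff. Purely as an illustration, treating the 90 + 30 + 10 = 130 GB configured for the three LBNL classes as a single pool and using decimal units:

```latex
\begin{aligned}
\bar{r} &= 300\ \text{KB/s} \times 86{,}400\ \text{s/day} \approx 25.9\ \text{GB/day} \\
T_{\text{retention}} &\approx \frac{B}{\bar{r}} = \frac{130\ \text{GB}}{25.9\ \text{GB/day}} \approx 5.0\ \text{days}
\end{aligned}
```

In practice each class fills at its own rate, so per-class retention differs from this pooled figure.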
Conclusion
The concept of a Time Machine for efficient network packet recording and retrieval is proposed
• It relies on the "heavy-tailed" nature of network traffic: record most connections in their entirety and skip the bulk of the total volume
The Time Machine
• can buffer several days of raw high-volume traffic using commodity hardware
• provides an efficient query interface
• automatically manages its available storage
A trace-driven simulation and real operational experience demonstrate its effectiveness