BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection
F. Tegeler, X. Fu (U Goe), G. Vigna, C. Kruegel (UCSB)
CoNEXT 2012
Motivation
Sophisticated type of malware: bots
Multiple bots under single control: botnet
Distinct characteristic: command and control (C&C) channel
Threats raised by bots:
- Spam
- Information theft (e.g., credit card data)
- Identity theft
- Click fraud
- Distributed denial of service (DDoS) attacks
[Figure: C&C server controlling victim hosts]
$2M-$600M revenue estimated for a single botnet
Challenge
How to detect bot infections?
Classically: end host (antivirus scanner)
But: requires installation on every machine
Complementary approach: network based
- Vertical correlation (single end host) (Rishi, BotHunter, Wurzinger et al., …)
  - Typical behavior (spam, DDoS traffic)
  - Anomaly detection (Giroire et al.)
  - Packet analysis: HTTP structure, payloads, typical signatures
- Horizontal correlation (multiple end hosts) (BotSniffer, BotMiner, TAMD, …)
  - Two or more hosts do the same malicious stuff
Challenge and Solution Approach
Existing vertical: typically relies on scanning, spam, or DDoS traffic and requires packet inspection.
Existing horizontal: requires multiple hosts in a single domain to be infected. Also triggered by noisy activity (e.g., BotMiner).
Contribution: vertical detection of single bot infections without packet inspection!
- Botmaster establishes C&C connections frequently to disseminate orders.
- C&C connections show patterns.
- Use these statistical properties of C&C communication!
Core assumption: periodic behavior!
Methodology
Basic machine learning approach:
- Learn about bot behavior: training phase (a)
- Use learned behavior: detection phase (b)
Training:
- Observe malware in a controlled environment
- Extract flows and build traces
- Perform statistical analysis to obtain “features”
- Create models to describe malware
Methodology – Detection Phase
Detection:
- Obtain traffic
- Perform analysis analogous to training
- Compare statistical features of the traffic with models
During the whole process: no deep packet inspection!
Methodology – Details
Analysis performed on flows
A flow is a connection from A to B:
- Source IP address
- Destination IP address
- Source port
- Destination port
- Transport protocol ID
- Start time
- Duration of connection
- Number of bytes
- Number of packets
This information is easy to obtain in real-world environments!
Example: NetFlow
Methodology – Details cont’d
Trace: chronologically ordered sequence of flows.
Represents long-term communication behavior!
Example for two dimensions: time and duration
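The flow and trace definitions can be sketched in a few lines of Python. This is a minimal illustration, not BotFinder's implementation: the `Flow` field names, the `build_traces` helper, and the grouping key (source IP, destination IP, destination port, protocol) are assumptions chosen for the example, and the addresses are documentation-range placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One NetFlow-style record; field names are illustrative."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int        # transport protocol ID, e.g. 6 for TCP
    start_time: float    # seconds since an arbitrary epoch
    duration: float      # seconds
    num_bytes: int
    num_packets: int


def build_traces(flows):
    """Group flows by endpoint pair and order each group by start time.

    The grouping key is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for flow in flows:
        key = (flow.src_ip, flow.dst_ip, flow.dst_port, flow.protocol)
        groups[key].append(flow)
    return {key: sorted(group, key=lambda f: f.start_time)
            for key, group in groups.items()}


flows = [
    Flow("10.0.0.5", "203.0.113.7", 40002, 80, 6, 1200.0, 2.0, 510, 6),
    Flow("10.0.0.5", "203.0.113.7", 40001, 80, 6, 0.0, 2.1, 500, 6),
    Flow("10.0.0.5", "198.51.100.9", 40003, 443, 6, 300.0, 9.5, 9000, 40),
]
traces = build_traces(flows)
```

Only header-level fields are used, so nothing in the sketch needs packet payloads.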
Distinguishing Characteristics
Bot traffic is more regular than normal, benign traffic!
The lower the bar, the more periodic.
Methodology – Features
Use statistical features to describe a trace!
- Average time between two flows.
- Average duration of flows.
- Average number of source bytes.
- Average number of destination bytes.
- A Fourier transform to detect underlying communication frequencies. More robust than simple averaging.
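The listed features could be computed roughly as follows. This is a sketch under stated assumptions: the function names, the 60-second binning, and the naive DFT (standing in for a proper FFT so that only the standard library is needed) are illustrative choices, not details taken from the slides.

```python
import cmath
from statistics import mean


def average_interval(start_times):
    """Mean gap between consecutive flow start times (seconds)."""
    return mean(b - a for a, b in zip(start_times, start_times[1:]))


def dominant_period(start_times, bin_seconds=60.0):
    """Estimate the strongest communication period with a naive DFT.

    Start times are binned into a 0/1 signal (1 = a flow started in that bin),
    the mean is removed, and the frequency bin with the largest magnitude wins.
    """
    t0 = start_times[0]
    n = int((start_times[-1] - t0) // bin_seconds) + 1
    signal = [0.0] * n
    for t in start_times:
        signal[int((t - t0) // bin_seconds)] = 1.0
    offset = mean(signal)
    signal = [s - offset for s in signal]

    best_k, best_magnitude = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, s in enumerate(signal))
        if abs(coeff) > best_magnitude:
            best_k, best_magnitude = k, abs(coeff)
    return n * bin_seconds / best_k  # period in seconds


def trace_features(start_times, durations, src_bytes, dst_bytes):
    """Collect the five per-trace features named on the slide."""
    return {
        "avg_interval": average_interval(start_times),
        "avg_duration": mean(durations),
        "avg_src_bytes": mean(src_bytes),
        "avg_dst_bytes": mean(dst_bytes),
        "dominant_period": dominant_period(start_times),
    }


# Synthetic trace: one flow every 20 minutes (1200 s), 24 flows in total.
start_times = [i * 1200.0 for i in range(24)]
features = trace_features(start_times, [2.0] * 24, [500] * 24, [1500] * 24)
```

For this synthetic trace the frequency estimate lands close to 1200 seconds; its resolution is limited by the bin width and the trace length.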
Methodology – Models
Example scenario: multiple binary versions of the same bot family generated traces
Example: time interval feature
[Figure: feature clustering. Values shown: 24, 20, 18, 8, 7.5, 17, 22, 9, 230, 190, and 912 min. Cluster centroids: 20 min, 8.2 min, 210 min]
“Intervals of 8, 20, or 210 minutes are typical for this bot.”
Clusters with low standard deviation are trustworthy representations of malware behavior
Drop very small (one-element) clusters
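The clustering step can be illustrated with the interval values from the figure. The sketch below uses a simple ratio-based split of sorted values as a stand-in; the slides do not name the clustering algorithm, so `cluster_1d`, `build_model`, and the `max_ratio` parameter are assumptions. Treating 24 min and 912 min as observed values is a reading of the figure, not something the slide states.

```python
from statistics import mean, pstdev


def cluster_1d(values, max_ratio=1.5):
    """Split sorted positive values wherever neighbors differ by more than max_ratio."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur / prev > max_ratio:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


def build_model(values, min_size=2):
    """Return (centroid, standard deviation, size) per cluster, dropping tiny clusters."""
    return [(mean(c), pstdev(c), len(c))
            for c in cluster_1d(values) if len(c) >= min_size]


intervals_min = [24, 20, 18, 8, 7.5, 17, 22, 9, 230, 190, 912]
model = build_model(intervals_min)
centroids = [round(c, 1) for c, _, _ in model]
# centroids -> [8.2, 20.2, 210]
```

The one-element cluster around 912 min is dropped, and the remaining centroids are close to the 8.2, 20, and 210 minutes shown on the slide.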
Methodology – Model Matching
Compare a trace to the cluster centers of a malware family model:
1. If a trace feature “hits” a model: increase scoring value based on cluster quality
2. Take the model with the highest scoring value
3. If scoring value > threshold: consider model matched
Some more math involved (quality of matching trace, clustering algorithm, minimal trace length, etc.)
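The three matching steps can be sketched as below. The slide leaves the math out, so everything specific here is an assumption: the exponential `cluster_quality` form and its `beta`, the definition of a "hit" as lying within `tolerance` standard deviations of a centroid, the normalization by feature count, the threshold value, and the family names and numbers in the example.

```python
import math


def cluster_quality(centroid, stdev, beta=2.5):
    """Map relative spread to (0, 1]; tight clusters score near 1 (assumed form)."""
    return math.exp(-beta * stdev / centroid)


def score_trace(trace_features, model, tolerance=2.0):
    """Average, over features, the quality of the best cluster the feature hits.

    model maps feature name -> list of (centroid, stdev) clusters.
    """
    score = 0.0
    for name, value in trace_features.items():
        hits = [cluster_quality(c, s) for c, s in model.get(name, [])
                if abs(value - c) <= tolerance * s]
        if hits:
            score += max(hits)
    return score / len(trace_features)


def match(trace_features, models, threshold=0.5):
    """Return (best family or None, best score)."""
    scores = {family: score_trace(trace_features, model)
              for family, model in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores[best]


# Hypothetical models: intervals in minutes, durations in seconds.
models = {
    "family_a": {"avg_interval": [(20.2, 2.6), (8.2, 0.6), (210.0, 20.0)],
                 "avg_duration": [(2.0, 0.2)]},
    "family_b": {"avg_interval": [(60.0, 5.0)],
                 "avg_duration": [(30.0, 3.0)]},
}
suspicious = {"avg_interval": 19.0, "avg_duration": 2.1}
unrelated = {"avg_interval": 500.0, "avg_duration": 12.0}

family, score = match(suspicious, models)
no_family, no_score = match(unrelated, models)
```

The suspicious trace hits one cluster per feature in `family_a` and clears the threshold, while the unrelated trace hits nothing and is not matched.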
Evaluation
Method is implemented in BotFinder
Six representative malware families
Dataset LabCapture: 2.5 months of lab traffic with 60 machines
- Full traffic capture, which allows verification
- Should contain benign traffic only
Dataset ISPNetFlow: one month of NetFlow data from a large network
- Reflects 540 Terabytes of data or 150 Megabytes(!) per second of traffic.
- No ground truth, but possibility to compare to blacklisted IP addresses and judgment of usability.
Evaluation – Cross Validation
Execution:
- Split the ground truth malware dataset randomly into a training set and a detection set
- Mix the detection set with all traces from the LabCapture dataset
- Train BotFinder on the training set
- Run BotFinder against the detection set
- Repeat experiment 50 times per acceptance threshold
Result summary: 77% detection rate with low false positives (1 out of 5 million traces)
[Figure: training data split into training set and detection set; detection set mixed with LabCapture; train, then detect]
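The experiment loop described above can be sketched as follows. The `train` and `classify` functions are toy stand-ins (a trace is reduced to a single average interval in minutes), and the half-and-half split ratio and all data values are hypothetical; only the overall protocol of random split, train, detect, and repeat comes from the slide.

```python
import random


def run_round(malware, benign, train, classify, rng):
    """One round: random split, train on one half, detect on the other half plus benign."""
    shuffled = list(malware)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    training_set, detection_set = shuffled[:half], shuffled[half:]
    model = train(training_set)
    detection_rate = sum(classify(model, t) for t in detection_set) / len(detection_set)
    false_positive_rate = sum(classify(model, t) for t in benign) / len(benign)
    return detection_rate, false_positive_rate


def train(traces):
    """Toy model: the mean interval of the training traces."""
    return sum(traces) / len(traces)


def classify(model, trace, tolerance=0.15):
    """Toy detector: flag traces within a relative tolerance of the model."""
    return abs(trace - model) <= tolerance * model


malware = [19.0, 20.0, 21.0, 20.5, 19.5, 20.2, 35.0, 20.1]
benign = [3.0, 50.0, 75.0, 140.0, 400.0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rounds = [run_round(malware, benign, train, classify, rng) for _ in range(50)]
avg_detection = sum(d for d, _ in rounds) / len(rounds)
avg_false_positives = sum(f for _, f in rounds) / len(rounds)
```

Averaging over many random splits reduces the influence of any single lucky or unlucky partition, which is why the experiment is repeated.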
Evaluation – Cross Validation
Evaluation – Comparison to BotHunter
BotHunter* is an optimized Snort Intrusion Detection System. It requires packet inspection and leverages anomaly detection.
Many false positives for BotHunter, typically raised by IRC activity or binary downloads.
Detection results:
- BotFinder detection rate: 77.5%
- BotHunter detection rate: 10%
BotFinder outperformed BotHunter and shows relatively high detection rates and low false positives.
Experimental setup not reproducing elements crucial to BotHunter?
*: http://www.bothunter.net
Evaluation – ISPNetFlow
Challenging to analyze as minimal information (only internal IP ranges) is available
542 traces (from >1 billion traces) are identified by BotFinder to be malicious
On average 14.6 alerts per day
Evaluation – ISPNetFlow
Speed is sufficient for large networks:
- 3 min for 15M NetFlow records (~15 min of ISPNetFlow, 800 MB file size)
- Processing is dominated by feature extraction
- Easy to parallelize
Detailed IP address investigation of raised alarms:
- Comparison of external IPs with publicly available blacklists*
- Result: 56% of all IPs are known to be malicious!
- The “false positives” show a large cluster of connections to Apple
- With Apple whitelisted: 61% of all raised alerts connect to known malicious pages
Strong support that BotFinder works!
*: rbls.org
Bot Evolution
Botmasters may try to evade detection by changing communication patterns:
- Introduction of randomized intervals
- Introduction of large gaps between flows
- IP or domain flux (fast changing C&C servers)
Randomization impact: randomizing individual features does not significantly impact detection
Lower limit!
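One intuition for why randomized intervals have limited effect is that an averaged feature is insensitive to symmetric jitter. The sketch below illustrates only that intuition with synthetic data; it does not reproduce the evaluation behind the slide, and `jittered_start_times`, the 20% jitter, and the 20-minute base interval are assumptions.

```python
import random
from statistics import mean


def jittered_start_times(base_interval, jitter_fraction, count, rng):
    """Flow start times whose gaps are base_interval +/- a uniform random jitter."""
    times, now = [], 0.0
    for _ in range(count):
        times.append(now)
        now += base_interval * (1 + rng.uniform(-jitter_fraction, jitter_fraction))
    return times


rng = random.Random(0)  # fixed seed so the sketch is repeatable
times = jittered_start_times(base_interval=1200.0, jitter_fraction=0.2,
                             count=200, rng=rng)
gaps = [b - a for a, b in zip(times, times[1:])]
avg_gap = mean(gaps)  # stays near 1200 s although single gaps vary by up to 20%
```

Individual gaps range from 960 to 1440 seconds here, yet the average over a long trace stays close to the base interval, so a model cluster around 20 minutes would still be hit.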
FFT Peak Detection with Gaps
Anti-Domain Flux
Problem: fast C&C domain/IP changes
Problem: BotFinder can’t create a sufficiently long trace
Idea:
- Look at each source IP and compare all connections with each other
- When two connections look very similar, combine them into one!
- Inherently horizontal correlation per source IP!
[Figure: change of IP address; trace “breaks” into subtrace 1 (A to C&C IP 1) and subtrace 2 (A to C&C IP 2)]
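The recombination idea can be sketched for the sub-traces of a single source IP. The similarity test used here (every averaged feature within a relative tolerance) and the greedy merge order are assumptions, since the slides do not specify the distance measure; each sub-trace is a list of hypothetical `(start_time, duration, num_bytes)` tuples with at least two flows.

```python
from statistics import mean


def summarize(subtrace):
    """Reduce a sub-trace to (average gap, average duration, average bytes)."""
    starts = [s for s, _, _ in subtrace]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return (mean(gaps),
            mean(d for _, d, _ in subtrace),
            mean(b for _, _, b in subtrace))


def similar(a, b, tolerance=0.15):
    """True when every averaged feature differs by at most `tolerance` (relative)."""
    return all(abs(x - y) <= tolerance * max(x, y) for x, y in zip(a, b))


def recombine(subtraces):
    """Greedily merge sub-traces of one source IP whose summaries look alike."""
    merged = []
    for sub in sorted(subtraces, key=lambda s: s[0][0]):
        for group in merged:
            if similar(summarize(group), summarize(sub)):
                group.extend(sub)
                group.sort()  # keep the merged trace in chronological order
                break
        else:
            merged.append(list(sub))
    return merged


# Source A talks to C&C IP 1, then to C&C IP 2, plus unrelated web traffic.
cc_ip1 = [(0.0, 2.0, 500), (600.0, 2.1, 510), (1200.0, 1.9, 495)]
cc_ip2 = [(1800.0, 2.0, 505), (2400.0, 2.0, 500), (3000.0, 2.1, 498)]
web = [(50.0, 30.0, 90000), (75.0, 45.0, 120000), (4000.0, 12.0, 40000)]

merged = recombine([cc_ip1, web, cc_ip2])
```

The two C&C sub-traces are joined into one six-flow trace that is long enough to analyze, while the dissimilar web traffic stays separate.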
Additional Pre-Processing
How can one check that it is working?
- Split real C&C traces and random other, long traces (from real traffic). Does BotFinder recombine them?
“Low” overhead: 85% increase in the ISPNetFlow.
Large distance! Good!
Conclusion
High detection rates - nearly 80% - with low false positives and no need for packet inspection!
BotFinder shows better results than BotHunter.
61% of BotFinder-flagged connections in the ISPNetFlow dataset were destined to known, blacklisted hosts!
BotFinder is robust against potential evasion strategies.
Questions
Thank you for your attention!
Any questions?