by damiano bolzoni, sandro etalle, pieter h. hartel --james o’reilly presenting

PANACEA

by Damiano Bolzoni, Sandro Etalle, Pieter H. Hartel

--James O’Reilly presenting

Intrusion Detection Systems (IDS)

Signature-based IDS (SBS)

Matches activity/payload to known attacks.

The attacks are classified simultaneously with detection.

Not able to recognize non-trivial variations of known attacks

Useless against new, zero-day attacks

With classification, security personnel can automate response and prioritize alerts

Intrusion Detection Systems (IDS)

Anomaly-based IDS (ABS)

Recognizes suspicious activity, even novel attacks, but cannot classify the activity.

High false positive rate, which is a burden to support staff

Without classification, prioritization and alert response automation are impossible

Problem Statement:

Anomaly-based IDSes appear to have more potential than signature-based IDSes because they can handle novel attacks. They are handicapped, however, because they can tell security personnel nothing about, e.g., a packet other than that it is “odd.”

Panacea’s Goal

To accurately classify anomalies based on payload information from anomaly-based IDSes into different attack classes.

The ability of an ABS to identify attacks will finally be paired with a system that can efficiently classify attacks as they happen, making ABSes far less costly in man-hours to use.

Training and Classification

Alert Information Extractor

“Boils down” an alert into a Bloom Filter representation that the classification engine can analyze. Goes through two stages, plus a third during training:

1. Building the n-gram bitmap

2. Computing the Bloom Filter

3. (Only during training) Passing along classification information (labels).

The more samples the Alert Classification Engine has, the more accurately it can classify alerts. As we will see, storage for all the alert payloads is impractical.

Representing the data features

There are 256^length possible messages of a given length in bytes.

Problem for classification is what features to pick – how to represent the payload without losing “meaning”

Each feature has a space and time cost, especially during training.

Too few features corresponds to a lack of resolution and the classifier’s task is likely impossible or hampered

N-grams

The information in the payload is represented using binary n-gram analysis. The n-gram order, n, is the number of adjacent symbols that are analyzed together.

The feature is the presence or absence of an n-gram in the payload and is stored in a bitmap

The n-gram bitmap size is on the order of 256^n bits.
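The bitmap construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `ngram_set` and `ngram_bitmap` and the example payload are assumptions made here, and the dense bitmap is only practical for small n.

```python
def ngram_set(payload: bytes, n: int) -> set:
    """Return the distinct n-grams (runs of n adjacent bytes) in payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}


def ngram_bitmap(payload: bytes, n: int) -> bytearray:
    """Dense presence/absence bitmap with one bit per possible n-gram.

    The bitmap holds 256**n bits, so this is only usable for small n.
    """
    bitmap = bytearray((256 ** n + 7) // 8)
    for gram in ngram_set(payload, n):
        index = int.from_bytes(gram, "big")  # n-gram read as a base-256 number
        bitmap[index // 8] |= 1 << (index % 8)
    return bitmap


# Hypothetical payload, chosen only for illustration.
payload = b"GET /index.html"
grams = ngram_set(payload, 3)
bitmap = ngram_bitmap(payload, 2)
print(len(grams), len(bitmap))
```

Each feature is a single bit, so the representation records only whether an n-gram occurs, not how often.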

Bloom Filter

The 3-gram bitmap is about 2 MB (256^3 bits, one bit per possible 3-gram). The 5-gram bitmap is about 128 GB.

A Bloom Filter offers an aggressive compression of the n-gram features at the risk of false positives when reading the data.

The authors state that a 10KB space would be acceptable in the 5-gram case.
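The sizes quoted on this slide follow from storing one bit per possible n-gram. A short check of the arithmetic, with `bitmap_bytes` being a helper name introduced here for illustration:

```python
def bitmap_bytes(n: int) -> int:
    """Bytes needed to store one presence bit per possible byte n-gram."""
    return 256 ** n // 8


for n in (3, 5):
    size = bitmap_bytes(n)
    print(f"{n}-gram bitmap: {size} bytes = {size / 2**20:g} MiB")
```

For n = 3 this gives 2^21 bytes (2 MiB), and for n = 5 it gives 2^37 bytes (128 GiB), which is why a dense bitmap is not stored for the 5-gram case.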


Bloom Filter

The binary Bloom Filter data structure is basically a vector of some length and is used for determining set membership.

Insertion: hash the value with several different hash functions (each with a range equal to the Bloom Filter vector length) and mark the positions that are hashed to.

Membership: hash the value with the same hash functions and look up each position in the vector. If all positions are marked, the value is reported as present (this may be a false positive); otherwise it is absent.
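Insertion and membership as described above can be sketched in a few lines. This is an illustrative sketch only: deriving the hash functions by salting SHA-256 with an index, and the choice of 3 hash functions, are assumptions made here and are not taken from the paper. The 10 KB size is the figure quoted earlier for the 5-gram case.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int, num_hashes: int):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Assumption: derive k hash functions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        """Mark every position the item hashes to."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        """Present only if all hashed positions are marked (may be a false positive)."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Insert the 5-grams of a hypothetical payload into a 10 KB filter.
payload = b"GET /index.html"
bloom = BloomFilter(size_bits=10 * 1024 * 8, num_hashes=3)
for i in range(len(payload) - 5 + 1):
    bloom.add(payload[i:i + 5])
print(b"GET /" in bloom)  # True: inserted items are always reported present
```

An inserted item is never reported absent; the only possible error is reporting an item as present when it was never inserted.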

Inserting into a Bloom Filter

“The error rate can be decreased by increasing the number of hash transforms and the space allocated to store the table.[1]”

Collisions in the Map

A collision will occur with the above probability, where l is the size of the space, k is the number of hash functions, and n is the number of insertions.
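The formula itself did not survive on this slide. Assuming it is the usual Bloom filter false positive estimate, (1 - (1 - 1/l)^(k·n))^k, the effect of l, k and n can be explored with the sketch below. The function name and the example values are illustrative assumptions.

```python
def false_positive_rate(l: int, k: int, n: int) -> float:
    """Standard Bloom filter estimate: l bits, k hash functions, n insertions."""
    return (1.0 - (1.0 - 1.0 / l) ** (k * n)) ** k


# Example: a 10 KB filter (81920 bits) with 3 hash functions.
for insertions in (100, 1000, 10000):
    rate = false_positive_rate(10 * 1024 * 8, 3, insertions)
    print(insertions, rate)
```

Under this estimate, a larger vector lowers the error rate for a fixed number of insertions, while more insertions into the same vector raise it.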

Alert Classification Engine

This Engine has essentially two modes: training and classification.

Training is when the classifier (SVM or RIPPER) is learning how to classify the attacks. It does this with labeled Bloom Filter data (supervised learning).

Once trained, the classifier can be given unlabeled Bloom Filter data and classify it.

Training and Classification

Accuracy and Training Set

The accuracy of the classifier depends on training set size and, of course, its quality. Quality is affected by the way the training data is labeled (more on this shortly). The training set needs to be fairly large (the larger the better, but there are diminishing returns).

Bolzoni et al. chose SVM and RIPPER for their accuracy but they are non-iterative learners: to update with new samples they must essentially add the samples to the original data and completely retrain. Therefore, the data must be as compact as possible without destroying the distinguishing features of the payloads.

Classifier

A classifier takes input and classifies it as a member of a class

A binary classifier takes input and decides essentially whether it’s a member of a class or not.

Training a supervised-learning classifier involves taking labeled data and then minimizing the error on the training data using whatever sort of implementation the classifier is using.


SVM

Training: The SVM maps its sample set into a high-dimensional space using a non-linear function and then divides the data with a hyperplane (a plane in a higher-dimensional space). The signed distance from the hyperplane is the metric used to evaluate class membership (a hyperplane has a positive and a negative side).

Multiple classes are essentially handled by adding multiple hyperplanes.
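The signed-distance idea can be illustrated with a small sketch. It assumes the points are already in the mapped feature space (the non-linear mapping is omitted) and uses a one-vs-rest scheme for multiple classes; the slides do not say which multi-class scheme is used. The class names and hyperplane weights are hypothetical.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def classify_one_vs_rest(hyperplanes, x):
    """Pick the class whose hyperplane places x furthest on its positive side."""
    return max(hyperplanes,
               key=lambda label: signed_distance(*hyperplanes[label], x))


# Hypothetical, hand-picked hyperplanes: one (weights, bias) pair per class.
hyperplanes = {
    "buffer_overflow": ([1.0, 0.0], -0.5),
    "sql_injection": ([0.0, 1.0], -0.5),
}
result = classify_one_vs_rest(hyperplanes, [0.9, 0.1])
print(result)  # buffer_overflow
```

A positive distance means the point lies on the side of the hyperplane associated with the class, and a larger magnitude can be read as a stronger indication.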

RIPPER

RIPPER is a rule-based classifier. It begins with an empty rule set and adds rules until there is no error on the growing set.

Handles multiple classes by identifying the least common class and then the second-least common…

Has an optimization step to reduce rule set size
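The sketch below shows only how classes might be ordered from least to most common and how an ordered rule list is applied to an alert. It does not implement RIPPER's rule growing, pruning or optimization. The rule contents, class names and helper names are hypothetical.

```python
from collections import Counter


def class_order(labels):
    """Classes sorted from least to most common (ties broken by name)."""
    counts = Counter(labels)
    return sorted(counts, key=lambda c: (counts[c], c))


def classify(rules, default, features):
    """Apply an ordered rule list: the first rule whose conditions all hold wins."""
    for conditions, label in rules:
        if conditions <= features:  # every required n-gram is present
            return label
    return default


# Hypothetical training labels and hand-written rules, for illustration only.
labels = ["web"] * 5 + ["overflow"] * 2 + ["scan"] * 3
order = class_order(labels)
rules = [
    (frozenset({b"\x90\x90\x90"}), "overflow"),
    (frozenset({b"nma"}), "scan"),
]
verdict = classify(rules, default=order[-1], features={b"\x90\x90\x90", b"GET"})
print(order, verdict)
```

Rules for rarer classes come first, and the most common class serves as the default when no rule fires.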

Labeling Alerts (input to the Alert Information Extractor)

Three methods:

1. Automatic: use the input from an SBS

2. Semi-automatic: use the SBS input and add data from an ABS with manual labeling

3. Manual: all alerts are manually classified

Test 1: Automatic

DSa: Data Set a is 3200 automatically generated Snort alerts (SBS) triggered with vulnerability assessment tools in 14 classes. 4 classes were excluded because they had fewer than 10 samples.

n-gram length vs. accuracy

Selected Classes: Classifier vs. Sample Size

Test 2: Web attacks semi-automatic

DSb: as DSa but focused on web attacks alone, with the addition of some Milw0rm attacks. 1400 alerts, all manually classified.

Two most common

Justification given: Snort lacks fine-grained web attack classes

Live attacks

DSc: Manually classified alerts from a university server; no attacks were injected, and the alerts were raised by the ABSes Poseidon and Sphinx. 100 alerts over 2 weeks.

Panacea is trained on DSb and tested against the 100 ABS alerts.

Novelty: SVM vs. RIPPER

Extra Buffer Overflows were created by mutating known ones with the Sploit framework.

With Confidence Evaluation

Results

Bolzoni et al. present an attack payload classifier for anomaly-based intrusion detection systems.

The paper also provides a framework for adding other classifiers, and this framework can be extended to hybrid responses (SVM early, RIPPER if sample size over some amount, SVM for high-risk cases…)

References. Questions?

[1] J. Bluestein, A. El-Maazawi. “Bloom Filters- A Tutorial, Analysis, and Survey”, Technical Report CS-2002-10. Faculty of Computer Science, Dalhousie Univ., Canada.
