PANACEA by Damiano Bolzoni, Sandro Etalle, Pieter H. Hartel --James O’Reilly presenting

Post on 19-Dec-2015

Page 1:

PANACEA

by Damiano Bolzoni, Sandro Etalle, Pieter H. Hartel

--James O’Reilly presenting

Page 2:

Intrusion Detection Systems (IDS)

Signature-based IDS (SBS)

Matches activity/payload to known attacks.

The attacks are classified simultaneously with detection.

Not able to recognize non-trivial variations of known attacks

Useless against new, zero-day attacks

With classification, security personnel can automate response and prioritize alerts

Page 3:

Intrusion Detection Systems (IDS)

Anomaly-based IDS (ABS)

Recognizes suspicious activity, even novel attacks, but cannot classify the activity.

High false positive rate, which is a burden to support staff

Without classification, prioritization and alert response automation are impossible

Page 4:

Problem Statement:

Anomaly-based IDSes appear to have more potential than signature-based IDSes because they can handle novel cases. They are handicapped, however, because they cannot tell security personnel anything about, e.g., a packet other than that it is “odd.”

Page 5:

Panacea’s Goal

To accurately classify the anomalies reported by anomaly-based IDSes into different attack classes, based on payload information.

The ability of an ABS to identify attacks will finally be paired with a system that can efficiently classify attacks as they happen, making ABSes far less costly in man-hours to use.

Page 6:

Training and Classification

Page 7:

Alert Information Extractor

“Boils down” an alert into a Bloom Filter representation that the classification engine can analyze. Goes through two stages, plus a third step during training:

1. Building the n-gram bitmap

2. Computing the Bloom Filter

3. (Only during training) Passing along classification information (labels).

The more samples the Alert Classification Engine has, the more accurately it can classify alerts. As we will see, storing all the raw alert payloads is impractical.

Page 8:

Representing the data features

There are 256^length possible messages that are length bytes long.

The problem for classification is which features to pick: how to represent the payload without losing “meaning”

Each feature has a space and time cost, especially during training.

Too few features means a lack of resolution, and the classifier’s task is likely hampered or impossible

Page 9:

N-grams

The information in the payload is represented using binary n-gram analysis. n, the n-gram order, represents the number of adjacent symbols that are analyzed.

The feature is the presence or absence of an n-gram in the payload and is stored in a bitmap

n-gram bitmap size is on the order of 256^n
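
As an illustration of the binary n-gram bitmap described above, here is a minimal Python sketch. The function names `ngram_bitmap` and `has_ngram` and the sample payload are assumptions made for this example; they are not taken from the paper. The dense layout (one bit per possible n-gram) is only practical for small n.

```python
def ngram_bitmap(payload: bytes, n: int) -> bytearray:
    """Build a dense bitmap with one bit per possible n-gram (256**n bits)."""
    bitmap = bytearray(256 ** n // 8)
    # Slide a window of n bytes over the payload and mark each n-gram seen.
    for i in range(len(payload) - n + 1):
        index = int.from_bytes(payload[i:i + n], "big")
        bitmap[index // 8] |= 1 << (index % 8)
    return bitmap


def has_ngram(bitmap: bytearray, gram: bytes) -> bool:
    """Report whether the bit for this n-gram is set in the bitmap."""
    index = int.from_bytes(gram, "big")
    return bool(bitmap[index // 8] & (1 << (index % 8)))


# Hypothetical payload, used only to exercise the sketch.
bitmap = ngram_bitmap(b"GET /index.html", 2)
print(len(bitmap))               # size in bytes of a 2-gram bitmap
print(has_ngram(bitmap, b"GE"))  # a 2-gram that occurs in the payload
print(has_ngram(bitmap, b"zz"))  # a 2-gram that does not occur
```

Only presence or absence is recorded, so a repeated n-gram sets its bit once and its count is lost.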

Page 10:

Bloom Filter

A 3-gram bitmap needs 256^3 = 2^24 bits, which is about 2MB. A 5-gram bitmap needs 256^5 = 2^40 bits, which is about 128GB.

A Bloom Filter offers an aggressive compression of the n-gram features at the risk of false positives when reading the data.

The authors state that a 10KB space would be acceptable in the 5-gram case.
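
The bitmap sizes quoted on this slide can be checked with a few lines of Python. This is a sketch that assumes one bit per possible n-gram of bytes; the helper name `bitmap_bytes` is hypothetical.

```python
def bitmap_bytes(n: int) -> int:
    """Bytes needed for a bitmap holding one bit per possible n-gram."""
    return 256 ** n // 8


MIB = 1024 ** 2
GIB = 1024 ** 3

print(bitmap_bytes(3) / MIB)  # 2.0   (3-gram bitmap, in MiB)
print(bitmap_bytes(5) / GIB)  # 128.0 (5-gram bitmap, in GiB)
```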

Page 11:

Bloom Filter

The binary Bloom Filter data structure is basically a bit vector of some fixed length and is used for determining set membership.

Insertion: hash the value with several different hash functions (each with a range equal to the Bloom Filter vector length) and mark the positions that are hashed to.

Membership: hash the value with the same hash functions and look up the positions in the vector. If all positions are marked, the value is reported as present (this can be a false positive); otherwise it is absent.
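
The insertion and membership steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter`, the use of salted SHA-256 to stand in for several independent hash functions, and the choice of three hash functions are all assumptions.

```python
import hashlib


class BloomFilter:
    """A bit vector of size_bits positions with num_hashes hash functions."""

    def __init__(self, size_bits: int, num_hashes: int) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive each hash function by prefixing a different seed (assumption).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(seed.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        """Insertion: mark every position the item hashes to."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        """Membership: present only if all hashed positions are marked."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# A 10KB filter holding the 5-grams of a hypothetical payload.
payload = b"GET /index.html"
bloom = BloomFilter(size_bits=10 * 1024 * 8, num_hashes=3)
for i in range(len(payload) - 5 + 1):
    bloom.add(payload[i:i + 5])
print(b"GET /" in bloom)  # inserted values are always reported present
```

An inserted value is never reported absent. A value that was never inserted can be reported present if other insertions happen to mark all of its positions.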

Page 12:

Inserting into a Bloom Filter

“The error rate can be decreased by increasing the number of hash transforms and the space allocated to store the table.[1]”

Page 13:

Collisions in the Map

[Collision probability formula not preserved in this transcript.]

A collision will occur with the above probability, where l is the size of the space, k is the number of hash functions, and n is the number of insertions.
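
For reference, the standard Bloom filter analysis can be written with the same symbols (l for the size of the space, k for the number of hash functions, n for the number of insertions). Whether the slide showed exactly this expression is an assumption. The derivation also assumes that the hash functions are independent and uniform over the l positions.

```latex
% Probability that one given bit is still unmarked after n insertions
\Pr[\text{bit unmarked}] = \left(1 - \frac{1}{l}\right)^{kn}

% Probability that all k positions of a value never inserted are marked
\Pr[\text{false positive}] = \left(1 - \left(1 - \frac{1}{l}\right)^{kn}\right)^{k}
  \approx \left(1 - e^{-kn/l}\right)^{k}
```

For a fixed number of hash functions k, this probability falls as l grows and rises as n grows.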

Page 14:

Alert Classification Engine

This engine has essentially two modes: training and classification.

Training is when the classifier (SVM or RIPPER) learns how to classify the attacks. It does this with labeled Bloom Filter data (supervised learning).

Once trained, the classifier can be given unlabeled Bloom Filter data and classify it.

Page 15:

Training and Classification

Page 16:

Accuracy and Training Set

The accuracy of the classifier depends on the training set's size and, of course, its quality. Quality is affected by the way the training data is labeled (more on this shortly). The training set needs to be fairly large: the larger the better, but with diminishing returns.

Bolzoni et al. chose SVM and RIPPER for their accuracy, but both are non-incremental learners: to update with new samples they must essentially add the samples to the original data and completely retrain. Therefore, the data must be as compact as possible without destroying the distinguishing features of the payloads.

Page 17:

Classifier

A classifier takes input and classifies it as a member of a class

A binary classifier takes input and decides essentially whether it’s a member of a class or not.

Training a supervised-learning classifier involves taking labeled data and minimizing the error on that training data, using whatever learning algorithm the classifier implements.
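
To make the idea of supervised training of a binary classifier concrete, here is a minimal sketch using a perceptron. The perceptron is a stand-in chosen for brevity; it is neither SVM nor RIPPER, and the tiny bit-vector data set is invented for the example.

```python
def train_perceptron(samples, labels, epochs=20):
    """Learn weights and a bias from labeled samples (labels are +1 or -1)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: move the boundary toward x
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def classify(weights, bias, x):
    """Return +1 if x is judged a member of the class, otherwise -1."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Hypothetical labeled bit vectors: +1 = member of the class, -1 = not.
samples = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
labels = [1, 1, -1, -1]
weights, bias = train_perceptron(samples, labels)
print([classify(weights, bias, x) for x in samples])
```

Training stops changing the weights once every training sample is classified correctly, which is the "minimize the error on the training data" step in its simplest form.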

Page 18:

SVM

Training: It takes its sample set and maps it into a high-dimensional space using a non-linear function, then divides the data with a hyperplane (a plane in a higher-dimensional space). The signed distance from the hyperplane is the metric used to evaluate class membership (a plane has a positive and a negative side).

Multiple classes are handled essentially by adding more hyperplanes.
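
The signed-distance test can be sketched in a few lines of Python. This shows only the decision rule for a hyperplane that is already known; the non-linear mapping and the training that finds the hyperplane are omitted, and the weights, bias, and points are invented for the example.

```python
import math


def signed_distance(weights, bias, point):
    """Signed distance from point to the hyperplane weights . x + bias = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * p for w, p in zip(weights, point)) + bias) / norm


def side(weights, bias, point):
    """+1 for the positive side of the hyperplane, -1 for the negative side."""
    return 1 if signed_distance(weights, bias, point) >= 0 else -1


# Hypothetical hyperplane 3x + 4y - 5 = 0 in two dimensions.
weights, bias = [3.0, 4.0], -5.0
print(signed_distance(weights, bias, [3.0, 4.0]))  # 4.0  (positive side)
print(signed_distance(weights, bias, [0.0, 0.0]))  # -1.0 (negative side)
```

The sign gives the class and the magnitude indicates how far the sample lies from the boundary.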

Page 19:

RIPPER

RIPPER is a rule-based classifier. It begins with an empty rule set and adds rules until there is no error on the growing set.

Handles multiple classes by first separating out the least common class, then the second-least common…

Has an optimization step to reduce rule set size
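
The multi-class ordering mentioned above (least common class first) can be sketched as follows. This covers only the ordering step, not rule growing, pruning, or optimization. The function name `class_order` and the class names are hypothetical.

```python
from collections import Counter


def class_order(labels):
    """Return class names sorted from least common to most common."""
    counts = Counter(labels)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(counts, key=lambda name: (counts[name], name))


# Hypothetical alert labels, invented for this example.
labels = ["sql_injection"] * 5 + ["xss"] * 2 + ["buffer_overflow"] * 9
print(class_order(labels))  # ['xss', 'sql_injection', 'buffer_overflow']
```

A learner following this order would build rules for `xss` first and leave the most common class for last.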

Page 20:

Labeling Alerts (input to the Alert Information Extractor)

Three methods:

1. Automatic: use the input from an SBS

2. Semi-automatic: use the SBS input and add data from an ABS with manual labeling

3. Manual: all alerts are manually classified

Page 21:

Test 1: Automatic

DSa (Data Set a): 3200 Snort (SBS) alerts in 14 classes, generated automatically by triggering Snort with vulnerability assessment tools. 4 classes were excluded because they had fewer than 10 samples.

Page 22:

n-gram length vs. accuracy

Page 23:

Selected Classes: Classifier vs. Sample Size

Page 24:

Test 2: Web attacks semi-automatic

DSb: as DSa, but focused on web attacks alone, with the addition of some Milw0rm attacks. 1400 alerts, all manually classified. The justification given is that Snort lacks fine-grained web attack classes.

Page 25:

Live attacks

DSc: manually classified alerts from a university server. No attacks were injected; the alerts were raised by the ABSes Poseidon and Sphinx. 100 alerts over 2 weeks.

Panacea is trained on DSb and tested against the 100 ABS alerts.

Page 26:

Novelty: SVM vs. RIPPER

Extra buffer overflows were created by mutating known ones with the Sploit framework.

Page 27:

With Confidence Evaluation

Page 28:

Results

Bolzoni et al. present an attack payload classifier for anomaly-based intrusion detection systems.

The paper also provides a framework for adding other classifiers, and this framework can be extended to hybrid responses (SVM early, RIPPER once the sample size exceeds some amount, SVM for high-risk cases…)

Page 29:

References. Questions?

[1] J. Blustein, A. El-Maazawi. “Bloom Filters - A Tutorial, Analysis, and Survey”, Technical Report CS-2002-10. Faculty of Computer Science, Dalhousie Univ., Canada.