el 933 final project presentation combining filtering and statistical methods for anomaly detection...

22
EL 933 Final Project Presentation Combining Filtering and Statistical Methods for Anomaly Detection Augustin Soule Kav´e Salamatian Nina Taft

Post on 21-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

EL 933 Final Project Presentation

Combining Filtering and Statistical Methods for Anomaly Detection

Augustin Soule Kav´e Salamatian Nina Taft

Motivation

•Traffic anomalies are a fact of life in computer networks

–Outages, attacks, flash crowds, DOS, alpha events etc…

•Anomaly detection and identification in a timely fashion is challenging

–Operators typically monitor traffic / link by eye using SNMP or IP flows…. Is it effective ????–Characterization, building a model of what constitutes the normal behavior.–Efficient methods for anomaly detection…are they any ???

Overview

Introduction - AIM and general discussion of methodology Approach - What tools are used and why ? Sequence of different analysis for anomaly detection –

Modeling the network Obtaining the residual using filtering analysis of the residual using all the methods and comparison of

the methods in the due course Results (analysis)

Results Conclusion Future work – some ideas about what I like about the paper

Introduction

Aim – Develop an approach for anomaly detection for large scale networks using Traffic Matrix. (TM)

The main idea is to predict the normal behavior(TM approach) of the network, filter the “normal“ traffic the actual traffic matrix that is got using more recent measurement data than those used for prediction.

Examination of anomalies in the residual traffic using the 4 methods proposed in the paper, two of which are new and other two are the ones in use

Approach

What is Traffic Matrix ? How we obtain it here ? (SNMP data) Should Time interval to be considered ???

What kinds of anomalies are in focus and why only those ?

Few quick example of such anomalies..

What is a Kalman filter ? What is it used here for ??

What are the four different methods proposed ?

Sequencing the different analysis procedures

SNMP data -> TM -> It has OD flows -> filter normal to get residual -> Anomaly analysis

What are OD flows ? Its significance here. They are preferred over direct usage of the monitored data

captured at the target granularity level

What is included anomaly analysis ? Methods used for analysis and validation of these methods

Using actual data Using a synthetic anomaly generator.

Continued ..

Using network wide perspective (TM) for volume anomaly detection is justified but the data will be lots so have to be scaled.

By projecting onto a small number of principle components we can filter out normal traffic. The traffic projecting onto the remaining components is analyzed for anomalies.

Modeling the Network Obtain per-link statistics on byte counts (SNMP today) Infer TM that includes all the OD flows (hidden states!) Total traffic on a link = Sum of all OD flows traversing that link

which can be expressed as : Yt = At * Xt + Vt Where Yt = vector of link counts at time t, Xt = OD flows as vectors and Vt

corresponds to the measurement errors and At is the routing matrix To capture the dynamic model of OD flow we need a model that

specifies Xt+1 in terms of Xt. Where Ct is the state trasition matrix Wt is the noise that accounts for

randomness in the fluctuation of flow. For traffic estimation, get total byte count per link and then partition in

based on the number of OD flows traversing that link. Now when an anomaly occurs on the link, then it is possible that the anomaly maybe get distributed across all OD flows on that link and to avoid that we use Ct as a diagonal matrix.

Continued..

The assumptions are that Vt and Wt are uncorrelated. Now the task is that we need to Estimate the (t+1)st

instance of Xt+1 and that is done by using a Kalman filter. The Kalman filter is a robust filter that estimates the

system state process by using a 2 step approach that iterates at time t. using this we get ,

which is the estimate of Xt at time i, where t >= I and hence the estimated are obtained.

Continued ….

The methods used and their details: Method I - based on comparison of the residual traffic to the

threshold. ADV – it triggers an anomaly very fast as the test is verified as soon as the

Kalman filter processes the new observation. DISADV – the test is being performed independently on past results

This creates a High false positive rate i.e it will detect the anomaly based on one observation and which is not the right approach.

Method II - based on comparison of local and global variance on our filtered residual signal.

ADV - Uses Cumulative summation approach which solves the DISADV of the previous method. It is a very powerful method as it is proved that it’s the best estimator when the variance and level change are unknown.

DISADV - It adds some delay for the detection as it takes some observations after the anomaly to estimate the deviation level.

Continued..

Method III - Based on multi-scale and variance ADV - The rational behind multi-scale is that anomalies should appear at

different time scales and hence by monitoring these multiple scales the false positive rate should be reduced that is because the change on one time scale will not trigger an alarm (anomaly detected)

DISADV – Detection lag involved as wavelet analysis is involved

Method IV - Based on multi-scale variance shift In this methods an alarm is triggered if the ratio between the local and global

variance exceeds a threshold This analysis is based on two scales one is the scale at which the global

variance is calculated and other is the scale at which the local variance is calculated.

Again detection lag seen as wavelet analysis of the signal is performed.

Validation of the methods Validation of the methods is done using ROC ( receiver operating

characteristics) curves. These curves are a grahical representation of the tradeoffs between the false

positives and the false negatives of a decision threshold.

These are used since the method comparison can be don’t throughout the decision threshold.

Also this ROC analysis gives the performance of the methods in terms of detection time.

An algorithm is considered to be good is the ROC curve climbs rapidly towards the upper left corner (A very high fraction of the true anomalies with only a few false positives)

Quantification of how quickly the ROC climbs is based on the area, larger the area, better is the algorithm. Each algorithm has a ROC curve and then comparison is done.

Continued.. Validation of these methods also includes testing these methods and

getting the range of operation of each of them and this can be done in two ways.

Method I : Collect live data in the form of packet or flow level trace and then to have this trace “labeled”

Labeling is the procedure of identification of anomalous event with the start and finish time.

ADV of labeling - real world anomalies can be tested with. DISADV of labelling – parameters of the anomaly cannot be changed to know

the threshold of the trace. Method II : Synthetically generate attacks and test the methods.

ADV - Parameters of the attacks can be changed and hence the limits of the method can be identified

DISADV – these attacks are not real world and hence the attacks might not be of much concern.

Continued..

Both these approached will be used in our paper so that we can analyze the methods better.

Lets consider an Abilene network for testing. Has 11 POP’s Data is collected from every POP. This data results in a TM of 121 OD flows.

Synthetic anomaly generator Select either one or a set of OD flows to add anomaly and then add

anomalies on top of the baseline traffic level for those OD flows.

Anomaly is characterized by 4 factors : Volume, duration, number of OD flows, shape function. (TABLE 1)

Continued..

Continued..

Results

Continued..

This was the results with the actual data from the network.

The Abilene network had 27 anomalies and the above graph shows that the basic method performed the best(7% FPR & 100% TPR) that is misses no anomalies.

The second best method cached about 85% of the anomalies with the same FPR.

Wavelet method dint reach a FNR=0 even with a huge threshold.

Continued… Now the results with the Synthetic anomaly generator is as follows:

500 anomalies were generated.

Duration varied between 5mins to 2 hours.

OD flows in consideration were 1-7 per anomaly (randomly chosen)

Basic and the Second method performed the best equivalently.

Second method doesn’t perform very well in actual data whereas in the synthetic generator it performs well.

Reason maybe lie in the properties of the anomalies itself.

Detection Time Comparison

Onset of attacks is rapid on the internet so the methods should be fast and have very less detection time.

Continued..

Second method detected 90% anomalies with no lag and Basic method detected 95% of anomalies with no lag

Wavelet method takes about half hour for detection of anomalies and hence its too slow and also it dint perform well.

It is interesting that the vshift method performs well in the synthetic than the abilene network.

Conclusion and what I feel

Interesting granularity level for anomaly detection namely that of TM. Estimation and prediction was interesting as we used wavelets. The idea of filtering normal traffic to analyze residual traffic for

anomalies was nice. Detection schemes were tested well as it involves use of both actual

data and the synthetic anomaly generator. The anomaly generator got close to depict anomalies like DOS, flash

crowds etc. which helped in the validation and evaluation of the four methods well, but then it was not very effective in depicting the attacks like worms.

The wavelet based method dint do very well because of the detechtion time involved.