Presented by: Anupam Das
CS 568MCC Spring 2013
Network Security
Department of Computer Science, UIUC
April 20, 2023

Anomaly Detection

Some of the contents were taken from the authors of "Anomaly Detection: A Survey", ACM Computing Surveys, Vol. 41(3), Article 15, July 2009.

http://www-users.cs.umn.edu/~kumar/papers/anomaly-survey.php


Introduction

We are drowning in the deluge of data that are being collected worldwide, while starving for knowledge at the same time

Anomalous events occur relatively infrequently

However, when they do occur, their consequences can be quite dramatic and quite often in a negative sense

“Mining needle in a haystack. So much hay and so little time”


What are Anomalies?

• An anomaly is a pattern in the data that does not conform to the expected behavior

• Also referred to as outliers, exceptions, peculiarities, surprises, etc.

• Anomalies translate to significant (often critical) real-life entities


Real World Anomalies

• Credit Card Fraud
• Cyber Intrusions
• Healthcare Informatics / Medical Diagnostics
• Industrial Damage Detection
• Image Processing / Video Surveillance
• Novel Topic Detection in Text Mining


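Simple Example

• N1 and N2 are regions of normal behavior

• Points o1 and o2 are anomalies

• Points in region O3 are anomalies

[Figure: scatter plot in the X-Y plane showing the normal regions N1 and N2, the isolated points o1 and o2, and the small region O3]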


Key Challenges

• Defining a representative normal region is challenging
• The boundary between normal and outlying behavior is often not precise
• The exact notion of an outlier is different for different application domains
• Availability of labeled data for training/validation
• Malicious adversaries
• Data might contain noise
• Normal behavior keeps evolving


Aspects of Anomaly Detection Problem

• Nature of input data
• Availability of supervision
• Type of anomaly: point, contextual, structural
• Output of anomaly detection
• Evaluation of anomaly detection techniques


Input Data

• Input data could be
– Univariate: single variable
– Multivariate: multiple variables

• Nature of attributes
– Binary
– Categorical
– Continuous
– Hybrid

Tid  SrcIP          Duration    Dest IP         Number of bytes  Internal
     categorical    continuous  categorical     continuous       binary
1    206.163.37.81  0.10        160.94.179.208  150              No
2    206.163.37.99  0.27        160.94.179.235  208              No
3    160.94.123.45  1.23        160.94.179.221  195              Yes
4    206.163.37.37  112.03      160.94.179.253  199              No
5    206.163.37.41  0.32        160.94.179.244  181              No


Input Data – Complex Data Types

• Relationship among data instances
– Sequential
  • Temporal
– Spatial
– Spatio-temporal
– Graph

GGTTCCGCCTTCAGCCCCGCGCCCGCAGGGCCCGCCCCGCGCCGTCGAGAAGGGCCCGCCTGGCGGGCGGGGGGAGGCGGGGCCGCCCGAGCCCAACCGAGTCCGACCAGGTGCCCCCTCTGCTCGGCCTAGACCTGAGCTCATTAGGCGGCAGCGGACAGGCCAAGTAGAACACGCGAAGCGCTGGGCTGCCTGCTGCGACCAGGG


Data Labels

• Supervised Anomaly Detection
– Labels available for both normal data and anomalies
– Similar to rare class mining

• Semi-supervised Anomaly Detection
– Labels available only for normal data

• Unsupervised Anomaly Detection
– No labels assumed
– Based on the assumption that anomalies are very rare compared to normal data


Type of Anomaly

• Point Anomalies

• Contextual Anomalies

• Collective Anomalies


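Point Anomalies

• An individual data instance is anomalous if it deviates significantly from the rest of the data set.

[Figure: the X-Y scatter plot from the simple example, with normal regions N1 and N2, points o1 and o2, and region O3; one point is marked "Anomaly"]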


Contextual Anomalies

• An individual data instance is anomalous within a context
• Requires a notion of context
• Also referred to as conditional anomalies*

[Figure: one instance marked "Normal" and another marked "Anomaly"]


Collective Anomalies

• A collection of related data instances is anomalous
• Requires a relationship among data instances
– Sequential Data
– Spatial Data
– Graph Data

• The individual instances within a collective anomaly are not anomalous by themselves

[Figure: a sequence with a highlighted span labeled "Anomalous Subsequence"]


Output of Anomaly Detection

• Label
– Each test instance is given a normal or anomaly label
– This is especially true of classification-based approaches

• Score
– Each test instance is assigned an anomaly score
  • Allows the output to be ranked
  • Requires an additional threshold parameter
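The two output styles can be related with a small sketch: scores can be ranked directly, and a threshold turns them into labels. The score values and the threshold below are hypothetical, and the sketch assumes that a higher score means "more anomalous".

```python
def label_by_threshold(scores, threshold):
    """Turn anomaly scores into labels; a higher score is assumed to be more anomalous."""
    return ["anomaly" if s >= threshold else "normal" for s in scores]


def rank_by_score(scores):
    """Return instance indices ordered from most to least anomalous."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


scores = [0.10, 0.95, 0.30, 0.72]  # hypothetical anomaly scores for four test instances
threshold = 0.70                   # hypothetical cut-off chosen by the analyst

labels = label_by_threshold(scores, threshold)
ranking = rank_by_score(scores)
print(labels)
print(ranking)
```

Moving the threshold trades detections against false alarms, which is why score-based output needs that extra parameter.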


Evaluation of Anomaly Detection

• Accuracy is not a sufficient metric for evaluation
– Example: a network traffic data set with 99.9% normal data and 0.1% intrusions
– A trivial classifier that labels everything as the normal class achieves 99.9% accuracy

Confusion matrix (anomaly class: C, normal class: NC)

             Predicted NC   Predicted C
Actual NC    TN             FP
Actual C     FN             TP

• Focus on both recall and precision
– Recall / detection rate (R) = TP / (TP + FN)
– Precision (P) = TP / (TP + FP)
– False alarm rate (F) = FP / (TN + FP)

[Figure: ROC curves for different outlier detection techniques. x-axis: false alarm rate (0 to 1); y-axis: detection rate (0 to 1); the area under the curve (AUC) is marked]
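As a worked check of the accuracy argument, the sketch below computes the metrics from confusion-matrix counts. The data set size of 100,000 records is an assumption chosen so that 0.1% intrusions is exactly 100 records; the function name is illustrative.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, precision and false alarm rate from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "false_alarm_rate": fp / (tn + fp) if (tn + fp) else 0.0,
    }


# Assumed data set: 100,000 records, of which 100 (0.1%) are intrusions.
# The trivial classifier predicts "normal" for everything, so TP = FP = 0.
trivial = detection_metrics(tp=0, fp=0, tn=99_900, fn=100)
print(trivial)
```

The trivial classifier reaches 99.9% accuracy while its recall is zero, which is the reason to report recall, precision and false alarm rate rather than accuracy alone.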


Taxonomy*

Anomaly Detection
• Point Anomaly Detection
– Classification Based
  • Rule Based
  • Neural Networks Based
  • SVM Based
– Nearest Neighbor Based
  • Density Based
  • Distance Based
– Statistical
  • Parametric
  • Non-parametric
– Clustering Based
– Others
  • Information Theory Based
  • Spectral Decomposition Based
  • Visualization Based
• Contextual Anomaly Detection
• Collective Anomaly Detection
• Online Anomaly Detection
• Distributed Anomaly Detection

* Outlier Detection – A Survey, Varun Chandola, Arindam Banerjee, and Vipin Kumar, Technical Report TR07-17, University of Minnesota (Under Review)


Classification Based Techniques

• Main idea: build a classification model for normal (and anomalous) events based on labelled training data, and use it to classify each new unseen event

• Classification models must be able to handle skewed (imbalanced) class distributions

• Categories:
– Supervised classification techniques
  • Require knowledge of both normal and anomaly class
  • Build classifier to distinguish between normal and known anomalies
– Semi-supervised classification techniques
  • Require knowledge of normal class only!
  • Use modified classification model to learn the normal behavior and then detect any deviations from normal behavior as anomalous

Pros and Cons


Classification Based Techniques

• Some techniques
– Neural network based approaches
– Support Vector Machine (SVM) based approaches
– Bayesian network based approaches
– Rule based techniques
– Fuzzy Logic
– Genetic Algorithms
– Principal Component Analysis


Using Neural Networks

• Multi-layer NNs
• Create hyperplanes that separate the various classes
• Good: cope with huge data sets and handle noisy data well
• Bad: learning takes a long time


Neural Networks

• INPUT LAYER: X = {x1, x2, …, xn}, where n is the number of attributes. There are as many nodes as inputs.
• HIDDEN LAYER: the number of nodes in the hidden layer and the number of hidden layers depend on the implementation.
• OUTPUT LAYER: corresponds to the class attribute. There are as many nodes as classes.
• Back propagation learns by iteratively processing a set of training data (samples).
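The layer structure can be illustrated with a forward pass through a tiny network. This is a minimal sketch, not a trained model: the layer sizes, the random weights, the sigmoid and softmax activations, and the input vector are all assumptions made for illustration, and back propagation is omitted.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the incoming weights of node j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-z)) for z in values]


def softmax(values):
    """Normalize output-node activations into class probabilities."""
    peak = max(values)
    exps = [math.exp(z - peak) for z in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    """Untrained layer with random weights (training would adjust these)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
n_attributes, n_hidden, n_classes = 4, 3, 2  # hypothetical sizes: 2 classes = normal, anomaly

hidden_w, hidden_b = random_layer(n_attributes, n_hidden, rng)
output_w, output_b = random_layer(n_hidden, n_classes, rng)

x = [0.1, 0.5, 0.2, 0.0]                               # one input node per attribute
hidden = sigmoid(dense(x, hidden_w, hidden_b))          # hidden layer activations
class_probs = softmax(dense(hidden, output_w, output_b))  # one output node per class
print(class_probs)
```

Back propagation would repeatedly compare `class_probs` with the known label of each training sample and adjust the weights to reduce the error.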


SVM

• How would you classify these points using a linear discriminant function in order to minimize the error rate?
• Infinite number of answers!
• Which one is the best?

[Figure: points in the (x1, x2) plane, labeled +1 and -1]

A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification.

http://en.wikipedia.org/wiki/Hyperplane

Linear SVM

• The linear discriminant function (classifier) with the maximum margin is the best
• Why is it the best? It is robust to outliers and thus has strong generalization ability
• Support vectors are those data points that the margin pushes up against

[Figure: points labeled +1 and -1 with the maximum-margin separator]


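Linear SVM

[Figure: points in the (x1, x2) plane labeled +1 and -1, with the separating hyperplane $w^T x + b = 0$, the margin boundaries $w^T x + b = 1$ and $w^T x + b = -1$, the support vectors $x^+$ and $x^-$, the unit normal $n$, and the margin]

• For the support vectors we know that
$w^T x^+ + b = 1$
$w^T x^- + b = -1$

• The margin width is:
$M = (x^+ - x^-) \cdot n = (x^+ - x^-) \cdot \frac{w}{\|w\|} = \frac{2}{\|w\|}$

• Formulation:
maximize $\frac{2}{\|w\|}$ such that $y_i (w^T x_i + b) \ge 1$

• Goal: quadratic programming with linear constraints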
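The margin formula and the constraints can be checked numerically. The sketch below does not solve the quadratic program; it only evaluates two hand-picked candidate hyperplanes on a hypothetical four-point data set to show that both satisfy the constraints while the one with the smaller ||w|| has the wider margin.

```python
import math


def decision_value(w, b, x):
    """Compute w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Margin M = 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def satisfies_constraints(w, b, points, labels):
    """Check y_i (w^T x_i + b) >= 1 for every training point."""
    return all(y * decision_value(w, b, x) >= 1.0 - 1e-12
               for x, y in zip(points, labels))


# Hypothetical toy data: two points per class.
points = [(2.0, 2.0), (3.0, 3.0), (1.0, 1.0), (0.0, 0.0)]
labels = [1, 1, -1, -1]

candidate_a = ((1.0, 1.0), -3.0)  # hand-picked (w, b)
candidate_b = ((2.0, 2.0), -6.0)  # same direction, larger ||w||

feasible_a = satisfies_constraints(*candidate_a, points, labels)
feasible_b = satisfies_constraints(*candidate_b, points, labels)
margin_a = margin_width(candidate_a[0])
margin_b = margin_width(candidate_b[0])
print(feasible_a, margin_a)
print(feasible_b, margin_b)
```

Maximizing 2/||w|| under these constraints is equivalent to minimizing ||w||, which is why the problem is a quadratic program with linear constraints.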




Nearest Neighbor Based Techniques

• Key assumption: normal points have close neighbors while anomalies are located far from other points

• General two-step approach
1. Compute the neighborhood for each data record
2. Analyze the neighborhood to determine whether the data record is an anomaly or not

• Categories:
– Distance based methods
  • Anomalies are data points most distant from other points
– Density based methods
  • Anomalies are data points in low density regions


Distance Based Anomaly Detection

• For each object o, examine the number of other objects in the r-neighborhood of o, where r is a user-specified distance threshold
• An object o is an outlier if most (taking π as a fraction threshold) of the objects in D are far away from o, i.e., not in the r-neighborhood of o
• An object o is a DB(r, π) outlier if
$\frac{\|\{o' \mid dist(o, o') \le r\}\|}{\|D\|} \le \pi$
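A direct, quadratic-time reading of the DB(r, π) definition can be sketched as follows. The data points, r and π are hypothetical, Euclidean distance is assumed, and the object itself is not counted among its neighbors.

```python
import math


def db_outliers(data, r, pi):
    """Return objects whose r-neighborhood holds at most a fraction pi of the data set."""
    n = len(data)
    outliers = []
    for i, o in enumerate(data):
        # Count the other objects within distance r of o.
        in_neighborhood = sum(
            1 for j, other in enumerate(data)
            if j != i and math.dist(o, other) <= r
        )
        if in_neighborhood / n <= pi:
            outliers.append(o)
    return outliers


# Hypothetical data: a tight group of four points and one distant point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
found = db_outliers(data, r=2.0, pi=0.2)
print(found)
```

The nested loop compares every pair of objects, so this form is only practical for small data sets; index structures or pruning are the usual remedies.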


Density Based Anomaly Detection

• Compute local densities of particular regions and declare instances in low-density regions as potential anomalies
• Approach: Local Outlier Factor (LOF)
• In the NN approach, p2 is not considered an outlier, while the LOF approach finds both p1 and p2 as outliers
• The NN approach may consider p3 an outlier, but the LOF approach does not

[Figure: points p1, p2 and p3, with the distance from p2 to its nearest neighbor and the distance from p3 to its nearest neighbor marked]


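Local Outlier Factor (LOF)

• For each data point o, compute the k-distance (the distance to its k-th nearest neighbor) and the set $N_k(o)$ of points within that distance
• Compute the reachability distance (reachdist) for each data example o with respect to data example o' as:
$reachdist_k(o, o') = \max\{k\text{-}distance(o'),\ dist(o, o')\}$
• Compute the local reachability density (lrd):
$lrd_k(o) = \frac{|N_k(o)|}{\sum_{o' \in N_k(o)} reachdist_k(o, o')}$
• LOF is the average of the ratio of the local reachability density of o's k-nearest neighbors to the local reachability density of the data record o:
$LOF_k(o) = \frac{1}{|N_k(o)|} \sum_{o' \in N_k(o)} \frac{lrd_k(o')}{lrd_k(o)}$
• The higher the LOF, the more likely the point is an outlier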
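The LOF steps can be sketched in a few lines of standard-library Python. This follows the usual LOF definition (k-distance, reachability distance, local reachability density, then the averaged density ratio); the data points and k = 2 are hypothetical, Euclidean distance is assumed, and duplicate points (which would give a zero reachability sum) are not handled.

```python
import math


def k_neighborhood(points, i, k):
    """Return (k-distance, neighbor indices) for point i; ties at the k-distance are kept."""
    dists = sorted((math.dist(points[i], points[j]), j)
                   for j in range(len(points)) if j != i)
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, k):
    info = [k_neighborhood(points, i, k) for i in range(len(points))]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(info[j][0], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i, (_, neighbors) in enumerate(info):
        lrd.append(len(neighbors) / sum(reach_dist(i, j) for j in neighbors))

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for j in neighbors) / (len(neighbors) * lrd[i])
            for i, (_, neighbors) in enumerate(info)]


# Hypothetical data: four points on a unit square and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(points, k=2)
print(scores)
```

Points inside the square get a LOF close to 1 because their density matches that of their neighbors, while the distant point gets a much larger value.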




Clustering Based Techniques

• Key assumption: normal data records belong to large and dense clusters, while anomalies do not belong to any of the clusters or form very small clusters

• Categorization according to labels
– Semi-supervised: cluster normal data to create modes of normal behavior. If a new instance does not belong to any of the clusters, or is not close to any cluster, it is an anomaly
– Unsupervised: post-processing is needed after the clustering step to determine the size of the clusters; the distance from the clusters is then required to decide whether a point is an anomaly

• Anomalies detected using clustering based methods can be points that:
– Do not belong to any cluster
– Lie at a large distance from their closest cluster
– Belong to a small or sparse cluster


Cluster Based Local Outlier Factor

• FindCBLOF: detect outliers in small clusters
– Find clusters, and sort them in decreasing size
– To each data point, assign a cluster-based local outlier factor (CBLOF):
– If object p belongs to a large cluster, CBLOF = cluster size × similarity between p and its cluster
– If p belongs to a small cluster, CBLOF = cluster size × similarity between p and the closest large cluster

Ex. In the figure, o is an outlier since its closest large cluster is C1, but the similarity between o and C1 is small. For any point in C3, its closest large cluster is C2 but its similarity to C2 is low, and |C3| = 3 is small.

[Figure: clusters C1, C2 and C3 and the point o]
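The CBLOF rule can be sketched once clusters are already available. Several choices below are assumptions rather than part of the slide: similarity is taken as 1 / (1 + distance to the cluster centroid), a cluster counts as "large" when it has at least `min_large_size` members, and the cluster assignments are given by hand instead of coming from a clustering algorithm.

```python
import math


def centroid(pts):
    return tuple(sum(coords) / len(pts) for coords in zip(*pts))


def similarity(p, center):
    """Assumed similarity measure: decreases with distance to the centroid."""
    return 1.0 / (1.0 + math.dist(p, center))


def cblof(clusters, min_large_size):
    """Assign each point cluster_size x similarity to its own or the closest large cluster."""
    centers = {name: centroid(pts) for name, pts in clusters.items()}
    large = [name for name, pts in clusters.items() if len(pts) >= min_large_size]
    scores = {}
    for name, pts in clusters.items():
        for p in pts:
            if name in large:
                sim = similarity(p, centers[name])
            else:
                # Small cluster: use the most similar (closest) large cluster.
                sim = max(similarity(p, centers[big]) for big in large)
            scores[p] = len(pts) * sim
    return scores


# Hypothetical clustering result: two large clusters and one singleton.
clusters = {
    "C1": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    "C2": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)],
    "C3": [(6.0, 6.0)],
}
scores = cblof(clusters, min_large_size=3)
most_suspicious = min(scores, key=scores.get)
print(most_suspicious, scores[most_suspicious])
```

With this definition a low CBLOF indicates an outlier: the point is either in a small cluster, far from a large one, or both.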




Statistics Based Techniques

• Statistical approaches assume that the objects in a data set are generated by a stochastic process (a generative model)
• Idea: learn a generative model fitting the given data set, and then identify the objects in low probability regions of the model as outliers

• Advantage
– Utilize existing statistical modelling techniques to model various types of distributions

• Challenges
– With high dimensions, it is difficult to estimate distributions
– Parametric assumptions often do not hold for real data sets


Types of Statistical Techniques

• Parametric Techniques
– Assume that the normal data is generated from an underlying parametric distribution
– Learn the parameters from the normal sample
– Determine the likelihood of a test instance being generated from this distribution to detect anomalies

• Non-parametric Techniques
– Do not assume any knowledge of parameters
– Not completely parameter free, but the number and nature of the parameters are flexible and not fixed in advance
– Examples: histogram and kernel density estimation


Parametric Techniques

• Univariate data: a data set involving only one attribute or variable

• Often assume that data are generated from a normal distribution, learn the parameters from the input data, and identify the points with low probability as outliers

• Ex: Avg. temp.: {24.0, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4}

– Compute μ and σ from the samples
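The temperature example can be worked through as follows. The sketch uses the maximum-likelihood estimates (σ divides by N) and a z-score cut-off of 2.5, which is an assumption made for illustration rather than a value from the slide.

```python
import statistics

temps = [24.0, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]

mu = statistics.fmean(temps)       # sample mean
sigma = statistics.pstdev(temps)   # maximum-likelihood estimate (divides by N)

z_cutoff = 2.5  # hypothetical threshold on |z|
z_scores = [(x - mu) / sigma for x in temps]
outliers = [x for x, z in zip(temps, z_scores) if abs(z) > z_cutoff]

print(mu, sigma)
print(outliers)
```

For these nine values μ is about 28.58 and σ about 1.62, so 24.0 lies roughly 2.8 standard deviations below the mean. With only nine samples no point can be more than √8 ≈ 2.83 population standard deviations from the mean, so a fixed 3σ rule could never fire here; the cut-off has to take the sample size into account.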


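Parametric Techniques

• Univariate outlier detection: Grubbs' test (another statistical method under the normal distribution assumption)

• For each object x in a data set, compute its z-score:
$z = \frac{|x - \bar{x}|}{s}$
where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation

• Now x is an outlier if
$z \ge \frac{N - 1}{\sqrt{N}} \sqrt{\frac{t^2_{\alpha/(2N),\,N-2}}{N - 2 + t^2_{\alpha/(2N),\,N-2}}}$
where $t_{\alpha/(2N),\,N-2}$ is the value taken by a t-distribution with N − 2 degrees of freedom at a significance level of α/(2N), and N is the number of objects in the data set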


Non-Parametric Techniques

• The model of normal data is learned from the input data without any a priori structure.

• Outlier detection using a histogram:
– Figure shows the histogram of purchase amounts in transactions
– A transaction in the amount of $7,500 is an outlier, since only 0.2% of transactions have an amount higher than $5,000

• Problem: it is hard to choose an appropriate bin size for the histogram

• Solution: adopt kernel density estimation to estimate the probability density distribution of the data.
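Kernel density estimation can be sketched with a Gaussian kernel. The purchase amounts and the bandwidth below are hypothetical; choosing the bandwidth is itself a tuning problem, much like choosing the bin size of a histogram.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical purchase amounts (in dollars) seen in normal transactions.
amounts = [20.0, 45.0, 60.0, 75.0, 90.0, 120.0, 150.0, 200.0, 320.0]
density = gaussian_kde(amounts, bandwidth=100.0)

typical = density(90.0)     # an amount in the middle of the observed range
suspect = density(7500.0)   # an amount far from anything observed
print(typical, suspect)
```

A test amount whose estimated density falls below a chosen threshold would be reported as an outlier; here the density at 7,500 is effectively zero compared with that at 90.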


Anomaly Detection on Real Network Data

• Anomaly detection was used at the University of Minnesota and the Army Research Lab to detect various intrusive/suspicious activities

• Many of these could not be detected using widely used intrusion detection tools like SNORT

• Anomalies/attacks picked by MINDS
– Scanning activities
– Non-standard behavior
  • Policy violations
  • Worms

MINDS – Minnesota Intrusion Detection System

[Figure: MINDS architecture. Labels shown: network, data capturing device, net flow tools, tcpdump, filtering, feature extraction, known attack detection, detected known attacks, anomaly detection, anomaly scores, association pattern analysis, summary and characterization of attacks, human analyst, labels, detected novel attacks, MINDSAT]


Feature Extraction

• Three groups of features
– Basic features of individual TCP connections
  • Source & destination IP: Features 1 & 2
  • Source & destination port: Features 3 & 4
  • Protocol: Feature 5
  • Duration: Feature 6
  • Bytes per packet: Feature 7
  • Number of bytes: Feature 8
– Time based features
  • For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in last T seconds: Features 9 (13)
  • Number of connections from source (destination) IP to the same destination (source) port in last T seconds: Features 11 (15)
– Connection based features
  • For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in last N connections: Features 10 (14)
  • Number of connections from source (destination) IP to the same destination (source) port in last N connections: Features 12 (16)

[Figure: connection records with columns dst, service and flag (three rows "h1 http S0", then "h2 http S0", "h4 http S0", "h2 ftp S0"), grouped as syn flood and normal. Annotation: existing features are useless. A constructed column "%S0" with values 70, 72, 75 and 0, 0, 0 carries the annotation: construct features with high information gain]
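One of the time based features, the number of unique destination IP addresses contacted by the same source in the last T seconds, can be sketched with a sliding window. The flow records, field layout and window length are hypothetical, and the check for "inside the network" is left out for brevity.

```python
from collections import deque


def unique_dsts_in_window(flows, window_seconds):
    """For each flow, count distinct destination IPs used by its source in the last T seconds.

    flows: list of (timestamp, src_ip, dst_ip) tuples sorted by timestamp.
    The current flow is included in its own count.
    """
    recent = {}  # src_ip -> deque of (timestamp, dst_ip) still inside the window
    counts = []
    for ts, src, dst in flows:
        window = recent.setdefault(src, deque())
        window.append((ts, dst))
        # Drop flows that are older than the window.
        while window[0][0] <= ts - window_seconds:
            window.popleft()
        counts.append(len({d for _, d in window}))
    return counts


# Hypothetical flow records: (timestamp in seconds, source IP, destination IP).
flows = [
    (0, "10.0.0.1", "192.168.1.10"),
    (1, "10.0.0.1", "192.168.1.11"),
    (2, "10.0.0.1", "192.168.1.12"),
    (3, "10.0.0.2", "192.168.1.10"),
    (20, "10.0.0.1", "192.168.1.10"),
]
feature = unique_dsts_in_window(flows, window_seconds=10)
print(feature)
```

A scanning host tends to drive this count up quickly. The connection based variant replaces the time window with the last N connections, which helps against scans that are deliberately slow.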


Typical Anomaly Detection Output

– 48 hours after the “slammer” worm

score srcIP sPort dstIP dPort protocol flags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
37674.69 63.150.X.253 1161 128.101.X.29 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 0
26676.62 63.150.X.253 1161 160.94.X.134 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 0
24323.55 63.150.X.253 1161 128.101.X.185 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
21169.49 63.150.X.253 1161 160.94.X.71 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
19525.31 63.150.X.253 1161 160.94.X.19 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
19235.39 63.150.X.253 1161 160.94.X.80 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
17679.1 63.150.X.253 1161 160.94.X.220 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
8183.58 63.150.X.253 1161 128.101.X.108 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.58 0 0 0 0 0
7142.98 63.150.X.253 1161 128.101.X.223 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
5139.01 63.150.X.253 1161 128.101.X.142 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
4048.49 142.150.Y.101 0 128.101.X.127 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
4008.35 200.250.Z.20 27016 128.101.X.116 4629 17 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
3657.23 202.175.Z.237 27016 128.101.X.116 4148 17 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
3450.9 63.150.X.253 1161 128.101.X.62 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
3327.98 63.150.X.253 1161 160.94.X.223 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2796.13 63.150.X.253 1161 128.101.X.241 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2693.88 142.150.Y.101 0 128.101.X.168 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2683.05 63.150.X.253 1161 160.94.X.43 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2444.16 142.150.Y.236 0 128.101.X.240 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2385.42 142.150.Y.101 0 128.101.X.45 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2114.41 63.150.X.253 1161 160.94.X.183 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2057.15 142.150.Y.101 0 128.101.X.161 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1919.54 142.150.Y.101 0 128.101.X.99 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1634.38 142.150.Y.101 0 128.101.X.219 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1596.26 63.150.X.253 1161 128.101.X.160 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1513.96 142.150.Y.107 0 128.101.X.2 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1389.09 63.150.X.253 1161 128.101.X.30 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1315.88 63.150.X.253 1161 128.101.X.40 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1279.75 142.150.Y.103 0 128.101.X.202 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1237.97 63.150.X.253 1161 160.94.X.32 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1180.82 63.150.X.253 1161 128.101.X.61 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0


Conclusions

• Anomaly detection can detect critical information in data

• Highly applicable in various application domains
• Nature of the anomaly detection problem is dependent on the application domain
• Need different approaches to solve a particular problem formulation


Related problems

• Rare Class Mining

• Chance Discovery

• Novelty Detection

• Exception Mining

• Noise Removal

• Black Swan*
