presented by: anupam das cs 568mcc spring 2013 network security
DESCRIPTION
Anomaly Detection. Presented by: Anupam Das CS 568MCC Spring 2013 Network Security. Some of the contents were taken from the authors of " Anomaly Detection : A Survey ACM Computing Surveys, Vol. 41(3), Article 15, July 2009. “Mining needle in a haystack. So much hay and so little time”. - PowerPoint PPT PresentationTRANSCRIPT
April 20, 2023Department of Computer Science,
UIUC
Presented by: Anupam DasCS 568MCC Spring 2013
Network Security
Anomaly Detection
Some of the contents were taken from the authors of "Anomaly Detection : A Survey ACM Computing Surveys, Vol.
41(3), Article 15, July 2009
April 20, 2023Department of Computer Science,
UIUC
IntroductionWe are drowning in the deluge of
data that are being collected world-wide, while starving for knowledge at the same time
Anomalous events occur relatively infrequently
However, when they do occur, their consequences can be quite dramatic and quite often in a negative sense
“Mining needle in a haystack. So much hay and so little time”
April 20, 2023Department of Computer Science,
UIUC
• Anomaly is a pattern in the data that does not conform to the expected behaviour
• Also referred to as outliers, exceptions, peculiarities, surprise, etc.
• Anomalies translate to significant (often critical) real life entities
What are Anomalies?
April 20, 2023Department of Computer Science,
UIUC
Real World Anomalies
• Credit Card Fraud• Cyber Intrusions• Healthcare Informatics / Medical
diagnostics• Industrial Damage Detection• Image Processing / Video
surveillance • Novel Topic Detection in Text
Mining
April 20, 2023Department of Computer Science,
UIUC
Simple Example
• N1 and N2 are regions of normal behavior
• Points o1 and o2 are anomalies
• Points in region O3 are anomalies
X
Y
N1
N2
o1
o2
O3
April 20, 2023Department of Computer Science,
UIUC
Key Challenges
• Defining a representative normal region is challenging• The boundary between normal and outlying behavior is
often not precise• The exact notion of an outlier is different for different
application domains• Availability of labeled data for training/validation• Malicious adversaries• Data might contain noise• Normal behavior keeps evolving
April 20, 2023Department of Computer Science,
UIUC
Aspects of Anomaly Detection Problem• Nature of input data • Availability of supervision • Type of anomaly: point, contextual, structural • Output of anomaly detection • Evaluation of anomaly detection techniques
April 20, 2023Department of Computer Science,
UIUC
• Input Data could be– Univariate : single variable– Multivariate: multiple variable
• Nature of attributes– Binary– Categorical– Continuous– Hybrid
categoric
al
continuous
continuous
categoric
al
Tid SrcIP Duration Dest IPNumberof bytes
Internal
1 206.163.37.81 0.10 160.94.179.208 150 No
2 206.163.37.99 0.27 160.94.179.235 208 No
3 160.94.123.45 1.23 160.94.179.221 195 Yes
4 206.163.37.37 112.03 160.94.179.253 199 No
5 206.163.37.41 0.32 160.94.179.244 181 No
binary
Input Data
April 20, 2023Department of Computer Science,
UIUC
Input Data – Complex Data Types• Relationship among data instances
– Sequential • Temporal
– Spatial– Spatio-temporal– Graph
GGTTCCGCCTTCAGCCCCGCGCCCGCAGGGCCCGCCCCGCGCCGTCGAGAAGGGCCCGCCTGGCGGGCGGGGGGAGGCGGGGCCGCCCGAGCCCAACCGAGTCCGACCAGGTGCCCCCTCTGCTCGGCCTAGACCTGAGCTCATTAGGCGGCAGCGGACAGGCCAAGTAGAACACGCGAAGCGCTGGGCTGCCTGCTGCGACCAGGG
April 20, 2023Department of Computer Science,
UIUC
Data Labels• Supervised Anomaly Detection
– Labels available for both normal data and anomalies– Similar to rare class mining
• Semi-supervised Anomaly Detection– Labels available only for normal data
• Unsupervised Anomaly Detection– No labels assumed– Based on the assumption that anomalies are very
rare compared to normal data
April 20, 2023Department of Computer Science,
UIUC
Type of Anomaly• Point Anomalies
• Contextual Anomalies
• Collective Anomalies
April 20, 2023Department of Computer Science,
UIUC
Point Anomalies
• An individual data instance is anomalous if it deviates significantly from the rest of the data set.
X
Y
N1
N2
o1
o2
O3
Anomaly
April 20, 2023Department of Computer Science,
UIUC
Contextual Anomalies• An individual data instance is anomalous within a
context• Requires a notion of context• Also referred to as conditional anomalies*
Normal
Anomaly
April 20, 2023Department of Computer Science,
UIUC
Collective Anomalies• A collection of related data instances is anomalous• Requires a relationship among data instances
– Sequential Data– Spatial Data– Graph Data
• The individual instances within a collective anomaly are not anomalous by themselves
Anomalous Subsequence
April 20, 2023Department of Computer Science,
UIUC
Output of Anomaly Detection
• Label– Each test instance is given a normal or anomaly label– This is especially true of classification-based
approaches
• Score– Each test instance is assigned an anomaly score
• Allows the output to be ranked• Requires an additional threshold parameter
April 20, 2023Department of Computer Science,
UIUC
Accuracy is not sufficient metric for evaluation– Example: network traffic data set with 99.9% of normal data and
0.1% of intrusions– Trivial classifier that labels everything with the normal class can
achieve 99.9% accuracy !!!!!Predicted
class Confusion
matrix
NC C NC TN FP Actual
class C FN TP
anomaly class – C
normal class – NC
• Focus on both recall and precision– Recall /Detection (R)= TP/(TP + FN) – Precision (P) = TP/(TP + FP) – False rate (F)=FP/(TN+FP)
Evaluation of Anomaly Detection
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1ROC curves for different outlier detection techniques
False alarm rate
Det
ectio
n ra
te
AUC
April 20, 2023Department of Computer Science,
UIUC
Taxonomy*
Anomaly Detection
Contextual Anomaly Detection
Collective Anomaly Detection
Online Anomaly Detection
Distributed Anomaly Detection
Point Anomaly Detection
Classification Based
Rule Based
Neural Networks Based
SVM Based
Nearest Neighbor Based
Density Based
Distance Based
Statistical
Parametric
Non-parametric
Clustering Based Others
Information Theory Based
Spectral Decomposition Based
Visualization Based
* Outlier Detection – A Survey, Varun Chandola, Arindam Banerjee, and Vipin Kumar, Technical Report TR07-17, University of Minnesota (Under Review)
April 20, 2023Department of Computer Science,
UIUC
Classification Based Techniques• Main idea: build a classification model for normal (and anomalous)
events based on labelled training data, and use it to classify each new unseen event
• Classification models must be able to handle skewed (imbalanced) class distributions
• Categories:– Supervised classification techniques
• Require knowledge of both normal and anomaly class• Build classifier to distinguish between normal and known anomalies
– Semi-supervised classification techniques• Require knowledge of normal class only!• Use modified classification model to learn the normal behavior and then detect
any deviations from normal behavior as anomalous
Pros and Cons
April 20, 2023Department of Computer Science,
UIUC
Classification Based Techniques
• Some techniques– Neural network based approaches– Support Vector machines (SVM) based approaches– Bayesian networks based approaches– Rule based techniques– Fuzzy Logic– Genetic Algorithms– Principle Component Analysis
April 20, 2023Department of Computer Science,
UIUC
Multi-layer NNs• Creating hyper-planes for separating between various classes• Good when dealing with huge data sets and handles noisy data well
• Bad because learning takes a long time
Using Neural Networks
yx yx
April 20, 2023Department of Computer Science,
UIUC
Neural Networks• INPUT LAYER- X = { x1, x2, …. xn}, where n is the number of
attributes. There are as many nodes as no. of inputs.• HIDDEN LAYER – the number of nodes in the hidden layer and
the number of hidden layers depends on implementation.• OUTPUT LAYER – corresponds to the class attribute. There are
as many nodes as classes.• Back Propagation learns by iteratively processing a set of
training data (samples).
April 20, 2023Department of Computer Science,
UIUC
x1
x2• How would you classify these points using a linear discriminant function in order to minimize the error rate?
denotes +1
denotes -1
Infinite number of answers!
• Which one is the best?
Support vector machine constructs a hyperplane or set of hyperplanes in a high or infinite-dimensional space, which can be used for classification
SVM
Department of Computer Science, UIUC
The linear discriminant function (classifier) with the maximum margin is the best
Why it is the best? Robust to outliners and thus strong generalization ability
denotes +1denotes -1
Support Vectors are those data points that the margin pushes up against
Linear SVM
April 20, 2023Department of Computer Science,
UIUC
• Formulation:
x1
x2
denotes +1
denotes -1 Margin
wT x + b = 0
wT x + b = -1w
T x + b = 1
x+
x+
x-
n
such that
1
1
T
T
b
b
w x
w x
The margin width is:
( )
2 ( )
M
x x n
wx x
w w
2maximize
w
( ) 1Ti iy b w x
• Goal:
Quadratic programming with linear constraints
Linear SVM
April 20, 2023Department of Computer Science,
UIUC
Taxonomy
Anomaly Detection
Contextual Anomaly Detection
Collective Anomaly Detection
Online Anomaly Detection
Distributed Anomaly Detection
Point Anomaly Detection
Classification Based
Rule Based
Neural Networks Based
SVM Based
Nearest Neighbor Based
Density Based
Distance Based
Statistical
Parametric
Non-parametric
Clustering Based Others
Information Theory Based
Spectral Decomposition Based
Visualization Based
April 20, 2023Department of Computer Science,
UIUC
Nearest Neighbor Based Techniques
• Key assumption: normal points have close neighbors while anomalies are located far from other points
• General two-step approach1.Compute neighborhood for each data record2.Analyze the neighborhood to determine whether data record is
anomaly or not
• Categories:– Distance based methods
• Anomalies are data points most distant from other points
– Density based methods• Anomalies are data points in low density regions
April 20, 2023Department of Computer Science,
UIUC
For each object o, examine the # of other objects in the r-neighborhood of o, where r is a user-specified distance thresholdAn object o is an outlier if most (taking π as a fraction threshold) of the objects in D are far away from o, i.e., not in the r-neighborhood of o
An object o is a DB(r, π) outlier if
Distance Based Anomaly Detection
April 20, 2023Department of Computer Science,
UIUC
Compute local densities of particular regions and declare instances in low density regions as potential anomalies
Approach: Local Outlier Factor (LOF)
Density Based Anomaly Detection
p2
p1
In the NN approach, p2 is not considered as outlier, while the LOF approach find both p1 and p2 as outliers
NN approach may consider p3 as outlier, but LOF approach does not
p3
Distance from p3 to nearest neighbor
Distance from p2 to nearest neighbor
April 20, 2023Department of Computer Science,
UIUC
For each data point o compute the # of points in k-distance:
Compute reachability distance (reachdist) for each data example o with respect to data example o’ as:
Compute local reachability density (lrd) :
LOF is the average of the ratio of local reachability density of o’s k-nearest neighbors and local reachability density of the data record o
Local Outlier Factor (LOF)
Higher the LOF the more likely its an outlier
April 20, 2023Department of Computer Science,
UIUC
Taxonomy
Anomaly Detection
Contextual Anomaly Detection
Collective Anomaly Detection
Online Anomaly Detection
Distributed Anomaly Detection
Point Anomaly Detection
Classification Based
Rule Based
Neural Networks Based
SVM Based
Nearest Neighbor Based
Density Based
Distance Based
Statistical
Parametric
Non-parametric
Clustering Based Others
Information Theory Based
Spectral Decomposition Based
Visualization Based
April 20, 2023Department of Computer Science,
UIUC
Clustering Based Techniques• Key assumption: normal data records belong to large and dense
clusters, while anomalies belong do not belong to any of the clusters or form very small clusters
• Categorization according to labels– Semi-supervised – cluster normal data to create modes of normal behavior. If a
new instance does not belong to any of the clusters or it is not close to any cluster, is anomaly
– Unsupervised – post-processing is needed after a clustering step to determine the size of the clusters and the distance from the clusters is required for the point to be anomaly
• Anomalies detected using clustering based methods can be:– Does not belong to any cluster, – Large distance between the object and its closest cluster – Belongs to a small or sparse cluster
April 20, 2023Department of Computer Science,
UIUC
• FindCBLOF: Detect outliers in small clusters
– Find clusters, and sort them in decreasing size
– To each data point, assign a cluster-based local outlier factor (CBLOF):
– If obj p belongs to a large cluster, CBLOF = cluster_size X similarity between p and cluster
– If p belongs to a small one, CBLOF = cluster size X similarity betw. p and the closest large cluster
Ex. In the figure, o is outlier since its closest large cluster is C1, but the
similarity between o and C1 is small. For any point in C3, its closest
large cluster is C2 but its similarity from C2 is low, plus |C3| = 3 is small
Cluster Based Local Outlier Factor
April 20, 2023Department of Computer Science,
UIUC
Anomaly Detection
Contextual Anomaly Detection
Collective Anomaly Detection
Online Anomaly Detection
Distributed Anomaly Detection
Point Anomaly Detection
Classification Based
Rule Based
Neural Networks Based
SVM Based
Nearest Neighbor Based
Density Based
Distance Based
Statistical
Parametric
Non-parametric
Clustering Based Others
Information Theory Based
Spectral Decomposition Based
Visualization Based
Taxonomy
April 20, 2023Department of Computer Science,
UIUC
Statistics Based Techniques• Statistical approaches assume that the objects in a data set are
generated by a stochastic process (a generative model)• Idea: learn a generative model fitting the given data set, and
then identify the objects in low probability regions of the model as outliers.
• Advantage
– Utilize existing statistical modelling techniques to model various type of distributions
• Challenges
– With high dimensions, difficult to estimate distributions
– Parametric assumptions often do not hold for real data sets
April 20, 2023Department of Computer Science,
UIUC
Types of Statistical Techniques• Parametric Techniques
– Assume that the normal data is generated from an underlying parametric distribution
– Learn the parameters from the normal sample
– Determine the likelihood of a test instance to be generated from this distribution to detect anomalies
• Non-parametric Techniques
– Do not assume any knowledge of parameters– Not completely parameter free but consider the number and
nature of the parameters are flexible and not fixed in advance– Examples: histogram and kernel density estimation
April 20, 2023Department of Computer Science,
UIUC
• Univariate data: A data set involving only one attribute or variable
• Often assume that data are generated from a normal distribution, learn the parameters from the input data, and identify the points with low probability as outliers
• Ex: Avg. temp.: {24.0, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4}
– Compute μ and σ from the samples
Parametric Techniques
April 20, 2023Department of Computer Science,
UIUC
• Univariate outlier detection: The Grubb's test (another statistical method under normal distribution)
• For each object x in a data set, compute its z-score:
• Now x is an outlier if
where is the value taken by a t-distribution at a significance level of
α/(2N), and N is the # of objects in the data set
Parametric Techniques
April 20, 2023Department of Computer Science,
UIUC
The model of normal data is learned from the input data without any a priori
structure.
Outlier detection using histogram:
Figure shows the histogram of purchase amounts in transactions
A transaction in the amount of $7,500 is an outlier, since only 0.2%
transactions have an amount higher than $5,000
Problem: Hard to choose an appropriate bin size for histogram
Solution: Adopt kernel density estimation to estimate the probability density
distribution of the data.
Non-Parametric Techniques
April 20, 2023Department of Computer Science,
UIUC
Anomaly Detection on Real Network Data
• Anomaly detection was used at U of Minnesota and Army Research Lab to detect various intrusive/suspicious activities
• Many of these could not be detected using widely used intrusion detection tools like SNORT
• Anomalies/attacks picked by MINDS– Scanning activities– Non-standard behavior
• Policy violations• Worms
MINDS – Minnesota Intrusion Detection System
MINDS
network
Data capturing device
Anomaly detection
……
Anomaly scores
Humananalyst
Detected novel attacks
Summary and characterization
of attacks
Known attack detection
Detected known attacks
Labels
Feature Extraction
Association pattern analysis
MINDSAT
Filtering
Net flow tools
tcpdump
April 20, 2023Department of Computer Science,
UIUC
• Three groups of features–Basic features of individual TCP connections
• source & destination IP Features 1 & 2• source & destination port Features 3 & 4• Protocol Feature 5• Duration Feature 6• Bytes per packets Feature 7• number of bytes Feature 8
–Time based features• For the same source (destination) IP address, number of unique destination (source) IP
addresses inside the network in last T seconds – Features 9 (13)• Number of connections from source (destination) IP to the same destination (source) port in
last T seconds – Features 11 (15)–Connection based features
• For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in last N connections - Features 10 (14)
• Number of connections from source (destination) IP to the same destination (source) port in last N connections - Features 12 (16)
flagdst … service …h1 http S0h1 http S0h1 http S0
h2 http S0
h4 http S0
h2 ftp S0
syn flood
normal
existing f eatures existing f eatures uselessuseless
flagdst … service …h1 http S0h1 http S0h1 http S0
h2 http S0
h4 http S0
h2 ftp S0
syn flood
normal
flagdst … service …h1 http S0h1 http S0h1 http S0
h2 http S0
h4 http S0
h2 ftp S0
syn flood
normal
existing f eatures existing f eatures uselessuseless
dst … service …h1 http S0h1 http S0h1 http S0
h2 http S0
h4 http S0
h2 ftp S0
flag %S0707275
0
0
0
construct f eatures with construct f eatures with high information gainhigh information gain
dst … service …h1 http S0h1 http S0h1 http S0
h2 http S0
h4 http S0
h2 ftp S0
flag %S0707275
0
0
0
dst … service …h1 http S0h1 http S0h1 http S0
h2 http S0
h4 http S0
h2 ftp S0
flag %S0707275
0
0
0
construct f eatures with construct f eatures with high information gainhigh information gain
Feature Extraction
April 20, 2023Department of Computer Science,
UIUC
Typical Anomaly Detection Output– 48 hours after the “slammer” worm
score srcIP sPort dstIP dPort protocolflagspackets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1637674.69 63.150.X.253 1161 128.101.X.29 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 026676.62 63.150.X.253 1161 160.94.X.134 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 024323.55 63.150.X.253 1161 128.101.X.185 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 021169.49 63.150.X.253 1161 160.94.X.71 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 019525.31 63.150.X.253 1161 160.94.X.19 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 019235.39 63.150.X.253 1161 160.94.X.80 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 017679.1 63.150.X.253 1161 160.94.X.220 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 08183.58 63.150.X.253 1161 128.101.X.108 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.58 0 0 0 0 07142.98 63.150.X.253 1161 128.101.X.223 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 05139.01 63.150.X.253 1161 128.101.X.142 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 04048.49 142.150.Y.101 0 128.101.X.127 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 04008.35 200.250.Z.20 27016 128.101.X.116 4629 17 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 03657.23 202.175.Z.237 27016 128.101.X.116 4148 17 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 03450.9 63.150.X.253 1161 128.101.X.62 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 03327.98 63.150.X.253 1161 160.94.X.223 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 02796.13 63.150.X.253 1161 128.101.X.241 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 02693.88 142.150.Y.101 0 128.101.X.168 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 02683.05 63.150.X.253 1161 160.94.X.43 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 02444.16 142.150.Y.236 0 128.101.X.240 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 02385.42 142.150.Y.101 0 128.101.X.45 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 02114.41 63.150.X.253 1161 160.94.X.183 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 02057.15 142.150.Y.101 0 128.101.X.161 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 01919.54 142.150.Y.101 0 128.101.X.99 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 01634.38 142.150.Y.101 0 128.101.X.219 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 01596.26 63.150.X.253 1161 128.101.X.160 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 01513.96 142.150.Y.107 0 128.101.X.2 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 01389.09 63.150.X.253 1161 128.101.X.30 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 01315.88 63.150.X.253 1161 128.101.X.40 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 01279.75 142.150.Y.103 0 128.101.X.202 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 01237.97 63.150.X.253 1161 160.94.X.32 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 01180.82 63.150.X.253 1161 128.101.X.61 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
April 20, 2023Department of Computer Science,
UIUC
• Anomaly detection can detect critical information in data
• Highly applicable in various application domains• Nature of anomaly detection problem is
dependent on the application domain• Need different approaches to solve a particular
problem formulation
Conclusions
April 20, 2023Department of Computer Science,
UIUC
Related problems• Rare Class Mining
• Chance discovery
• Novelty Detection
• Exception Mining
• Noise Removal
• Black Swan*