Statistical Anomaly Detection Christian CALLEGARI Department of Information Engineering University of Pisa PhD Winter School IP Traffic Characterization and Anomaly Detection 8th - 12th February Turin, Italy


Statistical Anomaly Detection

Christian CALLEGARI

Department of Information Engineering
University of Pisa

PhD Winter School
IP Traffic Characterization and Anomaly Detection

8th - 12th February
Turin, Italy


Acknowledgments

C. Callegari Anomaly Detection 2 / 96


Outline

1 Intrusion Detection Expert System

2 Statistical Anomaly Detection

3 Snort & Spade

4 Clustering

5 Markovian Models

6 Entropy-based Methods

7 Wavelet Analysis


IDES


A bit of History

The history of IDSs can be split into three main blocks:

1 First Generation IDSs (end of the 1970s)
  - The concept of IDS first appears in the 1970s and early 1980s (Anderson, Computer Security Threat Monitoring and Surveillance, Tech Rep 1980)
  - Focus on audit data of a single machine
  - Post processing of data

2 Second Generation IDSs (1987)
  - Intrusion Detection Expert System (Denning, An Intrusion-Detection Model, IEEE Trans. on Soft. Eng., 1987)
  - Statistical analysis of data

3 Third Generation IDSs (to come)
  - Focus on the network
  - Real-time detection
  - Real-time reaction
  - Intrusion Prevention System


IDES

Model’s components

Subjects: initiators of activity on a target system

Objects: resources managed by the system (files, commands, etc.)

Audit Records: generated by the target system in response to actions performed or attempted by subjects

Profiles: structures that characterize the behavior of subjects with respect to objects in terms of statistical metrics and models of observed activity

Anomaly Records: generated when abnormal behavior is detected

Activity Rules: actions taken when some condition is satisfied


Subjects and Objects

Subjects
  - Initiators of actions on the target system
  - A subject is typically a terminal user
  - Subjects can be grouped into different categories
  - User groups may overlap

Objects
  - Receptors of subjects' actions
  - If a subject is a recipient of actions (e.g. electronic mail), then it is also considered to be an object
  - Additional structures may be imposed (e.g. records may be grouped into a database)
  - Object granularity depends on the environment


Audit Records

{Subject, Action, Object, Exception-Condition, Resource-Usage, Time-stamp}

  - Action: operation performed by the subject on or with the object
  - Exception-Condition: denotes which, if any, exception condition is raised on the return
  - Resource-Usage: list of quantitative elements, where each element gives the amount of some resource used
  - Time-stamp: unique time/date stamp identifying when the action took place
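As a rough illustration of the six-tuple above, an audit record could be held in a small immutable structure. This is only a sketch: the Python field names, the example subject and object, the resource keys and the date are all made up for illustration and do not come from the IDES model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class AuditRecord:
    """One audit record; the fields mirror the six-tuple of the model."""
    subject: str
    action: str
    obj: str
    exception_condition: Optional[str]   # None when no exception is raised
    resource_usage: Dict[str, float] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=datetime.now)


# Hypothetical record: user "smith" executes a copy command.
record = AuditRecord(
    subject="smith",
    action="execute",
    obj="/usr/bin/copy",
    exception_condition=None,
    resource_usage={"cpu_seconds": 0.02, "records_read": 0},
    timestamp=datetime(2010, 2, 8, 11, 5, 0),
)
print(record.subject, record.action, record.obj)
```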


Profiles

  - An activity profile characterizes the behavior of a given subject (or set of subjects) with respect to a given object, thereby serving as a signature or description of normal activity for its respective subject and object
  - Observed behavior is characterized in terms of a statistical metric and model
  - A metric is a random variable x representing a quantitative measure accumulated over a period
  - Observations x_i of x obtained from the audit records are used together with a statistical model to determine whether a new observation is abnormal
  - The statistical models make no assumptions about the underlying distribution of x; all knowledge about x is obtained from the observations x_i


Metrics and Models

Metrics
  - Event counter
  - Interval timer
  - Resource measure

Statistical models
  - Operational model: a new observation x_n is declared abnormal by comparing it with a fixed threshold
  - Mean and standard deviation model: x_n is declared abnormal if it falls outside a confidence interval built from the mean and standard deviation of the previous observations
  - Multivariate model: based on the correlations between two or more metrics
  - Markov process model: based on the transition probabilities between states
  - Time series model: takes into account the order and inter-arrival times of the observations
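The mean and standard deviation model can be sketched in a few lines. The code below assumes an interval of the form mean ± d · stdev; the class name, the default d = 3 and the sample login counts are illustrative choices, not values taken from the slides.

```python
import math


class MeanStdModel:
    """Running mean/standard deviation of a metric, with an interval test."""

    def __init__(self, d=3.0):
        self.d = d            # interval half-width, in standard deviations (assumed)
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, x):
        """Fold a new observation into the profile."""
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def is_abnormal(self, x):
        """True if x falls outside mean +/- d * stdev of the past observations."""
        if self.n < 2:
            return False      # not enough history to judge
        mean = self.total / self.n
        var = max((self.total_sq - self.n * mean * mean) / (self.n - 1), 0.0)
        return abs(x - mean) > self.d * math.sqrt(var)


model = MeanStdModel(d=3.0)
for logins_per_day in [4, 5, 6, 5, 4, 6, 5]:   # made-up history
    model.update(logins_per_day)
print(model.is_abnormal(5), model.is_abnormal(40))
```

Only the count, the sum and the sum of squares are stored, so the profile can be updated one audit record at a time without keeping the past observations.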


Profile structure

{Variable-name, Action-pattern, Exception-pattern, Resource-usage-pattern, Period, Variable-type, Threshold, Subject-pattern, Object-pattern, Value}

  - Variable-name
  - Action-pattern: pattern that matches one or more actions in the audit records (e.g. "login")
  - Exception-pattern: pattern that matches on the Exception-condition field of an audit record
  - Resource-usage-pattern: pattern that matches on the Resource-usage field of an audit record
  - Period: time interval for measurements
  - Variable-type: name of abstract data type that defines a particular type of metric and statistical model (e.g. event counter with mean and standard deviation model)
  - Threshold


  - Subject-pattern: pattern that matches on the Subject field of an audit record
  - Object-pattern: pattern that matches on the Object field of an audit record
  - Value: value of current observation and parameters used by the statistical model to represent the distribution of previous values

Profiles can also be defined for classes (of subjects or objects) rather than for individual ones


Profile templates

When user accounts and objects can be created dynamically, a mechanism is needed to generate activity profiles for new subjects and objects

  - Manual create: the security officer explicitly creates all profiles
  - Automatic explicit create: all profiles for a new user are generated in response to a "create" record in the audit trail
  - First use: a profile is automatically generated when a subject (new or old) first uses an object (new or old)


Anomaly Records

{Event, Time-stamp, Profile}

  - Event: indicates the event giving rise to the abnormality and is either "audit", meaning the data in an audit record was found abnormal, or "period", meaning the data accumulated over the current period was found abnormal
  - Time-stamp: either the Time-stamp in the audit trail or the interval stop time
  - Profile: activity profile with respect to which the abnormality was detected


Activity Rules

An activity rule consists of a condition that, when satisfied, causes the rule to fire, and a body, which specifies the action to be taken

  - Audit-record rule: triggered by a match between a new audit record and an activity profile, updates the profiles and checks for anomalous behavior
  - Periodic-activity-update rule: triggered by the end of an interval matching the period component of an activity profile, updates the profiles and checks for anomalous behavior
  - Anomaly-record rule: triggered by the generation of an anomaly record, brings the anomaly to the immediate attention of the security officer
  - Periodic-anomaly-analysis rule: triggered by the end of an interval, generates summary reports of the anomalies during the current period


References

D. Denning, An intrusion-detection model, IEEE Transactions on Software Engineering, vol. SE-13, no. 2, 1987


Statistical Anomaly Detection


Statistical Approach: Traffic Descriptors

The goal is to identify some traffic parameters that can be used to describe the network traffic and that vary significantly from the normal behavior to the anomalous one

Some examples
  - Packet length
  - Inter-arrival time
  - Flow size
  - Number of packets per flow
  - ... and so on


Choice of the Traffic Descriptors

For each parameter we can consider
  - Mean value
  - Variance and higher order moments
  - Distribution function
  - Quantiles
  - ... and so on

The number of potential traffic descriptors is huge (some papers identify up to 200 descriptors)

GOAL
To identify as few descriptors as possible to classify traffic with an acceptable error rate
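As a small illustration of how a handful of descriptors could be extracted from one traffic parameter, the sketch below uses only the Python standard library. The function name `describe`, the choice of descriptors and the packet lengths are made up for the example.

```python
import statistics


def describe(samples):
    """Return a few candidate descriptors for one traffic parameter."""
    q1, median, q3 = statistics.quantiles(samples, n=4)   # quartiles
    return {
        "mean": statistics.fmean(samples),
        "variance": statistics.variance(samples),          # sample variance
        "median": median,
        "iqr": q3 - q1,                                     # inter-quartile range
    }


packet_lengths = [40, 40, 52, 60, 576, 1500, 1500, 1500]   # bytes, made-up
descriptors = describe(packet_lengths)
for name, value in descriptors.items():
    print(f"{name}: {value:.1f}")
```

The same function could be applied to inter-arrival times or flow sizes; which descriptors actually separate normal from anomalous traffic has to be established on real data.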


Snort & Spade


SNORT

Snort
  - One of the most widely known IDSs
  - Open source software tool
  - Network based
  - Signature based
  - Centralized architecture

Spade, the anomaly detection plug-in for Snort, is no longer supported

The rules database, as well as the system code, is available for download at the web site http://www.snort.org


SNORT Architecture


Pre-Processor
  - First security check
  - Stateful approach (e.g. Port Scan)

Post-Processor
  - Alarm generation
  - It is possible to choose which action the system should take

Central Engine
  - Checks for known patterns (packet level/flow level)
  - Rules are organized in tree structures
  - Accounts for 80% of the total processing time


SNORT Rules

  - Snort follows a "Unixy" configuration philosophy
  - Configuration is plaintext
  - Powerful and complex
  - Snort configuration consists of:
    - Global configuration (snort.conf)
    - Optional *.rules file(s)


snort.conf

    var HOME_NET 192.168.3.0/24
    var EXTERNAL_NET !$HOME_NET
    var DNS_SERVERS [192.168.3.1,192.168.3.10]
    var HTTP_SERVERS [192.168.3.1,192.168.3.2,192.168.3.88]
    var HTTP_PORTS 80
    var RULE_PATH /usr/local/snortrules
    ...
    include $RULE_PATH/local.rules
    include $RULE_PATH/bad-traffic.rules
    include $RULE_PATH/attack-responses.rules
    include $RULE_PATH/bleeding-all.rules
    include $RULE_PATH/community-bot.rules


    alert tcp any any -> any any (msg:"Sample alert";)

Header contains the following fields:
  - Action (log, alert)
  - Protocol (ip, tcp, udp, icmp, any)
  - Src IP & Port
  - Dst IP & Port
  - Direction operator ("->", "<>")

The body is usually the complex part
  - Begins and ends with "()"
  - Series of "rule options" (keywords, with optional parameters) separated by ";"
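To make the header/body split concrete, the sample rule can be taken apart with a few string operations. This is a deliberately naive sketch, not Snort's own parser: the function name `split_rule` is made up, and it would fail on rules whose option values contain ";" or ")".

```python
def split_rule(rule):
    """Split a simple one-line rule into header fields and body options."""
    header, _, body = rule.partition("(")
    # The header has exactly seven whitespace-separated fields.
    action, proto, src_ip, src_port, direction, dst_ip, dst_port = header.split()
    # The body is a ";"-separated list of options closed by ")".
    options = [opt.strip() for opt in body.rstrip().rstrip(")").split(";") if opt.strip()]
    return {
        "action": action,
        "protocol": proto,
        "src": (src_ip, src_port),
        "direction": direction,
        "dst": (dst_ip, dst_port),
        "options": options,
    }


parsed = split_rule('alert tcp any any -> any any (msg:"Sample alert";)')
print(parsed["action"], parsed["protocol"], parsed["direction"], parsed["options"])
```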


SPADE

  - SPADE is a pre-processor plug-in for the Snort intrusion detection engine
  - To detect anomalous packets, SPADE maintains probability tables that contain information regarding the number of occurrences of different kinds of packets over time on the network
  - It assigns a higher weight to more recent occurrences, and gradually phases out older occurrences

As an example, consider DNS packets:
  - the probability tables could tell us that the probability of a packet to the DNS server on port 53 is 10%
  - whereas the probability of a packet to the DNS server on port 80 is 0.1%


This probability is converted first to a "raw anomaly score" and then into a "relative anomaly score"; this leaves us with a number between 0 and 1 for easy comparison

Given a probability $P(X)$ for a packet $X$, the "raw anomaly score" is $a(X) = -\log_2 P(X)$, so that rarer packets get higher scores

To get the "relative anomaly score" $A(X)$, $a(X)$ is divided by the maximum possible "raw anomaly score"; $A(X)$ lies between 0 and 1, with 0 being completely "normal" and 1 being completely "not normal"

When setting up the SPADE sensor we specify an alerting threshold, so a packet $X$ giving an $A(X)$ above that threshold will trigger an alert
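The scoring chain can be sketched as follows, taking the raw score as the negative base-2 logarithm of the probability so that rare packets score high. The floor `P_MIN` (which fixes the maximum raw score) and the threshold value are hypothetical; the slides do not say how SPADE chooses them.

```python
import math


def relative_anomaly_score(p, p_min):
    """Map a probability to [0, 1]: 0 is fully normal, 1 is fully anomalous."""
    p = min(max(p, p_min), 1.0)      # clamp so the score stays in [0, 1]
    raw = -math.log2(p)              # raw anomaly score a(X)
    raw_max = -math.log2(p_min)      # largest raw score considered (assumed)
    return raw / raw_max


P_MIN = 2 ** -20     # hypothetical floor on the estimated probability
THRESHOLD = 0.4      # hypothetical alerting threshold

for label, p in [("DNS server, port 53", 0.10), ("DNS server, port 80", 0.001)]:
    score = relative_anomaly_score(p, P_MIN)
    print(label, round(score, 3), "ALERT" if score > THRESHOLD else "ok")
```

With these made-up settings the port 53 packet stays well below the threshold while the port 80 packet exceeds it, which matches the intuition of the DNS example.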


Cluster

4 Clustering
  - Clustering
  - Outliers Detection

Clustering

  - Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense
  - Clustering is a method of unsupervised learning
  - The clusters are computed on the basis of a distance measure, which determines how the similarity of two elements is calculated
  - Common distances are:
    - Euclidean distance
    - Manhattan distance
    - Mahalanobis distance
    - ...


Types of clustering:
  - Hierarchical algorithms find successive clusters using previously established clusters; they can be agglomerative ("bottom-up") or divisive ("top-down")
  - Partitional algorithms typically determine all clusters at once, but can also be used as divisive algorithms in hierarchical clustering
  - Density-based algorithms are devised to discover arbitrary-shaped clusters. In this approach, a cluster is regarded as a region in which the density of data objects exceeds a threshold
  - Two-way clustering, co-clustering or biclustering are clustering methods where not only the objects are clustered but also the features of the objects, i.e., if the data is represented in a data matrix, the rows and columns are clustered simultaneously


Distance

Euclidean distance

  - The Euclidean distance between points p and q is the length of the line segment pq
  - In Cartesian coordinates, if $p = (p_1, p_2, \dots, p_n)$ and $q = (q_1, q_2, \dots, q_n)$ are two points in Euclidean n-space, then the distance from p to q is given by:

$$d(p,q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \dots + (p_n - q_n)^2} = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

  - It is not suitable when the point features are not "uniform" (e.g. when they are measured on very different scales)


Manhattan distance

The distance between two points is the sum of the (absolute) differences of their coordinates:

$$d_1(p,q) = \|p - q\|_1 = \sum_{i=1}^{n} |p_i - q_i|$$
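Both distances are one-liners in code. The sketch below implements the Euclidean and Manhattan formulas for points given as equal-length tuples; the function names and the sample points are illustrative.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q), manhattan(p, q))   # 5.0 7.0
```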


Mahalanobis distance
  - Distance measure introduced by P. C. Mahalanobis in 1936
  - It is based on correlations between variables by which different patterns can be identified and analyzed
  - It is a useful way of determining similarity of an unknown sample set to a known one
  - It is scale-invariant, i.e., not dependent on the scale of measurements
  - The Mahalanobis distance of a multivariate vector $x = (x_1, x_2, x_3, \dots, x_N)^T$ from a group of values with mean $\mu = (\mu_1, \mu_2, \mu_3, \dots, \mu_N)^T$ and covariance matrix $S$ is defined as:

$$D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$$

  - It can also be defined as a dissimilarity measure between two random vectors $\vec{x}$ and $\vec{y}$ of the same distribution with the covariance matrix $S$
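A minimal sketch of the Mahalanobis distance, restricted to two features so that the covariance matrix can be inverted by hand without external libraries. The mean vector, the covariance values and the feature meanings are made up; the point is that a deviation of two standard deviations gives a distance of about 2 whatever the scale of the feature.

```python
import math


def mahalanobis_2d(x, mu, S):
    """Mahalanobis distance for two features; S is a 2x2 covariance matrix."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    inv = ((d / det, -b / det), (-c / det, a / det))       # inverse of S
    dx = (x[0] - mu[0], x[1] - mu[1])
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(quad)


mu = (100.0, 500.0)                   # made-up means: packets per flow, bytes per packet
S = ((400.0, 0.0), (0.0, 10000.0))    # made-up covariance: stdev 20 and 100, uncorrelated

# Each point is two standard deviations away along one feature.
print(mahalanobis_2d((140.0, 500.0), mu, S))
print(mahalanobis_2d((100.0, 700.0), mu, S))
```

The Euclidean distances of the same two points from the mean would be 40 and 200, which shows why unscaled features can dominate a Euclidean comparison.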


K-Means Algorithm

The k-means algorithm assigns each point to the cluster whose center (also called centroid) is the nearest

1 Choose the number of clusters, k
2 Randomly generate k clusters and determine the cluster centers, or directly generate k random points as cluster centers
3 Assign each point to the nearest cluster center
4 Recompute the new cluster centers
5 Repeat the two previous steps until some convergence criterion is met (e.g., the assignment hasn't changed)
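The five steps above can be sketched in plain Python. This version picks k of the input points as initial centers, uses the Euclidean distance, and stops when the assignment no longer changes; the fixed seed, the iteration cap and the toy points are choices made only for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # step 2: k random points as centers
    assignment = None
    for _ in range(max_iter):
        # step 3: assign each point to the nearest center
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:     # step 5: assignments unchanged, stop
            break
        assignment = new_assignment
        # step 4: recompute each center as the mean of its points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                      # an empty cluster keeps its old center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, assignment


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, assignment = kmeans(points, k=2)
print(centers, assignment)
```

The result depends on the initial centers, so in practice the algorithm is often run several times with different seeds.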


K-Means Algorithm - An example

Consider k = 2: choose 2 points (centroids) and build 2 clusters

Figure Reproduced From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici


Compute the new centroids

Build the new clusters

Repeat the last 2 steps until the assignments no longer change

Fuzzy c-means clustering

  - In fuzzy clustering, each point has a degree of belonging to clusters, as in fuzzy logic, rather than belonging completely to just one cluster
  - For each point $x$ we have a coefficient $u_k(x)$ giving the degree of being in the $k$-th cluster, with $\sum_k u_k(x) = 1$ for every $x$
  - The centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster:

$$\mathrm{center}_k = \frac{\sum_x u_k(x)^m \, x}{\sum_x u_k(x)^m}$$

  - The degree of belonging is related to the inverse of the distance to the cluster center:

$$u_k(x) = \frac{1}{d(\mathrm{center}_k, x)}$$


then the coefficients are normalized and fuzzified with a real parameter $m > 1$ so that their sum is 1:

$$u_k(x) = \frac{1}{\sum_j \left( \frac{d(\mathrm{center}_k, x)}{d(\mathrm{center}_j, x)} \right)^{2/(m-1)}}$$

Application to Anomaly Detection

This clustering technique provides a means to say how anomalous a given event is
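The normalized membership formula can be evaluated directly. The sketch below computes the degrees of belonging of one point to a set of fixed centers, using the Euclidean distance and m = 2; the function name, the centers and the test point are made up, and the special case of a point that coincides with a center is handled by assigning it fully to that center.

```python
import math


def memberships(x, centers, m=2.0):
    """Degrees of belonging u_k(x) of point x to each cluster center."""
    dists = [math.dist(x, c) for c in centers]
    if 0.0 in dists:                       # x coincides with a center
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exponent for dj in dists) for dk in dists]


centers = [(0.0, 0.0), (10.0, 0.0)]
u = memberships((2.0, 0.0), centers)       # point closer to the first center
print(u, sum(u))
```

For the point above the distances are 2 and 8, so with m = 2 the memberships are 16/17 and 1/17, and they sum to 1 as required.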


Outliers

  - In statistics, an outlier is an observation that is numerically distant from the rest of the data
  - Detection is based on the full dimensional distances between the points as well as the densities of local neighborhoods
  - There exist at least two approaches
    - the anomaly detection model is trained using unlabeled data that consist of both normal and attack traffic
    - the model is trained using only normal data and a profile of normal activity is created


Outliers Detection - Method 1

  - The idea behind the first approach is that anomalous or attack data form a small percentage of the total data
  - Anomalies and attacks can be detected based on cluster sizes
    - large clusters correspond to normal data
    - the rest of the data points, which are outliers, correspond to attacks
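One simple way to turn cluster sizes into labels is to call a cluster "large" when it holds at least a given fraction of the data. The sketch below does this on a list of cluster identifiers; the function name, the 10% cutoff and the toy assignment are assumptions made for illustration, not part of the cited methods.

```python
from collections import Counter


def label_by_cluster_size(assignment, normal_fraction=0.1):
    """Label points in clusters smaller than normal_fraction of the data as anomalous."""
    sizes = Counter(assignment)
    cutoff = normal_fraction * len(assignment)
    return ["normal" if sizes[c] >= cutoff else "anomalous" for c in assignment]


assignment = [0] * 12 + [1] * 7 + [2]      # made-up cluster ids for 20 points
labels = label_by_cluster_size(assignment)
print(Counter(labels))
```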


References

L. Portnoy, E. Eskin, S.J. Stolfo, Intrusion detection with unlabeled data using clustering, ACM Workshop on Data Mining Applied to Security, 2001

S. Ramaswamy, R. Rastogi, K. Shim, Efficient algorithms for mining outliers from large data sets, ACM SIGMOD International Conference on Management of Data, 2000

K. Sequeira, M. Zaki, ADMIT: Anomaly-based data mining for intrusions, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002

V. Barnett, T. Lewis, Outliers in Statistical Data, Wiley, 1994


C.C. Aggarwal, P.S. Yu, Outlier detection for high dimensional data, ACM SIGMOD International Conference on Management of Data, 2001

M. Breunig, H.-P. Kriegel, R.T. Ng, J. Sander, LOF: identifying density-based local outliers, ACM SIGMOD International Conference on Management of Data, 2000

E.M. Knorr, R.T. Ng, Algorithms for mining distance-based outliers in large datasets, International Conference on Very Large Data Bases, 1998

P.C. Mahalanobis, On tests and measures of group divergence, Journal of the Asiatic Society of Bengal 26, 1930


Markovian Models

5 Markovian Models
  - First Order Homogeneous Markov Chains
  - First Order Non Homogeneous Markov Chains
  - High Order Homogeneous Markov Chains

Recall: Stochastic process

  - A stochastic process is a family of random variables $\{X(t), t \in T\}$, indexed by the parameter $t$ (time), which assume values in the set $S$
  - $S$ is the state space of the process
  - The description of the random process is given by the probability distribution function

$$F_X(\mathbf{x}; \mathbf{t}) = P\{X(t_1) \le x_1, X(t_2) \le x_2, \dots, X(t_n) \le x_n\}$$

    or by the probability density function

$$f_X(\mathbf{x}; \mathbf{t}) = \frac{\partial F_X(\mathbf{x}; \mathbf{t})}{\partial \mathbf{x}}$$

  - The complete stochastic description of the process requires the knowledge of one of the two functions for all the possible values of $n$ and for all the possible $(t_1, t_2, \dots, t_n)$


The state space can be
  - Continuous
  - Discrete: the process is then called a chain

The parameter t can be:
  - Continuous
  - Discrete


[Figure: (a) continuous-time chain, X(t) plotted against time t; (b) discrete-time chain, X_n plotted against the step index n]


Recall: Markov process

A Markov process, named after the Russian mathematician Andrey Markov, is a mathematical model for the random evolution of a memoryless system

Markov property

$$P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}, C_{t-2} = s_{i_2}, C_{t-3} = s_{i_3}, \dots) = P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1})$$

A Markov chain is a discrete-space Markov process and can be:
  - discrete-time Markov chain (DTMC)
  - continuous-time Markov chain (CTMC)


The Markov property implies that the distribution of the sojourn times in a state must be memoryless

Note that the only continuous distribution that satisfies the memoryless property

$$P\{W > t + \tau \mid W > t\} = P\{W > \tau\}$$

is the exponential distribution

$$f_W(\tau) = \alpha \cdot e^{-\alpha \tau}$$
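As a quick check, the exponential distribution does satisfy the memoryless property. Using the complementary distribution function $P\{W > t\} = e^{-\alpha t}$, which follows by integrating the density above, the conditional probability reduces as follows:

```latex
\begin{aligned}
P\{W > t + \tau \mid W > t\}
  &= \frac{P\{W > t + \tau,\ W > t\}}{P\{W > t\}}
   = \frac{P\{W > t + \tau\}}{P\{W > t\}} \\
  &= \frac{e^{-\alpha (t + \tau)}}{e^{-\alpha t}}
   = e^{-\alpha \tau}
   = P\{W > \tau\}
\end{aligned}
```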


Markovian Models

Recall: Markov process

Markov chain representation

State transition diagram

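[Figure: state transition diagram of the three-state chain (states 0, 1, 2) whose transition matrix P is given below]

Matrix representation

Let $P = \{p_{ij}\}$ be the transition matrix, where $p_{ij} = P\{C_t = j \mid C_{t-1} = i\} = P\{j \mid i\}$, and $\pi(n)$ be the state probability vector at step n, then

$$\pi(n) = \pi(0) \cdot P^n$$

$$P = \begin{pmatrix} 1 & 0 & 0 \\ 2/3 & 0 & 1/3 \\ 0 & 1/2 & 1/2 \end{pmatrix}$$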
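The relation π(n) = π(0) · P^n can be checked numerically on the three-state example. The following is a minimal sketch using exact fractions; the function names `step` and `state_probabilities` are illustrative choices, not part of the original slides.

```python
from fractions import Fraction as F

# Transition matrix of the three-state example: P[i][j] = P{j | i}.
P = [
    [F(1), F(0), F(0)],
    [F(2, 3), F(0), F(1, 3)],
    [F(0), F(1, 2), F(1, 2)],
]

def step(pi, P):
    """One step of the chain: returns the row vector pi * P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

def state_probabilities(pi0, P, n):
    """State probability vector after n steps, i.e. pi0 * P^n."""
    pi = list(pi0)
    for _ in range(n):
        pi = step(pi, P)
    return pi

# Start in state 2 with probability 1 and look two steps ahead.
pi2 = state_probabilities([0, 0, 1], P, 2)
print([str(p) for p in pi2])  # ['1/3', '1/4', '5/12']
```

State 0 is absorbing in this example (its row is 1, 0, 0), so the probability mass in state 0 can only grow with n.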


Markovian Models

State Transition Analysis

- The approach was first proposed by Denning and developed in the 1990s.
- Mainly used in two distinct environments:
  - HIDS: to model the sequence of system commands used by a user
  - NIDS: to model the sequence of some specific fields of the packet (e.g. the sequence of the flag values in a TCP connection)
- The most classical approach: Markov chains


Markovian Models

Markov Chains and TCP

Idea: Model TCP connections by means of Markov chains
- The IP addresses and the TCP port numbers are used to identify a connection
- The state space is defined by the possible values of the TCP flags
- The value of the flags is used to identify the chain transitions
- A value $S_p$ is associated with each packet according to the rule

$$S_p = \mathrm{syn} + 2 \cdot \mathrm{ack} + 4 \cdot \mathrm{psh} + 8 \cdot \mathrm{rst} + 16 \cdot \mathrm{urg} + 32 \cdot \mathrm{fin}$$
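The mapping from flags to the symbol S_p is a plain weighted sum of six bits. A minimal sketch follows; the function name `flag_symbol` and the example flag sequence are illustrative assumptions, not taken from the slides.

```python
def flag_symbol(syn=0, ack=0, psh=0, rst=0, urg=0, fin=0):
    """Map the six TCP flag bits (each 0 or 1) to the symbol S_p in 0..63."""
    return syn + 2 * ack + 4 * psh + 8 * rst + 16 * urg + 32 * fin

# A plausible one-directional flag sequence for a short connection
# (illustrative only): SYN, ACK, PSH+ACK, FIN+ACK.
sequence = [
    flag_symbol(syn=1),
    flag_symbol(ack=1),
    flag_symbol(psh=1, ack=1),
    flag_symbol(fin=1, ack=1),
]
print(sequence)  # [1, 2, 6, 34]
```

Since each flag contributes a distinct power of two, every flag combination maps to a unique integer, which makes S_p usable directly as a state label.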


Markovian Models First Order Homogeneous Markov Chains

Markov Chain and TCP - Training phase

Calculate the transition probabilities

$$a_{ij} = P[q_{t+1} = j \mid q_t = i] = \frac{P[q_t = i, q_{t+1} = j]}{P[q_t = i]}$$

[Figure: SSH Markov chain, server side, with the 3-way handshake, psh flag, and closing phases marked]
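In practice the ratio defining a_ij can be estimated from relative frequencies in the training traffic. The sketch below assumes that each training connection is already available as a list of flag symbols; the function name `estimate_transitions` and the toy data are hypothetical.

```python
from collections import Counter

def estimate_transitions(connections):
    """Estimate a_ij as (# transitions i -> j) / (# transitions leaving i)."""
    pair_counts = Counter()
    state_counts = Counter()
    for seq in connections:
        for i, j in zip(seq, seq[1:]):
            pair_counts[(i, j)] += 1
            state_counts[i] += 1
    return {(i, j): n / state_counts[i] for (i, j), n in pair_counts.items()}

# Toy training set of two connections (symbols are S_p values).
training = [[1, 2, 6, 6, 34], [1, 2, 6, 34]]
a = estimate_transitions(training)
print(a[(6, 34)])  # 2 of the 3 transitions leaving state 6 go to state 34
```

Transitions never seen in training are simply absent from the returned dictionary; how to score them at detection time is a separate design choice.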


Markovian Models First Order Homogeneous Markov Chains

Markov Chain and TCP - Training phase

Calculate the transition probabilities

$$a_{ij} = P[q_{t+1} = j \mid q_t = i] = \frac{P[q_t = i, q_{t+1} = j]}{P[q_t = i]}$$

[Figure: FTP Markov chain, client side, with the 3-way handshake, ack flag, and closing phases marked]


Markovian Models First Order Homogeneous Markov Chains

Markov Chain and TCP - Training phase

Calculate the transition probabilities

$$a_{ij} = P[q_{t+1} = j \mid q_t = i] = \frac{P[q_t = i, q_{t+1} = j]}{P[q_t = i]}$$

[Figure: SSH Markov chain, with the 3-way handshake and a SYN flood attack marked]


Markovian Models First Order Homogeneous Markov Chains

Markov Chain and TCP - Detection phase

Given the observation $(S_1, S_2, \dots, S_T)$

The system has to decide between two hypotheses

$$H_0: \text{normal behaviour} \qquad H_1: \text{anomaly} \tag{1}$$

A possible statistic is given by the logarithm of the Likelihood Function

$$\mathrm{LogLF}(t) = \sum_{t=R+1}^{T+R} \log\left(a_{S_t S_{t+1}}\right)$$

Or by its temporal "derivative"

$$D_W(t) = \left| \mathrm{LogLF}(t) - \frac{1}{W} \sum_{i=1}^{W} \mathrm{LogLF}(t - i) \right|$$
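Both statistics are straightforward to compute once the transition probabilities are known. The sketch below assumes that the probabilities are stored in a dictionary keyed by (i, j) pairs and that unseen transitions are given a small floor probability so that the logarithm stays finite; the floor value and the function names are assumptions, not part of the slides.

```python
import math

def log_likelihood(symbols, a, floor=1e-6):
    """Sum of log a[S_t, S_t+1] over one window of observed symbols."""
    return sum(
        math.log(max(a.get((i, j), 0.0), floor))
        for i, j in zip(symbols, symbols[1:])
    )

def temporal_derivative(loglf, W):
    """|LogLF(t) - mean of the previous W values| for every t >= W."""
    out = []
    for t in range(W, len(loglf)):
        mean_prev = sum(loglf[t - i] for i in range(1, W + 1)) / W
        out.append(abs(loglf[t] - mean_prev))
    return out

# Toy model and a short series of window scores.
a = {(1, 2): 0.5, (2, 2): 1.0}
print(log_likelihood([1, 2, 2], a))             # equals log(0.5)
print(temporal_derivative([1.0, 1.0, 4.0], 2))  # [3.0]
```

A window whose score drops sharply, or whose derivative exceeds a chosen threshold, would then be flagged as a candidate anomaly.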


Markovian Models First Order Homogeneous Markov Chains

Markov Chain and TCP - Detection phase


Markovian Models First Order Non Homogeneous Markov Chains

Non Homogeneous Markov Chain

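First order homogeneous Markov chain

$$\begin{aligned}
P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}, C_{t-2} = s_{i_2}, C_{t-3} = s_{i_3}, \dots)
  &= P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}) \\
  &= P(C_0 = s_{i_0} \mid C_{-1} = s_{i_1}) \\
  &= P(s_{i_0} \mid s_{i_1})
\end{aligned}$$

First order non-homogeneous Markov chain

$$\begin{aligned}
P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}, C_{t-2} = s_{i_2}, C_{t-3} = s_{i_3}, \dots)
  &= P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}) \\
  &= P_t(s_{i_0} \mid s_{i_1})
\end{aligned}$$

- We build a distinct Markov chain for each connection step (first 10 steps)
- The model should better characterize the setup and the release phases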
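One way to picture the non-homogeneous model is as a list of transition tables, one per connection step. The sketch below estimates P_t only for the first `steps` transitions of each connection; what happens beyond those steps is not specified in the slides, so it is left out here. The function name `train_per_step` is hypothetical.

```python
from collections import defaultdict

def train_per_step(connections, steps=10):
    """Return a list of per-step transition tables: models[t][i][j] = P_t(j | i)."""
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(steps)]
    for seq in connections:
        for t, (i, j) in enumerate(zip(seq, seq[1:])):
            if t >= steps:
                break  # only the first `steps` transitions are modelled
            counts[t][i][j] += 1
    models = []
    for step_counts in counts:
        models.append({
            i: {j: n / sum(row.values()) for j, n in row.items()}
            for i, row in step_counts.items()
        })
    return models

# Toy training set: three short connections.
models = train_per_step([[1, 2, 6, 34], [1, 2, 6, 6], [1, 6]])
print(models[0][1])  # transitions leaving state 1 at the first step
print(models[2][6])  # transitions leaving state 6 at the third step
```

The same state can therefore have different outgoing probabilities at different steps, which is what lets the model separate the setup phase from the rest of the connection.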


Markovian Models High Order Homogeneous Markov Chains

High order Markov Chain

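First order homogeneous Markov chain

$$\begin{aligned}
P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}, C_{t-2} = s_{i_2}, C_{t-3} = s_{i_3}, \dots)
  &= P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}) \\
  &= P(C_0 = s_{i_0} \mid C_{-1} = s_{i_1}) \\
  &= P(s_{i_0} \mid s_{i_1})
\end{aligned}$$

l-th order homogeneous Markov chain

$$\begin{aligned}
P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}, C_{t-2} = s_{i_2}, C_{t-3} = s_{i_3}, \dots)
  &= P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}, C_{t-2} = s_{i_2}, \dots, C_{t-l} = s_{i_l}) \\
  &= P(C_0 = s_{i_0} \mid C_{-1} = s_{i_1}, C_{-2} = s_{i_2}, \dots, C_{-l} = s_{i_l}) \\
  &= P(s_{i_0} \mid s_{i_1}, s_{i_2}, \dots, s_{i_l})
\end{aligned}$$

Some connection phases have dependences between packets of order greater than 1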


Markovian Models High Order Homogeneous Markov Chains

Mixture Transition Distribution

- We have an explosion of the number of chain parameters, which grows exponentially with the order: $K^l (K - 1)$
- Parsimonious representation of the transition probabilities
- Mixture Transition Distribution (MTD) model: $K (K - 1) + l - 1$ parameters

$$P(C_t = s_{i_0} \mid C_{t-1} = s_{i_1}, C_{t-2} = s_{i_2}, \dots, C_{t-l} = s_{i_l}) = \sum_{j=1}^{l} \lambda_j \, r(s_{i_0} \mid s_{i_j})$$

where the quantities $R = \{r(s_i \mid s_j);\ i, j = 1, 2, \dots, K\}$ and $\Lambda = \{\lambda_j;\ j = 1, 2, \dots, l\}$ satisfy the constraints

$$r(s_i \mid s_j) \ge 0;\ i, j = 1, 2, \dots, K \quad \text{and} \quad \sum_{i=1}^{K} r(s_i \mid s_j) = 1 \quad \forall j = 1, 2, \dots, K$$

$$\lambda_j \ge 0;\ j = 1, 2, \dots, l \quad \text{and} \quad \sum_{j=1}^{l} \lambda_j = 1$$
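The MTD transition probability and the two parameter counts can be illustrated with a few lines of code. In this sketch `r[prev][nxt]` plays the role of r(s_nxt | s_prev) and `history[j-1]` holds C_{t-j}; these layouts and the function names are assumptions made for the example.

```python
def mtd_probability(next_state, history, lam, r):
    """P(C_t = next_state | history) = sum_j lam[j] * r(next_state | C_{t-j})."""
    return sum(l_j * r[prev][next_state] for l_j, prev in zip(lam, history))

def full_parameter_count(K, l):
    """Independent parameters of a full l-th order chain over K states."""
    return K ** l * (K - 1)

def mtd_parameter_count(K, l):
    """Independent parameters of the MTD model: K(K-1) for R, l-1 for Lambda."""
    return K * (K - 1) + l - 1

# Two states, order 2: each row of r sums to 1 and the weights lam sum to 1.
r = [[0.9, 0.1], [0.2, 0.8]]
lam = [0.7, 0.3]
history = [0, 1]  # C_{t-1} = 0, C_{t-2} = 1
print(mtd_probability(0, history, lam, r), mtd_probability(1, history, lam, r))
print(full_parameter_count(64, 3), mtd_parameter_count(64, 3))  # 16515072 4034
```

With 64 possible flag symbols and order 3, the full model would need millions of parameters while the MTD model needs a few thousand, which is the point of the parsimonious representation.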


Markovian Models High Order Homogeneous Markov Chains

State Space Reduction

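- We only consider the states observed during the training phase
- We add a rare state to take into account all the other possible states
- We fix the following quantities:

$$r(\mathrm{rare} \mid s_i) = \varepsilon \quad \forall i = 1, 2, \dots, K, \quad \text{with } \varepsilon \text{ small (in our case } \varepsilon = 10^{-6}\text{)}$$

$$r(s_i \mid \mathrm{rare}) = \frac{1 - \varepsilon}{K - 1} \quad \forall i = 1, 2, \dots, K - 1 \tag{2}$$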
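These choices are consistent with the normalization constraint on R. Assuming that the rare state is counted as the K-th state, so that $r(\mathrm{rare} \mid \mathrm{rare}) = \varepsilon$ as well, the probabilities of leaving the rare state sum to one:

```latex
\sum_{i=1}^{K} r(s_i \mid \mathrm{rare})
  = \sum_{i=1}^{K-1} \frac{1 - \varepsilon}{K - 1} + r(\mathrm{rare} \mid \mathrm{rare})
  = (1 - \varepsilon) + \varepsilon
  = 1
```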


Markovian Models High Order Homogeneous Markov Chains

Parameters Estimation

- We need to estimate the parameters of the Markov chain (Maximum Likelihood Estimation - MLE)
- According to the MTD model, the log-likelihood of a sequence $(c_1, c_2, \dots, c_T)$ of length T is:

$$LL(c_1, c_2, \dots, c_T) = \sum_{i_0=1}^{K} \cdots \sum_{i_l=1}^{K} N(s_{i_0}, s_{i_1}, \dots, s_{i_l}) \cdot \log\left( \sum_{j=1}^{l} \lambda_j \, r(s_{i_0} \mid s_{i_j}) \right)$$

  where $N(s_{i_0}, s_{i_1}, \dots, s_{i_l})$ represents the number of times the transition $s_{i_l} \to s_{i_{l-1}} \to \cdots \to s_{i_0}$ is observed
- We have to maximize the right-hand side of the equation with respect to R and Λ, taking into account the given constraints
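The log-likelihood above only needs the counts N of the observed (l+1)-tuples, so it can be evaluated without enumerating all K^(l+1) combinations. The sketch below assumes states are integers indexing the table `r[prev][nxt]` (standing for r(s_nxt | s_prev)) and that every observed tuple has non-zero probability; the function name is hypothetical.

```python
import math
from collections import Counter

def mtd_log_likelihood(seq, lam, r):
    """Log-likelihood of seq under an MTD model of order l = len(lam)."""
    l = len(lam)
    # Each tuple is (s_i0, s_i1, ..., s_il) = (c_t, c_{t-1}, ..., c_{t-l}).
    counts = Counter(
        tuple(seq[t - j] for j in range(l + 1)) for t in range(l, len(seq))
    )
    ll = 0.0
    for states, n in counts.items():
        p = sum(lam[j] * r[states[j + 1]][states[0]] for j in range(l))
        ll += n * math.log(p)
    return ll

# With a uniform r every transition has probability 0.5, whatever lam is.
ll = mtd_log_likelihood([0, 1, 0, 1], [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
print(ll)  # two observed tuples, each contributing log(0.5)
```

This function is the objective that the maximization over R and Λ would evaluate; the optimization itself is not sketched here.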


Markovian Models High Order Homogeneous Markov Chains

Parameters Estimation

Estimation Steps
- We apply an alternate maximization with respect to R and to Λ
  - In the first step (estimation of Λ) we use sequential quadratic programming
  - The second step (estimation of R) is a linear inverse problem with positivity constraints (LININPOS) that we solve by applying the Expectation Maximization (EM) algorithm

Global Maximum

This process leads to a global maximum, since LL is concave in R and Λ.


Markovian Models High Order Homogeneous Markov Chains

Markov Chains - Detection Phase

Choose between a single hypothesis H0 (estimated stochastic model), and the composite hypothesis H1 (all the other possibilities)

$$H_0: \{(c_1, c_2, \dots, c_T) \sim \text{computed model } MC_0\}$$

$$H_1: \{\text{anomaly}\}$$

- No optimal result is presented in the literature
- The best solution is represented by the use of the Generalized Likelihood Ratio (GLR) test:

$$X = \left( \frac{\max_{v \ne u} L(c_1, c_2, \dots, c_T \mid \Lambda_v, R_v)}{L(c_1, c_2, \dots, c_T \mid \Lambda_u, R_u)} \right)^{\frac{1}{T}} \underset{H_1}{\overset{H_0}{\lessgtr}} \xi$$


Markovian Models High Order Homogeneous Markov Chains

Markov Chains - Detection Phase

Equivalent to deciding on the basis of the Kullback-Leibler divergence between the model associated with H0 ($MC_0$) and the one computed for the observed sequence ($MC_s$)

The Kullback-Leibler divergence, for first order Markov chains, is defined as:

$$KL(MC_0, MC_s) = \sum_i \sum_j \pi_0(s_i) \, P_0(s_j \mid s_i) \, \log \frac{P_0(s_j \mid s_i)}{P_s(s_j \mid s_i)}$$

where $\pi_0(s_i)$ is the stationary distribution of $MC_0$ and $P_k(s_j \mid s_i)$, with $k \in \{0, s\}$, is the (single step) transition probability from state $C_{t-1} = s_i$ to state $C_t = s_j$

Extension to Markovian models of order l

The state of the chain $C_t$ has to be considered as a point in a finite l-dimensional lattice:

$$\mathbf{C}_t = (C_t, C_{t-1}, \dots, C_{t-l+1})$$
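For first order chains the divergence is a double sum over the transition tables. The sketch below assumes both chains are given as dense row-stochastic lists of lists, that the stationary distribution of MC0 is supplied by the caller, and that Ps is strictly positive wherever P0 is (for instance after smoothing); the function name `markov_kl` is hypothetical.

```python
import math

def markov_kl(pi0, P0, Ps):
    """KL(MC0, MCs) = sum_i sum_j pi0[i] * P0[i][j] * log(P0[i][j] / Ps[i][j])."""
    kl = 0.0
    for i, row in enumerate(P0):
        for j, p0 in enumerate(row):
            if p0 > 0.0:  # terms with P0 = 0 contribute nothing
                kl += pi0[i] * p0 * math.log(p0 / Ps[i][j])
    return kl

# Reference model and a model estimated from an observed sequence (toy values).
pi0 = [0.5, 0.5]
P0 = [[0.5, 0.5], [0.5, 0.5]]
Ps = [[0.25, 0.75], [0.5, 0.5]]
print(markov_kl(pi0, P0, P0))  # 0.0 for identical models
print(markov_kl(pi0, P0, Ps))  # equals 0.25 * log(4/3)
```

The divergence is zero when the observed model matches the reference and grows as the observed transitions depart from it, so it can be compared against a threshold.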


Markovian Models High Order Homogeneous Markov Chains

Non-Homogeneous Markov Chain

[Figure: ROC curves (Detection Rate versus False Alarm Rate, both from 0 to 1) for three methods: Non Stationary ECDF, Non Homogeneous MC, Homogeneous MC]


Markovian Models High Order Homogeneous Markov Chains

High Order Markov Chain


Markovian Models High Order Homogeneous Markov Chains

References

- N. Ye, Y. Zhang, C. M. Borror, Robustness of the Markov-chain model for cyber-attack detection, IEEE Transactions on Reliability 53, 2004
- D.-Y. Yeung, Y. Ding, Host-based intrusion detection using dynamic and static behavioral models, Pattern Recognition 36, 2003
- W.-H. Ju and Y. Vardi, A hybrid high-order Markov chain model for computer intrusion detection, Tech. Rep. 92, NISS, 1999
- M. Schonlau, W. DuMouchel, W.-H. Ju, A. Karr, M. Theus, and Y. Vardi, Computer intrusion: Detecting masquerades, Tech. Rep. 95, NISS, 1999
- N. Ye, T. Ehiabor, and Y. Zhang, First-order versus high-order stochastic models for computer intrusion detection, Quality and Reliability Engineering International, vol. 18, 2002


Markovian Models High Order Homogeneous Markov Chains

References

- A. Raftery, A model for high-order Markov chains, Journal of the Royal Statistical Society, series B, vol. 47, 1985
- A. Raftery and S. Tavare, Estimation and modelling repeated patterns in high-order Markov chains with the mixture transition distribution (MTD) model, Journal of the Royal Statistical Society, series C - Applied Statistics, vol. 43, 1994
- Y. Vardi and D. Lee, From image deblurring to optimal investments: Maximum likelihood solutions for positive linear inverse problem, Journal of the Royal Statistical Society, series B, vol. 55, 1993
- C. Callegari, S. Vaton, and M. Pagano, A new statistical approach to network anomaly detection, Performance Evaluation of Computer and Telecommunication Systems (SPECTS), 2008


Entropy

Outline

1 Intrusion Detection Expert System

2 Statistical Anomaly Detection

3 Snort & Spade

4 Clustering

5 Markovian Models

6 Entropy-based Methods
  - Entropy
  - Compression Algorithms

7 Wavelet Analysis


Entropy Entropy

Theoretical Background

Entropy

The entropy H is a measure of the amount of uncertainty.

For an alphabet composed of n distinct symbols, where $p_i$ is the probability of the i-th symbol,

$$H = -\sum_{i=1}^{n} p_i \cdot \log_2 p_i \quad \text{bit/symbol}$$

Chaitin-Kolmogorov entropy: the entropy of a string is the length (in bits) of the smallest program which produces the string as output

Starting point

- The entropy represents a lower bound to the compression rate that we can obtain
- The presence of anomalies should affect the entropy of the related traffic sequence
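The entropy formula can be applied to a traffic sequence by replacing the probabilities p_i with observed relative frequencies. This is a minimal sketch under that assumption; the function name `empirical_entropy` is illustrative.

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """Entropy in bit/symbol, with p_i estimated as relative frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(empirical_entropy([1, 2, 1, 2]))   # 1.0: two equally likely symbols
print(empirical_entropy([1, 2, 6, 34]))  # 2.0: four equally likely symbols
```

A sequence made of a single repeated symbol has zero entropy, while a sequence spread evenly over many symbols has high entropy, which is why a change in traffic composition tends to show up in this value.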


Entropy Compression Algorithms

Compression Algorithms

- Dictionary based algorithms: based on the use of a dictionary, which can be static or dynamic; they code each symbol or group of symbols with an element of the dictionary
  - Lempel-Ziv-Welch (LZW)
- Model based algorithms: each symbol or group of symbols is encoded with a variable length code, according to some probability distribution
  - Huffman Coding (HC)
  - Dynamic Markov Compression (DMC)


Entropy Compression Algorithms

Lempel-Ziv-Welch

- Created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm, published by Lempel and Ziv in 1978
- Universal adaptive [1] lossless data compression algorithm
- Builds a translation table (also called dictionary) from the text being compressed
- The string translation table maps the message strings to fixed-length codes

[1] The coding scheme used for the k-th character of a message is based on the characteristics of the preceding k − 1 characters in the message
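The dictionary-building behaviour described above can be sketched in a few lines. This version works on sequences of integer symbols and starts from a dictionary holding every single symbol of an assumed alphabet of 64 values (chosen here to match the flag symbols); it is an illustrative textbook-style encoder, not the modified implementation used in the slides, and it assumes every input symbol lies in the alphabet.

```python
def lzw_compress(symbols, alphabet_size=64):
    """Encode a sequence of integers in 0..alphabet_size-1 as LZW codes."""
    # Initial dictionary: one entry per single symbol.
    dictionary = {(s,): s for s in range(alphabet_size)}
    next_code = alphabet_size
    current = ()
    codes = []
    for s in symbols:
        candidate = current + (s,)
        if candidate in dictionary:
            current = candidate  # keep extending the current phrase
        else:
            codes.append(dictionary[current])   # emit the longest known phrase
            dictionary[candidate] = next_code   # learn the new phrase
            next_code += 1
            current = (s,)
    if current:
        codes.append(dictionary[current])
    return codes

print(lzw_compress([2, 6, 2, 6, 2, 6, 2]))  # [2, 6, 64, 66]
```

Seven input symbols are encoded with four codes because the repeated phrases (2, 6) and (2, 6, 2) are learned on the fly and then referenced by their dictionary index.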


Entropy Compression Algorithms

Huffman Coding

- Developed by Huffman (1952)
- Based on the use of a variable-length code table for encoding each source symbol
- The variable-length code table is derived from a binary tree built from the estimated probability of occurrence for each possible value of the source symbols
- Prefix-free code [2] that expresses the most common characters using shorter strings of bits than are used for less common source symbols

[2] The bit string representing some particular symbol is never a prefix of the bit string representing any other symbol
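The construction of the code table from symbol frequencies can be sketched with a priority queue. The sketch below merges the two least frequent subtrees until one tree remains and keeps the partial code words in dictionaries instead of building explicit tree nodes; the function name `huffman_code` and the toy frequencies are assumptions for illustration.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bit string} for a Huffman code built from frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie-breaker, {symbol: partial code word}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

# Toy traffic: symbol 2 is the most common, symbols 1 and 34 the rarest.
code = huffman_code([2] * 8 + [6] * 4 + [1] * 2 + [34] * 2)
print({s: len(bits) for s, bits in code.items()})  # {2: 1, 6: 2, 1: 3, 34: 3}
```

The most frequent symbol receives the shortest code word, and no code word is a prefix of another, so the encoded bit stream can be decoded unambiguously.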


Entropy Compression Algorithms

Dynamic Markov Compression

- Developed by Gordon Cormack and Nigel Horspool (1987)
- Adaptive lossless data compression algorithm
- Based on modelling the binary source to be encoded by means of a Markov chain, which describes the transition probabilities between the symbol "0" and the symbol "1"
- The built model is used to predict the next bit of a message. The predicted bit is then coded using arithmetic coding


Entropy Compression Algorithms

System Design

Input
- The system input is given by raw traffic traces in libpcap format
- The 5-tuple is used to identify a connection, while the value of the TCP flags is used to build the "profile"
- A value $s_i$ is associated with each packet:

$$s_i = \mathrm{SYN} + 2 \cdot \mathrm{ACK} + 4 \cdot \mathrm{PSH} + 8 \cdot \mathrm{RST} + 16 \cdot \mathrm{URG} + 32 \cdot \mathrm{FIN}$$

  thus each "mono-directional" connection is represented by a sequence of symbols $s_i$, which are integers in $\{0, 1, \dots, 63\}$


Entropy Compression Algorithms

System Design

Training Phase

- Choose one of the three previously described algorithms (Huffman, DMC, or LZW)
- The compression algorithms have been modified so that the "learning phase" is stopped after the training phase:
  - Huffman case: the occurrence frequency of each symbol is estimated only on the training dataset
  - DMC case: the estimation of the Markov chain is only updated during the training phase
  - LZW case: the construction of the dictionary is stopped after the training phase
- Detection is performed with a compression scheme that is "optimal" for the "normal" traffic used for building the considered "profile" and suboptimal for "anomalous" traffic


Entropy Compression Algorithms

System Design

Detection Phase
- Append each distinct "observed" connection b to the training sequence A
- Compute the "compression rate per symbol":

$$X = \frac{\dim([A|b]^*) - \dim([A]^*)}{\mathrm{Length}(b)}$$

  where $[X]^*$ represents the compressed version of X
- Choose between a single hypothesis H0 (normal traffic), and the composite hypothesis H1 (anomaly)

$$X \underset{H_1}{\overset{H_0}{\lessgtr}} \xi$$
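The statistic X can be illustrated with any off-the-shelf compressor. The sketch below uses Python's `zlib` (DEFLATE) purely as a stand-in: it is not one of the three algorithms discussed in the slides and its learning is not frozen after training, so the numbers only illustrate the idea. Sizes are measured in bytes, and the function name and toy sequences are hypothetical.

```python
import zlib

def compression_rate_per_symbol(training, observed):
    """X = (size of compressed [A|b] - size of compressed [A]) / Length(b)."""
    base = len(zlib.compress(bytes(training), 9))
    extended = len(zlib.compress(bytes(training) + bytes(observed), 9))
    return (extended - base) / len(observed)

# Training sequence A: many repetitions of a typical flag pattern.
A = [1, 2, 6, 6, 34] * 400
# Connection following the same pattern versus one made of unrelated symbols.
b_normal = [1, 2, 6, 6, 34] * 6
b_odd = [(7 * k + 3) % 64 for k in range(30)]

print(compression_rate_per_symbol(A, b_normal))
print(compression_rate_per_symbol(A, b_odd))
```

A connection that resembles the training traffic adds very little to the compressed size, while an unfamiliar one costs noticeably more per symbol; comparing X with a threshold ξ then gives the decision.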


Entropy Compression Algorithms

Results - System Comparison

[Figure: ROC curves (Detection Rate versus False Alarm Rate, both from 0 to 1) comparing the Huffman, DMC, and LZW based systems]


Entropy Compression Algorithms

Results - On-line System

[Figure: ROC curves (Detection Rate versus False Alarm Rate, both from 0 to 1) for the on-line system, comparing Huffman, DMC, and LZW]


Entropy Compression Algorithms

References

- T. Cover and J. Thomas, Elements of information theory, Wiley-Interscience, 2nd ed., 2006
- C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal, vol. 27, 1948
- D. Huffman, A method for the construction of minimum-redundancy codes, Proceedings of the Institute of Radio Engineers, vol. 40, 1952
- G. Cormack and N. Horspool, Data compression using dynamic Markov modelling, vol. 30, 1987


Entropy Compression Algorithms

References

- J. Ziv and A. Lempel, Compression of individual sequences via variable-rate coding, IEEE Transactions on Information Theory, vol. 24, 1978
- T. Welch, A technique for high-performance data compression, IEEE Computer Magazine, vol. 17, no. 6, 1984
- C. Callegari, S. Giordano, and M. Pagano, On the Use of Compression Algorithms for Network Anomaly Detection, IEEE International Conference on Communications (ICC 2009)


Wavelet

Wavelet Analysis

- The wavelets are scaled and translated copies (known as "daughter wavelets") of a finite-length or fast-decaying oscillating waveform (known as the "mother wavelet")
- Wavelet transforms have advantages over traditional Fourier transforms for representing functions that have discontinuities and sharp peaks
- The main difference, with respect to the Fourier transform, is that wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency


Wavelet

Wavelet Decomposition

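Mother wavelet $\psi(t)$, satisfying the admissibility condition

$$\int \frac{|\Psi(\omega)|^2}{|\omega|} \, d\omega < \infty$$

Wavelet basis

$$\{\psi_{m,n}(t)\}_{m,n \in \mathbb{Z}} = \left\{ a_0^{-m/2} \, \psi\left(a_0^{-m} t - n b_0\right) \right\}_{m,n \in \mathbb{Z}}$$

Representation of any finite-energy signal $x(t) \in L^2(\mathbb{R})$ by means of its inner products $\{x_{m,n}\}_{m,n \in \mathbb{Z}}$ with the wavelets $\{\psi_{m,n}(t)\}_{m,n \in \mathbb{Z}}$:

$$x_{m,n} = \int x(t) \cdot \psi_{m,n}(t) \, dt = \int x(t) \cdot a_0^{-m/2} \, \psi\left(a_0^{-m} t - n b_0\right) dt \tag{3}$$

Orthonormal dyadic wavelet basis
- $a_0 = 2$ and $b_0 = 1$
- Stringent constraints on the mother wavelet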


Wavelet

Mother Wavelet

Morlet


Wavelet

Mother Wavelet

Meyer


Wavelet

Mother Wavelet

Mexican Hat


Wavelet

Filter bank implementation of the Wavelet Transform

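Two scale difference equation

$$\psi(t) = \sqrt{2} \sum_n g_n \, \phi(2t - n) \qquad \phi(t) = \sqrt{2} \sum_n h_n \, \phi(2t - n)$$

where

$$g_n = (-1)^{n-1} h_{-n-1}$$

Let $\mathbf{x} = (x_1, x_2, \dots)$ denote the approximation of a finite-energy signal $x(t)$

[Figure: three-level filter bank (Level 1, Level 2, Level 3) applied to x, with the filters g and h each followed by downsampling by 2 at every level]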
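A filter bank of this kind can be sketched with the simplest orthonormal choice, the Haar wavelet, which is an assumption made here for brevity and is not named in the slides. The low-pass filter h averages pairs of samples, the high-pass filter g differences them, and each output is downsampled by 2; the approximation branch feeds the next level. The sign convention of g and the function names are illustrative, and the input length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

def haar_step(x):
    """One filter-bank level: returns (approximation, detail), each half length."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    detail = [s * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return approx, detail

def haar_filter_bank(x, levels=3):
    """Iterate haar_step on the approximation branch for the given levels."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

# A step signal: the change only shows up in the coarsest detail band.
approx, details = haar_filter_bank([1, 1, 1, 1, 5, 5, 5, 5], levels=3)
print(details[0], details[1], details[2])
```

Because the transform is orthonormal, the sum of squared coefficients equals the energy of the input, so no information is lost by the decomposition.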


Wavelet

Wavelet and Edge Detection

- An edge in an image is a contour across which the brightness of the image changes abruptly
- In image processing, an edge is often interpreted as one class of singularities
- In a function, singularities can be characterized easily as discontinuities where the gradient approaches infinity
- However, image data is discrete, so edges in an image often are defined as the local maxima of the gradient
- The wavelet transform has been found to be a remarkable tool to analyze singularities, including edges, and to detect them effectively


Wavelet

Wavelet and Edge Detection


Wavelet

Wavelet and Anomaly Detection

- The concept of edge can be easily extended to that of anomaly in network traffic
- Classical approaches look at the time series of specific kinds of packets inside aggregate traffic
- They detect irregular traffic patterns in the traffic trace

Wavelet analysis is applied to evaluate the traffic signal filtered only at certain scales, and a thresholding technique is used to detect changes


Wavelet

References

- P. Barford, J. Kline, D. Plonka, A. Ron, A signal analysis of network traffic anomalies, ACM SIGCOMM Internet Measurement Workshop, 2002
- P. Huang, A. Feldmann, W. Willinger, A non-intrusive, wavelet-based approach to detecting network performance problems, ACM SIGCOMM Internet Measurement Workshop, 2001
- L. Li, G. Lee, DDoS attack detection and wavelets, IEEE ICCCN, 2003
- A. Dainotti, A. Pescapé, and G. Ventre, Wavelet-based Detection of DoS Attacks, IEEE Globecom, 2006
- C. Callegari, S. Giordano, and M. Pagano, Application of Wavelet Packet Transform to Network Anomaly Detection, International Conference on Next Generation Teletraffic and Wired/Wireless Advanced Networking (NEW2AN), 2008


Wavelet

Thank You for your attention
