COD (Cluster Onset Detection) : Online Temporal Clustering for Outbreak Detection

Tomas Singliar (U. Pitt.),

Denver H. Dash (Intel Research, U. Pitt.)

AAAI’07 (American Association for AI National Conference)

2009/8/26 Speaker: Li-Ming Chen

Reference

• Denver H. Dash et al., "When Gossip is Good: Distributed Probabilistic Inference for Detection of Slow Network Intrusions", AAAI’06

• Tomas Singliar and Denver H. Dash, "COD: Online Temporal Clustering for Outbreak Detection", AAAI’07


Challenge: Slowly Propagating Attacks

Worm attacks – 2 opposite extremes:
1. Much faster, to allow rapid spread
2. Much slower, to prevent detection

• Most existing detection techniques rely on the fact that worms reproduce quickly
• Slow propagation attacks are:
  - Difficult to detect: they hide under the veil of normal network traffic
  - Still dangerous: they can propagate exponentially


Other Challenges

• Global infection: IDSes (individual entities) can only see a partial picture of the larger network-wide behavior of the worm
  - This requires collaborative detection (AAAI’06)
• Homogeneity assumption: detection techniques treat the population as a monolithic entity
  - However, hosts or detectors (collaborators) are not always homogeneous (AAAI’07)


Architecture Model

[Figure: several LDs reporting to a GD]

• LD: “weak” host-based Local Detector
• GD: Global Detector
  - Aggregates messages from LDs
  - Performs probabilistic inference to determine whether an infection is present

Concept of collaborative detection:
• LDs (designed to be weak but general classifiers) may raise false alarms at a relatively high frequency
• The GD can combine the LDs’ weak information to infer the existence of an attack
• Where to place the GDs in the network? Centralized or distributed placement


Paper 1




Architecture


About the “Weak” LDs

• A binary classifier: normal or abnormal
• Detects by heuristic: counts the number of new outgoing connections to unique destination addresses and ports
• Observation (see figure): for slow worm detection, set the threshold to 4 (CPI)
• The space of LDs: inward-looking or outward-looking

[Figure labels: LD threshold, predefined as 4 (CPI); propagation rate of previous worms (Blaster, Slapper, CR2, Slammer, Witty); within 37 hosts]

Observing 37 hosts for 5 weeks gives (37*5*7*24*60*60/50) = 2,237,760 observations, from which the distribution is then computed…
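The LD heuristic and the observation count above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `ld_fires` and the `(timestamp, dst_addr, dst_port)` record format are assumptions, and "new outgoing connections" is approximated here as distinct destination address/port pairs seen within one 50-second window.

```python
WINDOW_SEC = 50   # interval length used on the slides
THRESHOLD = 4     # LD threshold used on the slides


def ld_fires(connections, window_start, window_sec=WINDOW_SEC, threshold=THRESHOLD):
    """Return True if the (hypothetical) local detector fires for one window.

    connections: iterable of (timestamp, dst_addr, dst_port) tuples for
    outgoing connections of a single host (assumed record format).
    """
    window_end = window_start + window_sec
    # Count distinct destinations contacted inside the window.
    unique_dsts = {
        (addr, port)
        for ts, addr, port in connections
        if window_start <= ts < window_end
    }
    return len(unique_dsts) > threshold


# Number of 50-second observations for 37 hosts over 5 weeks.
observations = 37 * 5 * 7 * 24 * 60 * 60 // WINDOW_SEC
```

Repeated connections to the same destination do not raise the count, which keeps the detector "weak but general": it reacts to fan-out rather than to volume.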


4 possible GD models

• Traditional collaborative counting schemes:
  - PosCount: tests whether Σ(positive counts) > threshold
  - CuSum: detects changes in the trend of a statistic
• DBN-based schemes:
  - CP-DBN: a simplified causal model that models an attack as occurring uniformly across the population or not at all
  - E-DBN: models the dynamics of a system that is being swept by an epidemic outbreak
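The two counting schemes can be illustrated with a short sketch. The slides give no formulas for them, so the code below uses the textbook one-sided CUSUM recursion as a stand-in; `target_mean` and `slack` are assumed parameters, not values from the papers.

```python
def pos_count_alarm(ld_bits, threshold):
    """PosCount: alarm when the number of positive LDs exceeds a threshold."""
    return sum(ld_bits) > threshold


def cusum_scores(values, target_mean, slack=0.0):
    """One-sided CUSUM: accumulate upward deviations from target_mean.

    The score resets to zero whenever the running sum would go negative,
    so only a sustained upward shift makes it grow.
    """
    score = 0.0
    scores = []
    for v in values:
        score = max(0.0, score + (v - target_mean - slack))
        scores.append(score)
    return scores
```

For example, feeding the per-interval positive counts into `cusum_scores` yields a statistic that stays near zero under normal traffic and ramps up once the counts shift upward, whereas `pos_count_alarm` only looks at a single time step.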


How GDs work?

• Input of a GD: $L_t$, a binary subset of LD observations at time t
• Output of a GD: $S_t$, some measure of how likely a global anomaly is to be occurring at time t
• The system of GDs makes up an ensemble
  - Many ensemble techniques could be used
  - This paper only uses the max function to determine whether a global alarm should be raised
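A max-based ensemble decision could look like the sketch below. The comparison against a single `alarm_threshold` and the dictionary of per-GD scores are assumptions made for illustration; the slides only state that the max function is used.

```python
def ensemble_alarm(gd_scores, alarm_threshold):
    """Combine per-GD anomaly scores S_t with the max function.

    gd_scores: dict mapping a GD name to its score at time t (assumed layout).
    Returns (alarm_raised, name_of_the_GD_with_the_highest_score).
    """
    top_gd = max(gd_scores, key=gd_scores.get)
    return gd_scores[top_gd] > alarm_threshold, top_gd
```

With the max function a single confident GD is enough to raise the global alarm, which favors early detection at the cost of inheriting the false alarms of the noisiest GD.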


How GDs work? (cont’d)

• Traditional collaborative counting schemes:
  - PosCount: tests whether Σ(positive counts) > threshold
  - CuSum: detects changes in the trend of a statistic
• DBN-based schemes:
  - CP-DBN: a simplified causal model that models an attack as occurring uniformly across the population or not at all
  - E-DBN: models the dynamics of a system that is being swept by an epidemic outbreak


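CP-DBN

• $A_i \in \{T, F\}$: whether an attack has taken place at time $i$
• $O_i^l \in \{\text{on}, \text{off}\}$: whether LD $l$ is on or off at time $i$

[Figure: CP-DBN over observation time T; hidden states $A_i$ and observable states $O_i^l$, for LD 0 through a total of M LDs, parameterized by the TP rate and FP rate]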
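To give a feel for how TP and FP rates turn LD observations into a belief about an attack, here is a single-time-slice simplification. It is not the CP-DBN itself: it ignores the temporal structure, and it assumes that all M LDs share the same TP and FP rates, are conditionally independent given the attack state, and that `prior` is a hypothetical prior probability of attack.

```python
import math


def attack_posterior(ld_bits, tp_rate, fp_rate, prior=0.01):
    """P(attack | LD observations) for one time slice (simplified sketch).

    ld_bits: iterable of 0/1 values, one per LD (1 = LD is on).
    """
    log_attack = math.log(prior)
    log_normal = math.log(1.0 - prior)
    for on in ld_bits:
        # Under attack an LD is on with the TP rate, otherwise with the FP rate.
        log_attack += math.log(tp_rate if on else 1.0 - tp_rate)
        log_normal += math.log(fp_rate if on else 1.0 - fp_rate)
    # Normalize in log space to avoid underflow with many LDs.
    top = max(log_attack, log_normal)
    p_attack = math.exp(log_attack - top)
    p_normal = math.exp(log_normal - top)
    return p_attack / (p_attack + p_normal)
```

Even with weak detectors, several LDs firing together push the posterior far above the prior, which is the intuition behind combining weak local evidence at the GD.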


E-DBN

To model the exponentially growing trend:
• T denotes the observation time
• $A_t \in \{0, 1\}$: the anomaly state at time t
• $N_t \in \{0, \dots, N\}$: number of infected hosts
• S is the spreading rate
• $O_t \in \{0, \dots, N\}$: number of observed LDs that fired

[Figure: E-DBN structure, showing the state transitions between unobserved state variables (hidden states) and the observable states]


E-DBN (cont’d)

Assuming a worm attack, the growth in the number of infected hosts, $\Delta N_{t+1}$, is modeled by a binomial.

The likelihood of $o_t$ detectors firing when $n_t$ hosts are infected is also modeled by a binomial.

[Equation annotations: chance of a hit; susceptible]
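The binomial dynamics can be explored with a small simulation. The exact parameterization on the slide is not preserved, so everything below is an assumption made for illustration: `q_hit` is a hypothetical per-step chance that one infected host hits a given susceptible host, and the number of firing detectors is drawn as true positives among infected hosts plus false positives among the rest.

```python
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_outbreak(n_hosts, steps, q_hit, tp_rate, fp_rate, seed=0):
    """Return a list of (infected, fired) pairs, one per time step."""
    rng = random.Random(seed)
    infected = 1  # the outbreak starts from a single host
    history = []
    for _ in range(steps):
        susceptible = n_hosts - infected
        # Observation: detectors firing on infected and on clean hosts.
        fired = binomial(rng, infected, tp_rate) + binomial(rng, susceptible, fp_rate)
        history.append((infected, fired))
        # Growth: each susceptible host is hit by at least one infected host.
        p_hit = 1.0 - (1.0 - q_hit) ** infected
        infected += binomial(rng, susceptible, p_hit)
    return history
```

Running it with a small `q_hit` shows the slow start followed by rapid growth that makes slow worms both hard to notice early and still dangerous.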


How do DBN-based GDs work?

• Given a DBN model and some observations from t-T to t
• Infer the anomaly $A_m$ at the most likely time m
• Then do ensemble decision making (using the max function)


Performance Evaluation

Parameters:
• Spread rate S = 1 connection per 20 sec.
• Address density = 1/1000 (ratio of vulnerable hosts)
• LD threshold = 4 connections per 50 sec.
• LDs communicate with the GD every 10 sec.

[Figure: detection performance plot, annotated with "better" and "Desired FP rate"]

PosCount only raises a detection after the entire network is infected


Paper 2




New Approach: COD (Cluster Onset Detection)

• What to cluster?
  - Partition the population (e.g., hosts) into subgroups; COD then tries to detect susceptible subgroups
• Why clustering?
  - Traditional outbreak detection methods treat the population as a monolithic entity
  - Real populations are heterogeneous: different subpopulations are susceptible to different degrees
  - Clustering can boost the signal-to-noise ratio for detection


COD Model – detection architecture

• “Weak” host-based LDs
  - Periodically send their status to a GD
  - Use the same feature and rule: fire whenever the number of outgoing connections exceeds 4 in a 50-second interval
• Centralized GD
  - Collects messages and determines whether the positive local detections corroborate each other
  - Periodically outputs a signal that represents its belief that an infection is present


COD Model – data

• Dataset X
  - Row: $X_i$ corresponds to a single LD i
  - Column: $X_{*j}$ corresponds to the value of a feature function in a discrete time interval j
• Use temporal stratified sampling
  - Each time interval has a fixed position, e.g. 12am-1am, 1am-2am, etc.
  - This accounts for obvious diurnal behavior in the system

[Figure: matrix X with rows indexed by LD i and columns by time j; each entry is the sum of alarms (which might be FPs)]
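One way to build such a matrix is sketched below. This is an interpretation rather than the paper's procedure: it assumes alarms arrive as `(host, timestamp_in_seconds)` pairs and that alarms from different days falling into the same hour-of-day slot are summed into the same column.

```python
def build_feature_matrix(alarms, hosts, interval_sec=3600, day_sec=86400):
    """Build X where X[i][j] is the alarm count of host i in day-slot j.

    alarms: iterable of (host, timestamp) pairs (assumed record format).
    hosts: ordered list of host identifiers, one row per host.
    """
    n_slots = day_sec // interval_sec
    row_of = {host: i for i, host in enumerate(hosts)}
    X = [[0] * n_slots for _ in hosts]
    for host, ts in alarms:
        # Fixed position of the interval within the day (e.g. 12am-1am is slot 0).
        slot = int(ts % day_sec) // interval_sec
        X[row_of[host]][slot] += 1
    return X
```

Keeping each interval at a fixed position in the day means that a host's normal 9am burst is compared with other 9am activity rather than with its quiet night hours.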


COD Model – clustering

• Naïve Bayes clustering model
• NB features are the positive local detection counts $X_{ij}$ arriving from a machine i during a time interval j
  - In a time interval, an LD may fire several times, so the feature function is F() = sum(alarms) for each machine
• Assuming that different classes generate their detections randomly at different rates, and that the counts can take a fairly large range of values, $X_{ij}$ can be assumed to be Poisson distributed


COD Model – clustering (cont’d)

Some details:
• How to determine the number m of clusters?
  - By using a greedy heuristic to find the optimal value
  - Nothing is mentioned about λkjx
• At the end of each interval, the feature values are updated and the model is re-learned
• How to cluster?
  - The posterior on the cluster variable M defines the assignment of local detectors into clusters
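The posterior-based assignment can be sketched with a standard Poisson mixture fitted by EM. The slide's own equation is not preserved, so this uses the usual Bayes-rule form, P(M = k | X_i) proportional to the cluster weight times the product over j of Poisson(X_ij; rate_kj); the function names, the fixed iteration count, and the `floor` on rates are assumptions.

```python
import math


def log_poisson(x, lam):
    """Log of the Poisson probability mass at count x with rate lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def posteriors(X, weights, rates):
    """Return P(M = k | X_i) for every host i and cluster k."""
    result = []
    for row in X:
        logp = [
            math.log(w) + sum(log_poisson(x, lam) for x, lam in zip(row, rate_k))
            for w, rate_k in zip(weights, rates)
        ]
        top = max(logp)  # subtract the max before exponentiating
        unnorm = [math.exp(v - top) for v in logp]
        total = sum(unnorm)
        result.append([u / total for u in unnorm])
    return result


def m_step(X, resp, floor=1e-3):
    """Re-estimate cluster weights and per-interval Poisson rates."""
    n_hosts, n_slots, n_clusters = len(X), len(X[0]), len(resp[0])
    weights, rates = [], []
    for k in range(n_clusters):
        mass = sum(r[k] for r in resp)
        weights.append(max(mass / n_hosts, floor))
        rates.append([
            max(sum(r[k] * row[j] for r, row in zip(resp, X)) / max(mass, 1e-12), floor)
            for j in range(n_slots)
        ])
    total = sum(weights)
    return [w / total for w in weights], rates


def fit(X, init_rates, iters=20):
    """Run a fixed number of EM iterations from the given initial rates."""
    weights = [1.0 / len(init_rates)] * len(init_rates)
    rates = init_rates
    for _ in range(iters):
        weights, rates = m_step(X, posteriors(X, weights, rates))
    return weights, rates, posteriors(X, weights, rates)


def hard_assignment(resp):
    """Assign each host to its most probable cluster."""
    return [max(range(len(p)), key=p.__getitem__) for p in resp]
```

Re-running `fit` at the end of each interval with the updated counts corresponds to the "model is re-learned" step; choosing the number of clusters is left out of this sketch.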


COD Model, example

A typical example of how the hosts in the dataset get assigned into clusters:
• 5 clusters (colors) and a 1-day burn-in period
• Clusters are rather stable, and cluster membership changes rarely
• At the end, most hosts have been infected

[Figure: cluster assignment per host; x-axis: Time (hr), with the burn-in period marked; y-axis: host ID]


COD Model, demonstrate daily pattern

Clustering groups hosts according to the daily pattern of their local detection activity:
• 5 groups (two of which are composed of a single host)
• The pattern reflects the applications and habits of the host and can provide a better estimate for detection

[Figure: local detection count in a time interval; x-axis: Time (hr); y-axis: host ID]


4-step Cluster Interpretation

Goal: detect the “highly active” cluster (presumably infected)

1. Compute the “average detection rate” for each host (from the number of positive detections at host i)
2. Compute the “average (local) detection rate” for each cluster and identify the most active cluster
3. Perform a one-sided, unbalanced-design t-test with the null hypothesis that host detection rates in the most active cluster and in the remainder of the population are the same
4. Compare the outcome of the t-test to a historical histogram of values to determine whether the system is in an anomalous state
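The four steps can be sketched as below. The slides do not say which t-test variant is used, so this sketch assumes Welch's unequal-variance statistic, treats single-host groups as having zero variance, and replaces the historical histogram with a simple empirical quantile cutoff; all function names are hypothetical.

```python
import math
from statistics import mean, variance


def host_rates(X):
    """Step 1: average detection count per interval for each host."""
    return [mean(row) for row in X]


def most_active_cluster(rates, assignment):
    """Step 2: the cluster with the highest average host rate."""
    by_cluster = {}
    for rate, cluster in zip(rates, assignment):
        by_cluster.setdefault(cluster, []).append(rate)
    return max(by_cluster, key=lambda c: mean(by_cluster[c]))


def welch_t(active, rest):
    """Step 3: one-sided t statistic (active cluster vs. the remainder)."""
    if not active or not rest:
        return 0.0  # nothing to compare against
    var_a = variance(active) if len(active) > 1 else 0.0
    var_r = variance(rest) if len(rest) > 1 else 0.0
    std_err = math.sqrt(var_a / len(active) + var_r / len(rest))
    diff = mean(active) - mean(rest)
    if std_err == 0.0:
        return math.inf if diff > 0 else 0.0
    return diff / std_err


def is_anomalous(t_value, historical_t, quantile=0.99):
    """Step 4: compare against the upper tail of past t values."""
    ranked = sorted(historical_t)
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return t_value > cutoff
```

Comparing against past t values rather than a fixed significance level lets the alarm threshold be tuned to a target false alarm rate.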


Experimental Evaluation

Some details of the configuration:
• Normal traffic trace: 5 weeks of traces from 37 hosts
• Worm traffic is injected for testing
• LDs send a message every 10 seconds
• Metrics of interest: FAR, TTD (FI)
  - False Alarm Rate, Time To Detect, Fraction of Infection
  - The aim is to control FAR to 1 per week
• Results are compared with E-DBN (the baseline)
• The traffic trace is recycled to simulate more hosts
• Observe the effects of the number of clusters, network size, and interval length


COD vs. E-DBN

• COD outperforms E-DBN (FI is reduced)
• COD/adaptive performs better but is more costly to run
• AMOC: plots the expected time to detection (since the outbreak began) as a function of the false alarm rate
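An AMOC curve can be computed by sweeping the alarm threshold, as in the sketch below. The input layout is an assumption: `clean_scores` is a detector score series from outbreak-free traffic, each entry of `outbreak_runs` is a score series starting at outbreak onset, and undetected runs are counted with a delay equal to the run length.

```python
def amoc_points(clean_scores, outbreak_runs, thresholds, steps_per_week):
    """Return (false alarms per week, mean time to detect) per threshold."""
    weeks = len(clean_scores) / steps_per_week
    points = []
    for th in thresholds:
        # False alarm rate: alarms raised on outbreak-free traffic.
        far = sum(1 for s in clean_scores if s > th) / weeks
        # Time to detect: first step after onset where the score exceeds th.
        delays = [
            next((t for t, s in enumerate(run) if s > th), len(run))
            for run in outbreak_runs
        ]
        points.append((far, sum(delays) / len(delays)))
    return points
```

Plotting the returned pairs shows the trade-off directly: raising the threshold moves a detector toward fewer false alarms and longer detection delays, so a curve closer to the origin is better.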


Scaling with Network Size

• The performance actually improves as the system scales up
• A larger number of data points gives the model more information and refines the clustering


Effect of Interval Length

Interval length affects the performance in two (opposite) ways:
1. More frequent re-clustering eliminates part of the “mid-interval” blind spot
2. Longer intervals yield features with less variance

[Figure: axis labels: (in a day); standard deviation]

The results show that:
• Better performance is achieved with longer intervals (better smoothing over any random fluctuation)
• A lower frequency of detection algorithm invocation gives fewer false alarms
• And for slow worms, delayed detection is okay!


Conclusion

• Use a distributed scheme and collaborative inference to support slow worm detection
• Dividing the population into subgroups according to susceptibility increases the SNR and can lead to a detection performance boost
  - Subgroups are more homogeneous in their usage and application patterns
  - No prior knowledge of the population is required


My Comments

Can other features on a host reveal diurnal patterns?

Host-based LD can acquire rich information about the attack, but building a host-based distributed detection system is much harder

Clustering is a way to deal with stealthy attacks
