COD (Cluster Onset Detection): Online Temporal Clustering for Outbreak Detection
Tomas Singliar (U. Pitt.),
Denver H. Dash (Intel Research, U. Pitt.)
AAAI’07 (American Association for AI National Conference)
2009/8/26 Speaker: Li-Ming Chen 2
Reference
• When Gossip is Good: Distributed Probabilistic Inference for Detection of Slow Network Intrusions. Denver H. Dash et al., AAAI’06
• COD: Online Temporal Clustering for Outbreak Detection. Tomas Singliar and Denver H. Dash, AAAI’07
Challenge: Slowly Propagating Attacks
• Worm attacks: 2 opposite extremes
  1. Much faster, to allow rapid spread
  2. Much slower, to prevent detection
• Most existing detection techniques rely on the fact that worms reproduce quickly
• Slow propagation attacks
  • Difficult to detect: they hide under the veil of normal network traffic
  • Still dangerous: they can propagate exponentially
Other Challenges
• Global infection: IDSes (individual entities) can see only a partial picture of the network-wide behavior of the worm
  • Requires collaborative detection (AAAI’06)
• Homogeneity assumption: detection techniques treat the population as a monolithic entity
  • However, hosts and detectors (collaborators) are not always homogeneous (AAAI’07)
Architecture Model
[Figure: several Local Detectors (LD) reporting to a Global Detector (GD)]
• LD: “weak” host-based Local Detector
• GD: Global Detector
  • Aggregates messages from LDs
  • Performs probabilistic inference to determine whether an infection is present
Concept of collaborative detection:
• LDs (designed to be weak but general classifiers) may raise false alarms at a relatively high frequency
• A GD can combine the LDs’ weak information to infer the existence of an attack
• Where to place the GDs in the network?
  • Centralized or distributed placement
Paper 1
When Gossip is Good: Distributed Probabilistic Inference for Detection of Slow Network Intrusions (Denver H. Dash et al., AAAI’06)
About the “Weak” LDs
• A binary classifier: normal or abnormal
• Detects by heuristic: counts the number of new outgoing connections to unique destination addresses and ports
• Observation (see figure): for slow worm detection, set the threshold to 4 (CPI)
• The space of LDs: inward-looking vs. outward-looking
[Figure: the LD threshold, predefined as 4 (CPI), shown against the propagation rates of previous worms (Blaster, Slapper, CR2, Slammer, Witty); data from 37 hosts]
• Over 5 weeks, observing 37 hosts in 50-second intervals gives 37*5*7*24*60*60/50 = 2,237,760 observations, from which the distribution is computed
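The LD rule and the observation count above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the input format (a list of `(timestamp, address, port)` tuples) and the function names are hypothetical, and it assumes "new" means a destination the host has not contacted earlier in the trace.

```python
from collections import defaultdict

INTERVAL_SECONDS = 50   # interval length quoted on the slide
LD_THRESHOLD = 4        # threshold quoted on the slide (CPI)


def local_detector(connections, interval=INTERVAL_SECONDS, threshold=LD_THRESHOLD):
    """Return {interval_index: fired} for one host.

    `connections` is an iterable of (timestamp_seconds, dst_addr, dst_port)
    tuples for outgoing connections (a hypothetical input format).
    Intervals without any new destination are simply absent from the result.
    """
    seen = set()                          # destinations already contacted
    new_per_interval = defaultdict(int)
    for ts, addr, port in sorted(connections, key=lambda c: c[0]):
        dst = (addr, port)
        if dst not in seen:               # count only the first contact
            seen.add(dst)
            new_per_interval[int(ts // interval)] += 1
    return {k: count > threshold for k, count in new_per_interval.items()}


def observation_count(hosts=37, weeks=5, interval=INTERVAL_SECONDS):
    """Number of (host, interval) observations in the trace."""
    seconds = weeks * 7 * 24 * 60 * 60
    return hosts * (seconds // interval)
```

With the slide's numbers, `observation_count()` reproduces the 2,237,760 figure.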
4 possible GD models
• Traditional collaborative counting schemes:
  • PosCount: tests whether Σ(positive counts) > threshold
  • CuSum: detects changes in the trend of a statistic
• DBN-based schemes:
  • CP-DBN: a simplified causal model; models an attack as occurring uniformly across the population or not at all
  • E-DBN: models the dynamics of a system that is being swept by an epidemic outbreak
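The two counting schemes can be sketched as follows. This is an illustrative version only: the CuSum form used here is the standard one-sided cumulative sum, and `baseline`, `slack` and `limit` are assumed tuning parameters, not values from the paper.

```python
def pos_count_alarm(ld_bits, threshold):
    """PosCount: alarm when the number of positive LD reports exceeds a threshold."""
    return sum(ld_bits) > threshold


def cusum_alarms(counts, baseline, slack, limit):
    """One-sided CUSUM over a series of per-step positive counts.

    baseline: expected count under normal conditions (assumed known)
    slack:    allowance subtracted at each step to ignore small deviations
    limit:    alarm level for the cumulative sum
    """
    s = 0.0
    alarms = []
    for x in counts:
        s = max(0.0, s + (x - baseline - slack))   # accumulate only upward drift
        alarms.append(s > limit)
    return alarms
```

PosCount looks at one time step in isolation, while the cumulative sum reacts to a sustained upward shift even when no single step is large.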
How GDs work?
• Input of a GD: L_t, a binary subset of LD observations at time t
• Output of a GD: S_t, some measure of how likely a global anomaly is to be occurring at time t
• The system of GDs makes up an ensemble
  • Many ensemble techniques could be used
  • This paper uses only the max function to determine whether a global alarm should be raised
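A minimal sketch of the max-based ensemble decision, assuming each GD reports a numeric score S_t and that the alarm is raised by comparing the maximum to a threshold (the threshold parameter is an assumption; the slide only names the max function):

```python
def ensemble_alarm(gd_scores, threshold):
    """Raise a global alarm if the largest GD score S_t exceeds a threshold."""
    return max(gd_scores) > threshold
```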
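CP-DBN
• A_i = {T, F}: whether an attack has taken place at time i
• O_i^l = {on, off}: whether LD l is on or off at time i
[Figure: DBN unrolled over the observation time T; hidden states A_i on top, observable states O_i^l below (starting from LD 0, M LDs in total), parameterized by the TP rate and FP rate]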
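To make the role of the TP and FP rates concrete, here is a single-time-slice sketch: the posterior probability of an attack given which LDs are on, assuming the LDs are conditionally independent given the attack state. It deliberately ignores the temporal links between slices, and the prior is an assumed input, so it is a simplification rather than the CP-DBN inference itself.

```python
import math


def attack_posterior(ld_on, tp_rate, fp_rate, prior):
    """P(attack | LD observations) for one time slice.

    ld_on:   iterable of booleans, one per LD (True = on)
    tp_rate: P(LD on | attack); fp_rate: P(LD on | no attack)
    prior:   P(attack) before seeing the observations (assumed given)
    """
    log_attack = math.log(prior)
    log_normal = math.log(1.0 - prior)
    for on in ld_on:
        log_attack += math.log(tp_rate if on else 1.0 - tp_rate)
        log_normal += math.log(fp_rate if on else 1.0 - fp_rate)
    # Normalize in log space to avoid underflow with many detectors.
    top = max(log_attack, log_normal)
    pa = math.exp(log_attack - top)
    pn = math.exp(log_normal - top)
    return pa / (pa + pn)
```

Even weak detectors become informative in aggregate: the evidence from each LD multiplies, which is the point of combining them at the GD.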
E-DBN
To model the exponentially growing trend:
• T denotes the observation time
• A_t = {0, 1}: the anomaly state at time t
• N_t = {0, …, N}: number of infected hosts
• S is the spreading rate
• O_t = {0, …, N}: number of observed LDs that fired
[Figure: DBN with state transitions between the unobserved (hidden) state variables, and the observable states below them]
E-DBN (cont’d)
• Assuming a worm attack, the growth in the number of infected hosts, ΔN_{t+1}, is modeled by a binomial
• The likelihood of o_t detectors firing when n_t hosts are infected is also modeled by a binomial
[Equations: the two binomial distributions, annotated with “chance of a hit” and “susceptible”]
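The slide's equations are not reproduced here, so the following is only a hedged simulation of the general idea: new infections drawn from a binomial over the susceptible hosts, and fired detectors drawn from a binomial over the infected hosts. The specific hit probability (`hit_rate * infected / total_hosts`) and the parameters `hit_rate` and `tp_rate` are assumptions made for illustration, not the paper's parameterization, and false positives from uninfected hosts are ignored.

```python
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_outbreak(total_hosts, steps, hit_rate, tp_rate, seed=0):
    """Simulate infected counts n_t and fired-detector counts o_t.

    hit_rate and tp_rate are illustrative parameters, not values from the paper.
    Returns a list of (n_t, o_t) pairs, one per time step.
    """
    rng = random.Random(seed)
    infected = 1                                   # a single initial infection
    history = []
    for _ in range(steps):
        susceptible = total_hosts - infected
        # Assumed chance that a given susceptible host is hit in this step.
        p_hit = min(1.0, hit_rate * infected / total_hosts)
        infected += binomial(rng, susceptible, p_hit)   # Delta N_{t+1}
        fired = binomial(rng, infected, tp_rate)        # o_t given n_t
        history.append((infected, fired))
    return history
```

Because the hit probability grows with the number of infected hosts, the infected count grows slowly at first and then accelerates until the susceptible pool runs out.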
How do DBN-based GDs work?
• Given a DBN model
• Based on the observations from t-T to t
• Infer the anomaly A_m at the most likely time m
• Then make the ensemble decision (using the max function)
Performance Evaluation
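[Figure: performance of the GD models; annotations mark the direction of better performance and the desired FP rate]
Parameters:
• Spread rate S = 1 connection per 20 sec.
• Address density = 1/1000 (ratio of vulnerable hosts)
• LD threshold = 4 connections per 50 sec.
• LDs communicate with the GD every 10 sec.
Observation: PosCount raises a detection only after the entire network is infected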
Paper 2
COD: Online Temporal Clustering for Outbreak Detection (Tomas Singliar and Denver H. Dash, AAAI’07)
New Approach: COD (Cluster Onset Detection)
• What to cluster?
  • Partition the population (e.g., hosts) into subgroups; COD then tries to detect susceptible subgroups
• Why clustering?
  • Traditional outbreak detection methods treat the population as a monolithic entity
  • Real populations are heterogeneous: different subpopulations are susceptible to different degrees
  • Clustering can boost the signal-to-noise ratio for detection
COD Model – detection architecture
• “Weak” host-based LDs
  • Periodically send their status to a GD
  • Use the same feature and rule: fire whenever the number of outgoing connections exceeds 4 in a 50-second interval
• Centralized GD
  • Collects messages and determines whether the positive local detections corroborate each other
  • Periodically outputs a signal that represents its belief that an infection is present
COD Model – data
• Dataset X
  • Row: X_i corresponds to a single LD i
  • Column: X_{*j} corresponds to the value of a feature function in a discrete time interval j
• Use temporal stratified sampling
  • Each time interval has a fixed position, e.g., 12am-1am, 1am-2am, etc.
  • This accounts for the obvious diurnal behavior in the system
[Figure: matrix X with rows indexed by LD i and columns by time interval j; each entry is a sum of alarms (some of which may be false positives)]
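One way to read the data layout above is as a host-by-time-slot count matrix. The sketch below assumes alarms arrive as `(host_index, timestamp_seconds)` pairs (a hypothetical format) and that alarms from different days falling in the same time-of-day slot are summed together; the slide does not say whether counts are pooled across days, so that pooling is an assumption.

```python
def build_feature_matrix(alarms, num_hosts, interval_hours=1):
    """Sum alarms per (host, time-of-day interval).

    `alarms` is an iterable of (host_index, timestamp_seconds) pairs,
    with timestamps measured from midnight of the first day.
    Returns a list of rows, one per LD; each row has 24 // interval_hours slots.
    """
    slots = 24 // interval_hours
    X = [[0] * slots for _ in range(num_hosts)]
    for host, ts in alarms:
        hour_of_day = int(ts // 3600) % 24          # fixed position within the day
        X[host][hour_of_day // interval_hours] += 1
    return X
```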
COD Model – clustering
• Naïve Bayes (NB) clustering model
• NB features are the positive local detection counts X_ij arriving from machine i during time interval j
  • Within a time interval, an LD may fire several times, so the feature function F() is the sum of alarms for each machine
• Assuming that different classes generate their detections randomly at different rates, and that the counts can take a fairly large range of values, X_ij can be assumed to be Poisson distributed
COD Model – clustering (cont’d)
Some details:
• How to determine the number m of clusters?
  • By using a greedy heuristic to find the optimal value
  • The paper gives no details about λkjx
• At the end of each interval, the feature values are updated and the model is re-learned
• How to cluster?
  • The posterior on the cluster variable M defines the assignment of local detectors into clusters
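For a naive Bayes model with Poisson features, the cluster posterior follows directly from Bayes' rule. The sketch below computes it for one LD; the cluster priors and per-interval rates are taken as given inputs (how they are learned is not covered here), and the function and parameter names are illustrative.

```python
import math


def cluster_posterior(x_row, priors, rates):
    """P(M = k | x_row) under a naive Bayes model with Poisson features.

    x_row:  counts X_ij for one LD i over intervals j
    priors: P(M = k) for each cluster k
    rates:  rates[k][j] is the (positive) Poisson rate of cluster k in interval j
    """
    log_post = []
    for prior, lam in zip(priors, rates):
        lp = math.log(prior)
        for x, l in zip(x_row, lam):
            # log Poisson pmf: x*log(l) - l - log(x!)
            lp += x * math.log(l) - l - math.lgamma(x + 1)
        log_post.append(lp)
    # Normalize in log space for numerical stability.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

Assigning each LD to the cluster with the largest posterior gives a hard clustering; keeping the full vector gives a soft one.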
COD Model, example
A typical example of how the hosts in the dataset get assigned to clusters:
• 5 clusters (colors) and a 1-day burn-in period
• Clusters are rather stable, and cluster membership rarely changes
• At the end, most hosts have been infected
[Figure: cluster assignment by host ID over time (hr), with the burn-in period marked]
COD Model, demonstrate daily pattern
Clustering groups hosts according to the daily pattern of their local detection activity:
• 5 groups (two of which are composed of a single host)
• The pattern reflects the applications and habits of each host and can provide a better estimate for detection
[Figure: local detection count per time interval, by host ID over time (hr)]
4-step Cluster Interpretation
Goal: detect the “highly active” cluster (presumably infected)
1. Compute the “average detection rate” for each host, from the number of positive detections at host i
2. Compute the “average (local) detection rate” for each cluster and identify the most active cluster
3. Perform a one-sided, unbalanced-design t-test with the null hypothesis that host detection rates in the most active cluster and in the remainder of the population are the same
4. Compare the outcome of the t-test to a historical histogram of values to determine whether the system is in an anomalous state
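Steps 3 and 4 can be sketched as follows. The slide does not say which t-test variant is used, so this example substitutes Welch's unequal-variance statistic as one reasonable choice for groups of different sizes, and it stands in for the historical histogram with an empirical quantile of past t values (the 0.99 quantile is an assumed setting). Each group needs at least two hosts, so single-host clusters would need separate handling.

```python
import math
from statistics import mean, variance


def welch_t(active, rest):
    """Welch's t statistic for mean(active) > mean(rest); group sizes may differ.

    Both groups need at least two values and non-zero combined variance.
    """
    se = math.sqrt(variance(active) / len(active) + variance(rest) / len(rest))
    return (mean(active) - mean(rest)) / se


def is_anomalous(t_value, historical_t, quantile=0.99):
    """Flag t_value if it exceeds the chosen quantile of past t statistics."""
    ranked = sorted(historical_t)
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return t_value > cutoff
```

Comparing against the system's own history, rather than a fixed critical value, lets the threshold be tuned to a target false alarm rate.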
Experimental Evaluation
Some details of the configuration:
• Normal traffic trace: 5 weeks of traces from 37 hosts
• Worm traffic is injected for testing
• LDs send a message every 10 seconds
• Metrics of interest: FAR, TTD (FI)
  • False Alarm Rate, Time To Detect, Fraction of Infection
  • Aim to control FAR to 1 per week
• Results are compared with E-DBN (the baseline)
• The traffic trace is recycled to simulate more hosts
• Observe the effects of the number of clusters, network size, and interval length
COD vs. E-DBN
• COD outperforms E-DBN (FI is reduced)
• COD/adaptive performs better but is more costly to run
[Figure: AMOC curves, which plot the expected time to detection (since the outbreak began) as a function of the false alarm rate]
Scaling with Network Size
• Performance actually improves as the system scales
• A larger number of data points gives the model more information and refines the clustering
Effect of Interval Length
Interval length affects performance in two (opposite) ways:
1. More frequent re-clustering eliminates part of the “mid-interval” blind spot
2. Longer intervals yield features with less variance
[Figure: axis labels include “(in a day)” and “standard deviation”]
The results show that:
• Better performance is achieved with longer intervals (better smoothing over random fluctuations)
• A lower frequency of detection algorithm invocation gives fewer false alarms
• For slow worms, delayed detection is acceptable
Conclusion
• Use a distributed scheme and collaborative inference to support slow worm detection
• Dividing the population into subgroups according to susceptibility increases the SNR and can boost detection performance
  • Subgroups are more homogeneous in their usage and application patterns
  • Does not require prior knowledge of the population