Using Fuzzy k-Modes to Analyze Patterns of System Calls for Intrusion Detection
A Master’s Thesis by Michael M. Groat
Advisor: Dr. Hilary Holz
Thesis Committee: Dr. Eric Suess and Dr. William Nico
Overview
• Computer Security
• Intrusion Detection Systems based on process traces
• Background discussion
• Fuzzy k-modes
• Our process data model
• Comparing new process traces
• Experiments and Results
• Conclusion
Computer Security
Is Your Computer Safe?
• Somewhere, someone is trying to break into your system.
• Hackers are prevalent
Computer Security
• Need to prevent intrusions
• Protect data and information
• Secure Privacy
Intrusion Detection Systems (IDS)
• Attempt to detect viruses, worms, Trojan horses or other hacking attempts
• Two types of IDS:
  – Misuse based
  – Anomaly based
Immune System: The Body’s Intrusion Detection System
• Protects the body from invasion
• Determines what is not a part of itself
• Removes foreign material
Immunocomputing: A Computer’s Security Force
• Protects the computer from intrusions
• Determines, like the natural immune system, what is not itself.
Intrusion detection systems based on process traces
How Do You Model “Self” in a Computer?
• We build a sense of self with patterns of system calls
• A certain pattern of system calls defines normal behavior
• A program is defined by the pattern of system calls it emits
Sense of Self => Anomaly Based Intrusion Detection System
• An anomaly-based IDS analyzes patterns of system calls, or process traces
• We determine the normal patterns and look for deviations from the normal patterns
Deviations from Normal Behavior
• We plot normal and intrusion traces in the state space of all possible sequences of system calls
• We attempt to determine whether new traces fall in the yellow region
Five Steps to Determine the “Yellow” Behavior
• Intrusion detection systems based on analyzing process traces execute the following five steps
Step One: Record the System Calls
• Record the calls with a tracing program such as strace
• Collect process IDs and system call numbers
• System call numbers are given by each call's order in the syscall.h file
PID   System call
2032  32
2032  23
2033  54
2033  2
2043  3
2033  63
2032  34
2032  33
2043  23
2032  2
2033  4
2033  5
Step 2: Convert the Data to the Training Data
• The list of process IDs and system calls is converted to strings of length n
• n is 6, 10, or 14
• Take a sliding window across each process ID's sequence of system calls
n = 3 (for illustration)
32 23 34
23 34 33
54 2 63
2 63 4
63 4 5
34 33 2
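Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the thesis code: it assumes the trace is a list of (process ID, system call number) pairs as in the Step One table, and `build_windows` is a hypothetical name. Calls are grouped by process ID in trace order and a window of length n slides over each group, so a process with fewer than n calls (PID 2043 here) contributes no strings.

```python
from collections import defaultdict


def build_windows(trace, n):
    """Group system calls by process ID (keeping trace order), then slide a
    window of length n over each process's calls."""
    calls_by_pid = defaultdict(list)
    for pid, syscall in trace:
        calls_by_pid[pid].append(syscall)
    windows = []
    for calls in calls_by_pid.values():
        # A process with fewer than n calls yields no windows.
        for start in range(len(calls) - n + 1):
            windows.append(tuple(calls[start:start + n]))
    return windows


trace = [(2032, 32), (2032, 23), (2033, 54), (2033, 2), (2043, 3),
         (2033, 63), (2032, 34), (2032, 33), (2043, 23), (2032, 2),
         (2033, 4), (2033, 5)]

for window in build_windows(trace, 3):
    print(*window)
```

With n = 3 this yields the same six strings listed above, ordered by process.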
Step 3: Build the Process Data Model
• The process data model is a mathematical representation of normal behavior
• Improving the process data model improves the model of normal behavior.
• It should represent what is truly normal in the data
A New Process Data Model
• We represent normal behavior with a statistical method called fuzzy k-modes
  – Uses cluster centers, or centroids
  – Uses distances from the centroids
• We add the element of fuzzy logic to our method
  – Fuzzy logic should better model the uncertainty in the data
  – It allows us to determine to what degree a string is an intrusion
  – If a string is off by one system call in a hard method, it is completely off
  – If a string is off by one system call in a fuzzy method, it is still mostly normal
Other Process Data Modeling Techniques Have Been Used
• Previously used techniques include:
  – Stide (Forrest et al.)
  – Frequency stide (Warrender et al.)
  – A rule-based method (Lee et al. and Helmer et al.)
  – Hidden Markov models (Warrender et al.)
  – Automata (Kosoresow et al.)
• No one method has been proven best
Step 4: Compare New Process Data with the Process Data Model
• New process data is converted to a form that can be compared against the process data model
  – Our form is also a set of strings
• This new data is compared and later classified in step 5 as normal or abnormal behavior
Step 5: Determine an Intrusion
• A hard threshold is applied to the intrusion signal to classify new process data as normal or abnormal behavior
• A signal of at least one and a half times the maximum self-test signal counts as a detection (a true positive); anything less is a false negative
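The threshold rule is simple enough to state as code. A minimal sketch, assuming the self-test signals are available as a list of numbers and that a signal exactly at the threshold counts as a detection; `classify` and its `factor` parameter are hypothetical names.

```python
def classify(signal, self_test_signals, factor=1.5):
    """Flag new process data as abnormal when its intrusion signal reaches
    `factor` times the largest signal observed in the self tests."""
    threshold = factor * max(self_test_signals)
    return "abnormal" if signal >= threshold else "normal"


print(classify(0.5, [0.1, 0.2]))   # threshold is about 0.3, so "abnormal"
print(classify(0.25, [0.1, 0.2]))  # below the threshold, so "normal"
```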
Five steps for Intrusion Detection Systems Based on Process Traces
• Five steps revisited
Background discussion
Background Discussion
• What are clusters?
• What are cluster centers?
• What are memberships?
• What is the difference between quantitative data and categorical data?
What are Clusters?
• Two-dimensional state space of all the possible strings
• Clusters are groupings of similar objects
• We then find the centers of the clusters, or centroids
[Figure legend: C are the centroids; X are the strings]
What are Memberships?
• The distance to the closest centroid is taken as that string's membership
• Distances are inverted: a value closer to 0 means farther away
[Figure legend: C are the cluster centers, or centroids; X are the strings]
What is Categorical Data?
• Previous graphs were based on quantitative data
  – Our data is categorical
• Categorical data is data like the following:
  – Red, blue, green, yellow
  – Ford, Honda, GM, Ferrari
• There is no numeric distance between categories
  – The 6th system call is not twice as far as the 3rd system call
Categorical Hamming Distance
• We have 8 strings of length 3
• 2 categories in each string position, 0 and 1
Fuzzy k-modes
Why use Fuzzy k-Modes?
• We use the fuzzy k-modes algorithm to find centroids and memberships of the strings to the centroids
• Fuzzy k-modes finds trends in the data that represent the most normal behavior
It is Supervised Learning, Unsupervised Clustering.
• Supervised learning
  – Data is previously known to be normal or abnormal
• Unsupervised clustering
  – The number of clusters is not known; we do not seed the clusters with known cluster centers
Fuzzy k-Modes Explained
• Fuzzy k-modes consists of minimizing the following equation:
$$\min_{W,Z} F(W, Z) = \sum_{k=1}^{n} \sum_{i=1}^{c} w_{ik}^{\alpha}\, d_c(z_i, x_k)$$
• W is the membership matrix
• Z is the centroid matrix
• d_c is the dissimilarity measure
• n is the number of strings
• c is the number of clusters
• α (alpha) is a fuzzifying factor
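To make the objective concrete, here is a minimal Python sketch that evaluates F for given memberships and centroids, assuming the plain Hamming distance as d_c. `objective` and `hamming` are hypothetical names. W is stored here as c rows by n columns so that `W[i][k]` matches w_ik; the Matrices slide describes the membership matrix as strings by clusters, which is simply the transpose.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def objective(W, Z, X, alpha):
    """F(W, Z) = sum over strings k and clusters i of w_ik**alpha * d_c(z_i, x_k).
    W[i][k] is the membership of string X[k] to centroid Z[i]."""
    return sum(W[i][k] ** alpha * hamming(Z[i], X[k])
               for i in range(len(Z)) for k in range(len(X)))


X = [(1, 2, 3), (1, 2, 4), (5, 6, 7)]
Z = [(1, 2, 3), (5, 6, 7)]
W = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print(objective(W, Z, X, alpha=1.2))  # only (1, 2, 4) is off its centroid, by 1
```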
Matrices
• Membership matrix
  – Number of strings by number of clusters
  – Holds each string's membership to each centroid
• Centroid matrix
  – Number of clusters by string length
  – Holds all the centroids
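Dissimilarity Measure
• The following is the published fuzzy k-modes dissimilarity measure
• Generalized Hamming distance
$$d_c(x_k, x_l) = \sum_{j=1}^{p} \delta(x_{kj}, x_{lj}), \quad 1 \le k \le n,\ 1 \le l \le n$$
$$\delta(x_{kj}, x_{lj}) = \begin{cases} 0 & \text{if } x_{kj} = x_{lj} \\ 1 & \text{if } x_{kj} \neq x_{lj} \end{cases}$$
• p is the string length
• x is a string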
Example of Dissimilarity Measure
3 5 10 5 7 4
3 7 10 2 3 4
• The strings differ in positions 2, 4, and 5, which gives a value of 3
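The generalized Hamming distance reduces to counting mismatched positions. A minimal Python sketch (the function name `hamming` is hypothetical), checked against the example strings:

```python
def hamming(a, b):
    """Generalized Hamming distance: count positions whose symbols differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


print(hamming([3, 5, 10, 5, 7, 4], [3, 7, 10, 2, 3, 4]))  # 3
```

Because system call numbers are categories, the code only tests equality; it never subtracts symbol values.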
We Created a New Dissimilarity Measure
• Each additional difference should add more weight when a string has few differences than when it already has many
• The third difference should count more than the twelfth difference
• We want a nonlinear weighting of differences
New dissimilarity measure
• Logarithmic Hamming distance
• Normalized on string length
$$d_{\log}(x_l, x_k) = \frac{\log\left(1 + b \cdot \dfrac{d_c(x_l, x_k)}{p}\right)}{\log(1 + b)}$$
• b = 1000; anything less and our logarithmic curve would be too linear
• p is the string length
New Measure Example
• A string with 5 differences out of 14 positions has a distance of .85
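A minimal Python sketch of the logarithmic Hamming distance, assuming the formula reads log(1 + b·d_c/p) / log(1 + b), which is our reconstruction of the slide's equation. It does reproduce the .85 example and keeps the value between 0 (identical strings) and 1 (every position differs). `log_hamming` and its `base` parameter (the slide's b) are hypothetical names.

```python
import math


def log_hamming(a, b, base=1000):
    """Logarithmic Hamming distance normalized by string length p:
    log(1 + base * d_c / p) / log(1 + base)."""
    p = len(a)
    d = sum(x != y for x, y in zip(a, b))  # plain Hamming count d_c
    return math.log(1 + base * d / p) / math.log(1 + base)


x = list(range(14))
y = x[:9] + [-1] * 5  # 5 of 14 positions differ
print(round(log_hamming(x, y), 2))  # 0.85
print(round(5 / 14, 2))             # 0.36, the linear count divided by p
```

The comparison shows the intended nonlinearity: a few differences already push the distance most of the way toward 1.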
Effect of Logarithmic Measure on Intrusion Signal
[Figure: intrusion signal strength vs. number of clusters; string length 6, live inetd; series: alpha = 1.19, alpha = 1.27]
• Previous linear measure
• Note how the signal becomes random after 10 clusters
Effect of Logarithmic Measure on Intrusion Signal
• Note how the signal stays strong after 10 clusters
• After 18 clusters we start to see repeated centroids
• Lines are smoother
[Figure: intrusion signal vs. number of clusters (2 to 25); series: Diff avg, Diff bott. 25%, Diff locality * 10, Diff median, Diff Ratio .85]
Fuzzy k-Modes Algorithm
• To find the minimum of the equation given earlier (F), we must solve a system of nonlinear equations
  – No closed-form solution is known for this system
  – The best approach so far is the iterative algorithm below
• Algorithm
1. Initialize the parameters
2. Fix the centroids, then update the memberships
3. Fix the memberships, then update the centroids
4. Return to step 2 until a stopping criterion is met
Fuzzy k-Modes, Step 1: Initialize the Parameters
• Choose alpha and the number of clusters
• Then seed the centroid matrix
  – The published algorithm called for random seeding
  – We chose a smart seeding
    • The most common symbol at each position goes in the first centroid
    • The second most common symbol at each position goes in the second centroid, etc.
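The smart seeding rule can be sketched as follows. This is a minimal Python illustration under stated assumptions: symbols are ranked per position by frequency, ties keep first-seen order (the documented behavior of `Counter.most_common`), and when a position has fewer distinct symbols than clusters the least common symbol is reused, a fallback the slides do not specify. `smart_seed` is a hypothetical name.

```python
from collections import Counter


def smart_seed(strings, c):
    """Seed c centroids: centroid i takes, at every position, the (i+1)-th most
    common symbol at that position. Assumed fallback: reuse the least common
    symbol when a position has fewer than c distinct symbols."""
    p = len(strings[0])
    ranked = [[sym for sym, _ in Counter(s[j] for s in strings).most_common()]
              for j in range(p)]
    return [tuple(ranked[j][min(i, len(ranked[j]) - 1)] for j in range(p))
            for i in range(c)]


strings = [(1, 2, 3), (1, 2, 4), (1, 5, 4), (6, 5, 4)]
print(smart_seed(strings, 2))  # [(1, 2, 4), (6, 5, 3)]
```

Unlike random seeding, this makes runs reproducible, which matters because fuzzy k-modes is sensitive to initialization.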
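Fuzzy k-Modes Step 2: Fix Centroids, Update Memberships
• We update the memberships according to the following equation:
$$
w_{ik} =
\begin{cases}
1 & \text{if } x_k = z_i \\
0 & \text{if } x_k = z_j \text{ but } j \neq i \\
\dfrac{1}{\sum_{j=1}^{c} \left[ \dfrac{d_c(z_i, x_k)}{d_c(z_j, x_k)} \right]^{1/(\alpha - 1)}} & \text{if } x_k \neq z_i \text{ and } x_k \neq z_j,\ 1 \le j \le c
\end{cases}
$$
• z is a centroid
• x is a string
• c is the number of clusters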
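The three cases of the membership update translate directly into code. A minimal Python sketch, assuming alpha > 1 (the exponent 1/(α − 1) is undefined at α = 1) and the plain Hamming distance as the default dissimilarity. When a string matches more than one identical centroid, the sketch assigns membership 1 to the first match so each string's memberships still sum to 1; that tie rule is our assumption. `update_memberships` is a hypothetical name.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def update_memberships(Z, X, alpha, dist=hamming):
    """Fix the centroids Z and recompute W[i][k], the membership of string
    X[k] to centroid Z[i]. Requires alpha > 1."""
    c, n = len(Z), len(X)
    W = [[0.0] * n for _ in range(c)]
    for k, x in enumerate(X):
        d = [dist(z, x) for z in Z]
        if 0 in d:
            # Case 1 and 2: x equals a centroid, so it belongs fully to it.
            W[d.index(0)][k] = 1.0
            continue
        # Case 3: x matches no centroid; weight by relative distances.
        for i in range(c):
            W[i][k] = 1.0 / sum((d[i] / d[j]) ** (1.0 / (alpha - 1.0))
                                for j in range(c))
    return W


Z = [(1, 2, 3), (7, 8, 9)]
X = [(1, 2, 3), (1, 2, 9)]
W = update_memberships(Z, X, alpha=2.0)
```

For the second string the distances are 1 and 2, so with alpha = 2 it receives memberships 2/3 and 1/3: the closer centroid gets the larger share, which is why the membership to the closest centroid can serve as a normality score.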
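Fuzzy k-Modes Step 3: Fix Memberships, Update Centroids
• We update Z according to the following equation:
$$z_{ij} = a_j^{(r)} \quad \text{where} \quad \sum_{k:\, x_{kj} = a_j^{(r)}} w_{ik}^{\alpha} \;\ge\; \sum_{k:\, x_{kj} = a_j^{(t)}} w_{ik}^{\alpha}, \quad 1 \le t \le s$$
• For the i-th centroid and the j-th position, find the symbol whose strings (those with that symbol in the j-th position) have the highest sum of memberships w_ik^α to the i-th centroid
• Assign that symbol to the i-th centroid’s j-th position
• z is a centroid
• w is a membership
• a_j^(r) and a_j^(t) are system call numbers (the r-th and t-th symbols) that can appear in position j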
Reduced Time Complexity in this Step
• Reduced from O(cpsn) to O(cpn), where
  – c is the number of clusters
  – p is the string length
  – s is the number of system calls
  – n is the number of strings
• Accomplished this with an accumulation matrix that is later sorted
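A minimal Python sketch of the centroid update with an accumulator. Instead of looping over all s symbols for every (centroid, position) pair, one pass over the n strings adds w_ik^α into a per-symbol total, and the largest total wins. The slides sort the accumulation matrix; this sketch takes the maximum directly, which selects the same top symbol (ties go to the first symbol encountered, an assumption). `update_centroids` is a hypothetical name, and W is stored as `W[i][k]`, cluster by string.

```python
from collections import defaultdict


def update_centroids(W, X, alpha):
    """Fix the memberships W and recompute each centroid position as the
    symbol with the largest accumulated w_ik**alpha among strings that
    carry that symbol in that position."""
    c, p = len(W), len(X[0])
    Z = []
    for i in range(c):
        centroid = []
        for j in range(p):
            acc = defaultdict(float)  # symbol -> accumulated membership weight
            for k, x in enumerate(X):  # one pass over the n strings
                acc[x[j]] += W[i][k] ** alpha
            centroid.append(max(acc, key=acc.get))
        Z.append(tuple(centroid))
    return Z


X = [(1, 2), (1, 3), (4, 3)]
W = [[1.0, 0.5, 0.0],
     [0.0, 0.2, 0.9]]
print(update_centroids(W, X, alpha=2.0))  # [(1, 2), (4, 3)]
```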
Step 4: Stop at Some Criteria
• Stop when the value of the fuzzy k-modes equation (F) in the current step equals its value in the previous step
• F is the fuzzy k-modes equation that we try to minimize.
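The four algorithm steps combine into one alternating loop. This is a minimal Python sketch, not the thesis implementation: it assumes the plain Hamming distance, centroids seeded by the caller (randomly, or by the smart seeding described in Step 1), and alpha > 1. It stops when F is unchanged between iterations (Step 4); the `max_iter` cap is an added safeguard. All function names are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def memberships(Z, X, alpha):
    """Step 2: fix centroids, update memberships W[i][k]."""
    W = [[0.0] * len(X) for _ in Z]
    for k, x in enumerate(X):
        d = [hamming(z, x) for z in Z]
        if 0 in d:
            W[d.index(0)][k] = 1.0
        else:
            for i in range(len(Z)):
                W[i][k] = 1.0 / sum((d[i] / dj) ** (1.0 / (alpha - 1.0)) for dj in d)
    return W


def centroids(W, X, alpha):
    """Step 3: fix memberships, update each centroid position."""
    Z = []
    for row in W:
        centroid = []
        for j in range(len(X[0])):
            acc = defaultdict(float)
            for w, x in zip(row, X):
                acc[x[j]] += w ** alpha
            centroid.append(max(acc, key=acc.get))
        Z.append(tuple(centroid))
    return Z


def objective(W, Z, X, alpha):
    return sum(w ** alpha * hamming(z, x)
               for row, z in zip(W, Z) for w, x in zip(row, X))


def fuzzy_k_modes(X, Z0, alpha, max_iter=100):
    """Alternate Steps 2 and 3 until F stops changing (Step 4)."""
    Z = list(Z0)
    W = memberships(Z, X, alpha)
    F_prev = objective(W, Z, X, alpha)
    for _ in range(max_iter):
        Z = centroids(W, X, alpha)
        W = memberships(Z, X, alpha)
        F = objective(W, Z, X, alpha)
        if F == F_prev:
            break
        F_prev = F
    return W, Z, F


X = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (5, 5, 5), (5, 5, 6), (6, 5, 5)]
W, Z, F = fuzzy_k_modes(X, Z0=[(1, 1, 2), (5, 5, 6)], alpha=1.5)
print(Z)  # [(1, 1, 1), (5, 5, 5)]
```

Starting from off-center seeds, the centroids move to the modes of the two groups after one update and then stay fixed, so F repeats and the loop stops.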
Fuzzy k-Modes Drawbacks
• Sensitive to initialization
• Requires a priori knowledge of the number of clusters
Our Process Data Model Algorithm
1. Fix the number of clusters then run fuzzy k-modes several times and choose the run with the optimal alpha
2. Fix that alpha then run fuzzy k-modes several times to choose the run with the optimal number of clusters
3. Take the memberships and centroids found with the best alpha and number of clusters and use those to compare new process data
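The two-stage parameter search can be written generically. A minimal Python sketch under stated assumptions: `run(alpha, c)` is a hypothetical callable that performs one fuzzy k-modes run and returns its result, `alpha_score` stands in for the uniformity criterion and `cluster_score` for the validity index, and lower scores are assumed to be better. The toy callables in the usage lines exist only to show the control flow.

```python
def select_parameters(run, alpha_score, cluster_score, alphas, cluster_counts, c_fixed):
    """Stage 1: with c fixed, keep the alpha whose run scores best.
    Stage 2: with that alpha fixed, keep the cluster count whose run scores best.
    Returns (best_alpha, best_c, result_of_that_run). Lower scores win."""
    runs_alpha = {a: run(a, c_fixed) for a in alphas}
    best_alpha = min(runs_alpha, key=lambda a: alpha_score(runs_alpha[a]))
    runs_c = {c: run(best_alpha, c) for c in cluster_counts}
    best_c = min(runs_c, key=lambda c: cluster_score(runs_c[c]))
    return best_alpha, best_c, runs_c[best_c]


# Toy stand-ins: a "run" just echoes its parameters.
toy_run = lambda a, c: (a, c)
print(select_parameters(toy_run,
                        alpha_score=lambda r: abs(r[0] - 1.2),
                        cluster_score=lambda r: abs(r[1] - 16),
                        alphas=[1.1, 1.2, 1.3],
                        cluster_counts=range(2, 26),
                        c_fixed=10))  # (1.2, 16, (1.2, 16))
```

Keeping each run's result avoids re-running the winning configuration, which matters if the runs are randomly seeded.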
Our process data model
Step 1: How do We Pick the Best Alpha?
• Run the fuzzy k-modes several times
• Choose the run that gives the best alpha according to some criterion
  – Our criterion is the most uniform distribution of memberships
• How do we determine whether the memberships are uniformly distributed?
  – We tried the chi-square index
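A minimal Python sketch of a chi-square uniformity check on memberships. The slides do not say how memberships are grouped into classes, so this assumes k equal-width bins on [0, 1] and the standard Pearson statistic, the sum of (x_i - E)^2 / E with E = n / k, using the same E, x, and k that the adjusted index defines. Lower values mean a more uniform spread. `chi_square_uniformity` is a hypothetical name.

```python
def chi_square_uniformity(values, k, lo=0.0, hi=1.0):
    """Bin values in [lo, hi] into k equal-width classes and return the
    Pearson chi-square statistic against a uniform expectation E = n / k."""
    counts = [0] * k
    width = (hi - lo) / k
    for v in values:
        idx = min(int((v - lo) / width), k - 1)  # v == hi goes in the last bin
        counts[idx] += 1
    expected = len(values) / k
    return sum((x - expected) ** 2 / expected for x in counts)


print(chi_square_uniformity([0.1, 0.2, 0.6, 0.7], k=2))  # 0.0, perfectly even
print(chi_square_uniformity([0.1, 0.2, 0.3, 0.4], k=2))  # 4.0, all in one bin
```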
Problem with Chi Square Index
• The chi square index favors the wrong distribution.
• We want the red distribution, chi square favors the blue distribution
• Otherwise we don’t get a nice U-shaped curve.
[Figure: two membership distributions over classes 1 to 12 (Series1, Series2)]
New Uniform Measure
• We created the adjusted chi square index to favor the second distribution
k
xA
k
iiE
1
log
• E is the expected number of objects per class
• x is the number of objects in that class
• k is the number of classes
• We divide this measure into the chi-square measure to get the adjusted measure
How do Uniform Memberships Affect Intrusion Signal?
[Figure: Alpha vs Detection Signal with Chi Square Indexes; x-axis: alpha (1 to 1.11); y-axis: detection signal; series: Chi Square, Adjusted Chi Square, Average * 10, Diff of .85 ratio, Bottom 25% Diff, Diff Locality Frame * 10, Diff. Median]
Step 2: Now We Determine the Number of Clusters
• Use alpha found in the previous step
• Run fuzzy k-modes for various numbers of clusters
• Choose one run according to some criterion
  – Our criteria are validity indexes
Validity Indexes
• Validity indexes are our criteria to choose the optimal number of clusters
• They are intended to reflect the underlying structure of the data
• We considered the following:
  – Kim’s index
  – Kwon’s index
  – Bezdek’s partition entropy index
Conversion of Indexes
• Kim’s and Kwon’s indexes work only with quantitative data
  – We converted the indexes from quantitative to categorical
• Our results were not favorable
  – The indexes tended to decrease monotonically or semi-monotonically as the number of clusters approached the number of data samples
Bezdek’s Worked the Best
• With Bezdek’s partition entropy index we consistently chose around 15 to 18 clusters
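For reference, Bezdek's partition entropy is PE = -(1/n) Σ_i Σ_k w_ik log w_ik, with 0·log 0 taken as 0. It is 0 for a crisp partition and at most log c when every string is split evenly across the c clusters. The slides do not say which log base was used or exactly how the 15 to 18 range was read off the index, so this minimal Python sketch only computes the index, with the natural log assumed. `partition_entropy` is a hypothetical name and W is stored as `W[i][k]`.

```python
import math


def partition_entropy(W):
    """Bezdek's partition entropy: -(1/n) * sum_i sum_k w_ik * ln(w_ik),
    treating 0 * ln(0) as 0. W[i][k] is the membership of string k to cluster i."""
    n = len(W[0])
    return -sum(w * math.log(w) for row in W for w in row if w > 0) / n


print(partition_entropy([[1.0, 0.0], [0.0, 1.0]]))  # crisp partition, 0
print(partition_entropy([[0.5, 0.5], [0.5, 0.5]]))  # ln 2, maximally fuzzy for c = 2
```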
New Validity Index Published
• Tsekouras et al.
• Published after completion of thesis
• Works with fuzzy categorical clustering
Comparing new process data
Comparing New Process Data
• New process data is compared against the process data model
• The memberships of the new strings to the centroids of the process data model are computed
• The distance to the closest centroid is taken as that string's membership value
Comparing New Process Data
• Imagine a two-feature quantitative state space
• 2 classes of new process data, 3 clusters each
• A is abnormal data
• N is normal data
• T are the centroids from the training data
Comparing Algorithm
1. Find the distances of the training strings to the centroids found from the process data model
2. Find the distances of the new strings to the same centroids
3. Take the differences of the distances
Step 1: Find the Distances for the Training Strings
• For the training strings, we compute the following statistics of each string's membership to the closest centroid of the process data model:
  – Average membership
  – Median membership
  – Average of the bottom 25% of memberships
  – Ratio of strings below .85 to all strings
  – Minimum average membership across 10 consecutive strings (locality frame)
Step 2: Find the New String’s Distances
• We find the distances of the new strings to the training centroids from the process data model
• We calculate the new strings' memberships using step 2 of fuzzy k-modes (fix the centroids and update the memberships), then compute the same statistics:
  – Average membership
  – Median membership
  – Bottom 25% average membership
  – Ratio of strings below .85 to all strings
  – Minimum average across 10 consecutive strings (locality frame)
Step 3: Take the Differences
• We take the differences between the training strings’ distances and the new strings’ distances
• These are our intrusion signals
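A minimal sketch of step 3. It assumes each measure has already been expressed as a distance in [0, 1] where larger means farther from the centroids (for example, 1 minus a membership average), and that each signal is the new strings’ distance minus the training strings’ distance, so it falls between -1 and 1 with 0 meaning no change. Both conventions are assumptions chosen to match the signal range described for fuzzy k-modes; the measure names and values below are hypothetical.

```python
from typing import Dict

def intrusion_signals(train: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Per-measure difference: new strings' distance minus training strings' distance.

    Inputs hold distance-like measures in [0, 1] (larger = farther from the
    centroids), so each signal lies in [-1, 1]; 0 means the new strings are as
    far from the centroids as the training strings were.
    """
    missing = train.keys() ^ new.keys()
    if missing:
        raise KeyError(f"measures present in only one input: {sorted(missing)}")
    return {name: new[name] - train[name] for name in train}

# Hypothetical membership statistics turned into distances as 1 - membership
train = {"average": 1 - 0.95, "median": 1 - 0.97}
new = {"average": 1 - 0.80, "median": 1 - 0.97}
signals = intrusion_signals(train, new)  # "average" rises by about 0.15, "median" is unchanged
```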
70
Overview
• Computer Security
• Intrusion Detection Systems based on process traces
• Background discussion
• Fuzzy k-modes
• Our process data model
• Comparing new process traces
• Experiments and Results
• Conclusion
Experiments and results 71
The Experiments
• Self tests
  – Trained on 50% of the data, tested on the other 50%
  – Did this twice
• Intrusion tests
  – Intrusions
  – Error conditions
  – Unsuccessful intrusions
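The self-test protocol (train on one half of the normal data, test on the other half, and repeat) can be sketched as a two-fold split. The slide does not say whether the split was by whole traces or by strings, or whether the second run swapped the halves or drew a new split; this sketch assumes a seeded random split of traces with the halves swapped. The `train` and `score` callables are hypothetical stand-ins for the fuzzy k-modes training and comparison steps.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

def two_fold_self_test(
    traces: Sequence[Any],
    train: Callable[[List[Any]], Any],
    score: Callable[[Any, List[Any]], float],
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Train on one half of the normal traces, score the other half, then swap."""
    shuffled = list(traces)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    folds = [(shuffled[:half], shuffled[half:]), (shuffled[half:], shuffled[:half])]
    results = []
    for run, (train_part, test_part) in enumerate(folds, start=1):
        model = train(train_part)
        results.append((run, score(model, test_part)))
    return results

# Toy stand-ins: the "model" is the set of training traces and the score is
# the fraction of test traces never seen during training.
runs = two_fold_self_test(
    ["t1", "t2", "t3", "t4"],
    train=lambda part: set(part),
    score=lambda model, part: sum(t not in model for t in part) / len(part),
)
# runs == [(1, 1.0), (2, 1.0)] because the two halves share no traces
```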
Experiments and results 72
The Data Set
• Collected by Dr. Stephanie Forrest at the University of New Mexico
• Contains two types of data
  – Synthetic data
    • Created artificially
    • Not used for self tests
  – Live data
    • From a real working environment
Experiments and results 73
The Programs
• Live ps
  – Reports process status
• Live login
  – Signs on to a system
• Synthetic LPR
  – Submits print requests
• Live inetd
  – Listens for network requests for services
Experiments and results 74
The Intrusions
• Live ps and Live login
  – Trojan code from the Linux root kit
• Synthetic LPR
  – lprcp intrusion
• Live inetd
  – Denial of service attack
Experiments and results 75
Comparison Against Stide
• We compared our results against stide
• Stide is an m-look-ahead table lookup
• It runs in O(n) time, where n is the number of strings
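For reference, a minimal sketch of a stide-style detector, using the common description of stide as exact matching of fixed-length windows of system calls against a table built from normal traces. Hash-set lookups make one pass over n windows O(n) expected time when the window length is treated as a constant. How this maps onto the slide’s “m look ahead” wording is an assumption, as are the window length, the frame of 20 windows for the locality frame count, and scaling both outputs to [0, 1].

```python
from typing import Dict, List, Sequence, Set, Tuple

Window = Tuple[str, ...]

def windows(trace: Sequence[str], k: int) -> List[Window]:
    """All contiguous length-k windows of a trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]

def build_db(normal_traces: List[Sequence[str]], k: int) -> Set[Window]:
    """Table of every window seen in the normal (training) traces."""
    db: Set[Window] = set()
    for t in normal_traces:
        db.update(windows(t, k))
    return db

def stide(trace: Sequence[str], db: Set[Window], k: int, frame: int = 20) -> Dict[str, float]:
    """Mismatch rate and largest mismatch count within any `frame` consecutive windows."""
    miss = [w not in db for w in windows(trace, k)]
    if not miss:
        return {"mismatch": 0.0, "locality_frame": 0.0}
    best = count = 0
    for i, m in enumerate(miss):
        count += m                      # window i enters the frame
        if i >= frame:
            count -= miss[i - frame]    # window i - frame leaves the frame
        best = max(best, count)
    return {"mismatch": sum(miss) / len(miss), "locality_frame": best / frame}

db = build_db([["open", "read", "write", "close"]], k=2)
result = stide(["open", "read", "exec", "close"], db, k=2)  # 2 of 3 windows are unseen
```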
Background discussion 76
Data is Normalized
• All data is normalized to lie between 0 and 1.
• Fuzzy k-modes emitted signals between -1 and 1. They are normalized to lie between 0 and 1 as follows:
  – A: training strings are maximally distant from the centroids
  – B: new strings and training strings are equally distant
  – C: new strings are maximally distant from the centroids
[Figure: the signal range -1 to 1 mapped onto 0 to 1, with A at 0, B at .5, and C at 1]
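The mapping is a linear rescaling. A one-line sketch, assuming the three labeled points are the only constraints (a linear map is the simplest one through all three); the function name is hypothetical.

```python
def normalize_signal(s: float) -> float:
    """Map a signal in [-1, 1] linearly onto [0, 1]: A (-1) -> 0, B (0) -> 0.5, C (1) -> 1."""
    if not -1.0 <= s <= 1.0:
        raise ValueError("signal must lie in [-1, 1]")
    return (s + 1.0) / 2.0
```

With this scaling, a value of 0.5 in the result tables means the new strings are as far from the centroids as the training strings were.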
Experiments and results 77
Live Inetd
• No self tests for live inetd
  – Data set too small: only about 500 system calls
Experiments and results 78
Live Inetd – Intrusion Tests

| String length | Stide: locality frame | Stide: mismatch | Fuzzy k-modes: median | Fuzzy k-modes: avg. | Fuzzy k-modes: bottom 25% | Fuzzy k-modes: locality frame | Fuzzy k-modes: ratio below .85 |
|---|---|---|---|---|---|---|---|
| 6 | 1.0000 | 0.5552 | 0.9234 | 0.7438 | 0.7048 | 0.5105 | 0.7672 |
| 10 | 1.0000 | 0.5829 | 0.9311 | 0.7429 | 0.6940 | 0.5161 | 0.7758 |
| 14 | 1.0000 | 0.6045 | 0.9164 | 0.7490 | 0.7254 | 0.5141 | 0.7848 |

• All numbers are normalized between 0 and 1
• Closer to 0 is more normal, closer to 1 is more intrusive
Experiments and results 79
Live Ps – Self Tests
• For fuzzy k-modes, 0.5 indicates normal behavior: the new strings are the same distance from the centroids as the training strings
• Values below 0.5 are more normal; values above 0.5 are more abnormal
• Green indicates a false positive
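| Trace # | Stide: locality frame | Stide: mismatch | Fuzzy k-modes: median | Fuzzy k-modes: avg. | Fuzzy k-modes: bottom 25% | Fuzzy k-modes: locality frame | Fuzzy k-modes: ratio below .85 |
|---|---|---|---|---|---|---|---|
| 1 | 0.5000 | 0.0094 | 0.5000 | 0.5012 | 0.4963 | 0.5000 | 0.4955 |
| 2 | 1.0000 | 0.0775 | 0.5000 | 0.5105 | 0.5143 | 0.5095 | 0.5177 |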
Experiments and results 80
Live Ps – Intrusion Tests
• Two types of intrusions
  – Homegrown
  – Recovered
Red in the next slide indicates a false negative.
81
Live Ps - Homegrown

| Trace # | Stide: locality frame | Stide: mismatch | Fuzzy k-modes: median | Fuzzy k-modes: avg. | Fuzzy k-modes: bottom 25% | Fuzzy k-modes: locality frame | Fuzzy k-modes: ratio below .85 |
|---|---|---|---|---|---|---|---|
| 1 | 0.5000 | 0.0945 | 0.5008 | 0.5377 | 0.5686 | 0.5000 | 0.5579 |
| 2 | 0.5000 | 0.0903 | 0.5008 | 0.5328 | 0.5627 | 0.5000 | 0.5500 |
| 3 | 0.5000 | 0.0866 | 0.5008 | 0.5284 | 0.5581 | 0.5000 | 0.5427 |
| 4 | 0.5000 | 0.0831 | 0.5005 | 0.5244 | 0.5517 | 0.5000 | 0.5360 |
| 5 | 0.5000 | 0.0799 | 0.5002 | 0.5207 | 0.5467 | 0.5000 | 0.5298 |
| 6 | 0.5000 | 0.0308 | 0.5000 | 0.4788 | 0.4221 | 0.5000 | 0.4601 |
| 7 | 0.5000 | 0.0287 | 0.5000 | 0.4778 | 0.4197 | 0.5000 | 0.4583 |
| 8 | 0.5000 | 0.0301 | 0.5000 | 0.4705 | 0.3897 | 0.5000 | 0.4509 |
| 9 | 0.5000 | 0.0264 | 0.5000 | 0.4686 | 0.3825 | 0.5000 | 0.4482 |
| 10 | 0.5000 | 0.0642 | 0.5245 | 0.5640 | 0.5627 | 0.5000 | 0.6055 |
| 11 | 0.6500 | 0.0789 | 0.5268 | 0.5678 | 0.5687 | 0.5000 | 0.6097 |
| 12 | 0.7000 | 0.0924 | 0.5377 | 0.5703 | 0.5663 | 0.5000 | 0.6146 |
| 13 | 0.7000 | 0.0681 | 0.5000 | 0.5040 | 0.5171 | 0.5000 | 0.4989 |
| 14 | 0.7000 | 0.2150 | 0.6907 | 0.6153 | 0.6098 | 0.5000 | 0.6933 |
| 15 | 0.7000 | 0.0570 | 0.5000 | 0.5067 | 0.5175 | 0.5000 | 0.5086 |
Experiments and results 82
Live Ps - Recovered

| Trace # | Stide: locality frame | Stide: mismatch | Fuzzy k-modes: median | Fuzzy k-modes: avg. | Fuzzy k-modes: bottom 25% | Fuzzy k-modes: locality frame | Fuzzy k-modes: ratio below .85 |
|---|---|---|---|---|---|---|---|
| 16 | 1.0000 | 0.1409 | 0.5008 | 0.5294 | 0.5495 | 0.5037 | 0.5500 |
| 17 | 1.0000 | 0.1346 | 0.5008 | 0.5248 | 0.5464 | 0.5037 | 0.5422 |
| 18 | 1.0000 | 0.1288 | 0.5005 | 0.5207 | 0.5394 | 0.5037 | 0.5350 |
| 19 | 1.0000 | 0.1235 | 0.5002 | 0.5169 | 0.5326 | 0.5037 | 0.5284 |
| 20 | 1.0000 | 0.1186 | 0.5001 | 0.5134 | 0.5256 | 0.5037 | 0.5224 |
| 21 | 1.0000 | 0.0569 | 0.5000 | 0.4742 | 0.4040 | 0.5037 | 0.4609 |
| 22 | 1.0000 | 0.0529 | 0.5000 | 0.4712 | 0.3921 | 0.5037 | 0.4536 |
| 23 | 1.0000 | 0.1191 | 0.5000 | 0.4982 | 0.4953 | 0.5037 | 0.4985 |
| 24 | 0.9500 | 0.2688 | 0.6879 | 0.6205 | 0.6133 | 0.5037 | 0.7035 |
| 25 | 1.0000 | 0.1004 | 0.5000 | 0.5025 | 0.5033 | 0.5037 | 0.5068 |
| 26 | 0.9500 | 0.1341 | 0.5455 | 0.5685 | 0.5636 | 0.5037 | 0.6157 |
Experiments and results 83
Live Login – Self Tests
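| Trace # | Stide: locality frame | Stide: mismatch | Fuzzy k-modes: median | Fuzzy k-modes: avg. | Fuzzy k-modes: bottom 25% | Fuzzy k-modes: locality frame | Fuzzy k-modes: ratio below .85 |
|---|---|---|---|---|---|---|---|
| 1 | 0.4500 | 0.0031 | 0.5000 | 0.4999 | 0.4998 | 0.4971 | 0.5000 |
| 2 | 0.6500 | 0.0092 | 0.5020 | 0.5001 | 0.5002 | 0.5007 | 0.5000 |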
• A fuzzy k-modes value of 0.5 means the new strings are the same distance from the centroids as the training strings
Experiments and results 84
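Live Login – Intrusion Tests

| Trace # | Stide: locality frame | Stide: mismatch | Fuzzy k-modes: median | Fuzzy k-modes: avg. | Fuzzy k-modes: bottom 25% | Fuzzy k-modes: locality frame | Fuzzy k-modes: ratio below .85 |
|---|---|---|---|---|---|---|---|
| Hm/1 | 0.0000 | 0.0000 | 0.5074 | 0.5008 | 0.5005 | 0.5000 | 0.5012 |
| Hm/2 | 1.0000 | 0.1183 | 0.5611 | 0.5153 | 0.5026 | 0.4916 | 0.5162 |
| Hm/3 | 0.0000 | 0.0000 | 0.5348 | 0.5039 | 0.5009 | 0.4885 | 0.5042 |
| Hm/4 | 0.8000 | 0.0566 | 0.4601 | 0.4423 | 0.4696 | 0.4861 | 0.4153 |
| Rc/5 | 1.0000 | 0.2095 | 0.4601 | 0.4586 | 0.4875 | 0.4998 | 0.4330 |
| Rc/6 | 1.0000 | 0.2095 | 0.4601 | 0.4586 | 0.4875 | 0.4998 | 0.4330 |
| Rc/7 | 1.0000 | 0.2386 | 0.4601 | 0.4662 | 0.4899 | 0.4998 | 0.4439 |
| Rc/8 | 1.0000 | 0.1777 | 0.4601 | 0.4463 | 0.4844 | 0.4982 | 0.4151 |
| Rc/9 | 1.0000 | 0.2386 | 0.4601 | 0.4662 | 0.4899 | 0.4998 | 0.4439 |

Hm = homegrown intrusion, Rc = recovered intrusion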
Experiments and results 85
Synthetic LPR – Intrusion Tests
• No self tests, because the data is synthetic
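| String length | Stide: locality frame | Stide: mismatch | Fuzzy k-modes: median | Fuzzy k-modes: avg. | Fuzzy k-modes: bottom 25% | Fuzzy k-modes: locality frame | Fuzzy k-modes: ratio below .85 |
|---|---|---|---|---|---|---|---|
| 6 | 0.6500 | 0.0980 | 0.5995 | 0.5692 | 0.5453 | 0.5346 | 0.6046 |
| 10 | 1.0000 | 0.1625 | 0.7405 | 0.6024 | 0.5200 | 0.5155 | 0.6497 |
| 14 | 1.0000 | 0.2229 | 0.5136 | 0.5540 | 0.5968 | 0.5462 | 0.6001 |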
Experiments and results 86
Other Results
• New uniform measure
• New dissimilarity measure
• Reduced time complexity
• Invalidity of converting quantitative validity indexes to categorical data
87
Overview
• Computer Security
• Intrusion Detection Systems based on process traces
• Background discussion
• Fuzzy k-modes
• Our process data model
• Comparing new process traces
• Experiments and Results
• Conclusion
Conclusion 88
Discussion
• Pros
  – Fast once trained
  – Better accuracy on some processes
• Cons
  – Long learning time
  – Training data must be collected during a clean period
Conclusion 89
Conclusions
• Using fuzzy k-modes to analyze patterns of system calls is not a panacea.
• It works well for some processes, but not for all.
• It works just as well as stide.
• Is it worth the extra computational cost? That depends on the processes in question.
Conclusion 90
Future Work
• Boiling Frog in the Pot
• System of non-linear equations
• System call timing
• Sensitivity of fuzzy k-modes
• Fuzzy grammar inference
91
Questions?