4 intrusion detection system using fuzzy data mining
8/3/2019 4 Intrusion Detection System Using Fuzzy Data Mining
AN INTRUSION DETECTION SYSTEM USING FUZZY DATA MINING AND
GENETIC ALGORITHMS
B. SAI PRAVEEN, K. SUBRAHMANYAM
Sai_praveen17@yahoo.com rajaa.0321@yahoo.com
3RD YEAR CSE,
ANIL NEERUKONDA INSTITUTE OF TECHNOLOGY AND SCIENCES,
SANGIVALASA, VIZAG.
ABSTRACT :- Intrusion detection systems are
increasingly a key part of systems defense. Various
approaches to intrusion detection are currently being
used. Artificial intelligence plays a driving role in
security services. This paper presents a dynamic,
intelligent intrusion detection system model based on an AI
approach that includes fuzzy logic and simple data
mining techniques to process network data. The system
combines two distinct intrusion detection approaches:
1) anomaly-based intrusion detection using fuzzy data
mining techniques, and 2) intrusion detection using
genetic algorithms.
1. INTRODUCTION:
Information has become an
organization's most precious asset.
Organizations have become increasingly
dependent on information, since more
information is being stored
and processed on network-based systems.
Hacking, viruses, worms and Trojan
horses are some of the major attacks. A
significant challenge in providing an
effective defense mechanism for a network
is the ability to detect novel attacks or
intrusion attempts and to implement
countermeasures. Intrusion detection is a
critical component in securing
information systems. It is implemented
by an intrusion detection system (IDS),
which can detect, prevent and react to
attacks. Intrusion detection has
become an integral part of the
information security process.
2.0 INTRUSION DETECTION SYSTEMS
2.1 AN OVERVIEW OF CURRENT INTRUSION DETECTION SYSTEMS:
Intrusion detection is defined [1] as the process of
intelligently monitoring the events occurring in a
computer system or network and analyzing them
for signs of violations of the security policy. The
primary aim of an IDS is to protect the availability,
confidentiality and integrity of critical networked
information systems. IDS are defined by both the
method used to detect attacks and the placement of
the IDS on the network. An IDS may perform either
misuse detection or anomaly detection and may be
deployed as a network-based system or a host-based
system. This results in four general groups: misuse-host,
misuse-network, anomaly-host and anomaly-network.
Misuse detection relies on matching
known patterns of hostile activity against
databases of past attacks. Such systems are highly
effective at identifying known attacks and
vulnerabilities, but rather poor at identifying new
security threats. Anomaly detection searches for
something rare or unusual by applying statistical
measures or artificial intelligence methods to
compare current activity against historical
knowledge.
Common problems with anomaly-based systems
are that they often require extensive training data
for artificial learning algorithms, and that they tend to
be computationally expensive, because several
metrics are often maintained and need to be
updated for every system activity. Some IDS
combine qualities from all these categories and are
known as hybrid systems.
FIG 1 [19]
2.2 COMPUTER ATTACK CATEGORIES:
DARPA [2] categorizes attacks into five major
types based on the goals and actions of the attacker.
- DoS (denial-of-service) attacks try to restrict or
  deny the services provided by or to computer users.
  For example, in a SYN-flood attack, the attacker
  floods the victim host with more TCP connection
  requests than it can handle, causing the host to be
  unable to respond even to valid requests.
- Remote to local (R2L) attacks are carried out by an
  attacker who has only remote access rights. These
  attacks occur when the attacker tries to get local
  access to a computer network.
- User to root (U2R) attacks are performed by an
  attacker who has user-level access rights and
  tries to obtain superuser access.
- Probing attacks: in this type of attack, an
  attacker scans a computer or a network of computers
  to gather information about the existing
  configuration or to find known vulnerabilities.
- Data attacks are performed to gain access to
  information that the attacker is not permitted
  to access. Many R2L and U2R attacks aim at
  accessing secret files.
2.3 IDS DESIGN PRINCIPLES:
IDS are designed and implemented on modelled
network systems. Several points should be
defined and stated in advance in order to find a
proper model for the network:
- Normal behavior of a network system is
  the most dominant and frequent behavior
  of the network in a certain time period.
- An anomaly within the network system is the
  least frequent, abnormal behavior of the
  network in a certain time period.
Modelling a dynamic and complex system such as
a network is very difficult; for this reason,
abstraction and partial modelling are used as a good
solution. The network components can be
divided into:
- Host
- User
- Network environment
The user can in turn be divided into
legitimate users and malicious users
(intruders). Many other nested divisions
could occur according to the designer's
point of view and the areas of focus.
An intrusion detection system basically raises an
alarm whenever an anomalous event occurs, which
could be caused by an intruder to the system.
These systems do not react equally at all times;
false alarms can sometimes occur, and such an alarm is
called a False Positive (FP). The lower the FP rate,
the higher the value of the IDS.[3][4]
2.4 IDS DESIGN TRENDS
There are a number of different ways to classify
IDS in order to distinguish between their different
types. The most generic classification found for
IDS is by:
- Analysis approach
- Placement of IDS
Under each of these categories several
classifications could occur.[5]
2.4.1. ANALYSIS APPROACH:
In this respect IDS are usually divided into:
- SIDS: Signature-based IDS, which studies
  attack patterns and defines a
  signature for each, to enable security
  specialists to design a defense against that
  attack.
- AIDS: Anomaly-based IDS, which learns
  the usual behavior patterns of a network
  and suspects an attack once an anomaly
  occurs.
2.4.2. PLACEMENT OF IDS:
De Boer and Pels [6] gave three types of IDS which
could be listed under this category:
- NIDS: Network-based IDS, which
  monitors the network for malicious
  traffic.
- HIDS: Host-based IDS, which monitors
  the activities of a single host.
- DIDS: Distributed IDS, which correlates events
  from different host- or network-based
  IDS.
2.5 DATA CAPTURING USING SNORT:
Snort is mainly a Network Intrusion Detection
System (NIDS); it is open source and available for
a variety of Unix variants. Snort can also be used as a
sniffer to troubleshoot network problems.
Basically, there are three modes in which Snort
can be configured:
- Sniffer mode simply reads the packets off
  the network and displays them in a
  continuous stream on the console.
- Packet logger mode logs the packets to
  disk.
- Network intrusion detection system mode is the
  most complex and configurable
  configuration, allowing Snort to analyze
  network traffic for matches against a
  user-defined rule set and to perform several
  actions based upon what it sees.
3. DATA MINING AND FUZZY LOGIC
3.1 DATA MINING
Data mining methods are used to automatically
discover new patterns from large amounts of
data [7]. Data mining is the automated extraction
of previously unrealized information from large
data sources for the purpose of supporting
actions. The rapid development of data mining
has made available a wide variety of algorithms,
drawn from the fields of statistics, pattern
recognition, machine learning and databases.
Specifically, data mining approaches have been
proposed and used for anomaly detection.
3.1.1. ASSOCIATION RULES
Association rules were first developed to find
correlations in transactions using retail data [8]. For
example, if a customer who buys a soft drink (A)
usually also buys potato chips (B), then potato
chips are associated with soft drinks using the rule
A->B. Suppose that 25% of all customers buy both
A and B and that 50% of the customers who buy
A also buy B. Then the degree of support for the
rule is s=0.25 and the degree of confidence in the rule
is c=0.50. Agrawal and Srikant developed the
fast Apriori algorithm for mining association
rules. The Apriori algorithm requires two
thresholds, minconfidence and minsupport.
These two thresholds determine the degree of
association that must hold before a rule will be
mined.
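As a rough illustration of the two measures described above, the following Python sketch computes support and confidence for the rule A->B on a small made-up set of baskets. The basket data, the function names and the two threshold values are assumptions chosen only so that the numbers reproduce the example (s=0.25, c=0.50); this is not the Apriori algorithm itself, only the quantities it thresholds.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / s_antecedent


# Hypothetical data: 8 baskets, 4 contain A, and 2 of those also contain B.
baskets = [frozenset(b) for b in (
    {"A", "B"}, {"A", "B"}, {"A"}, {"A"},
    {"B"}, {"C"}, {"C"}, {"B", "C"},
)]

s = support(baskets, frozenset({"A", "B"}))
c = confidence(baskets, frozenset({"A"}), frozenset({"B"}))

# Hypothetical thresholds in the spirit of minsupport and minconfidence.
MIN_SUPPORT = 0.2
MIN_CONFIDENCE = 0.4
rule_is_mined = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE

print(f"s={s:.2f} c={c:.2f} mined={rule_is_mined}")
```

With these assumed thresholds the rule A->B would be kept; raising either threshold above the computed value would drop it.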
3.2 FUZZY LOGIC
Fuzzy logic was introduced as a means to
model the uncertainty of natural language. Due
to the uncertain nature of intrusions, fuzzy
sets are widely used in discovering attack events
while reducing the rate of false alarms at the same
time.
Basically, intrusion detection systems distinguish
between two distinct types of behavior, normal
and abnormal, which creates two distinct sets of
rules and information. Fuzzy logic can create
sets that have in-between values where the
difference between the two sets is not well defined.
In this case the logic works on linguistic terms,
taking the minimum (for AND) or maximum (for OR)
of the membership degrees of a set of events
instead of applying a crisp AND, OR or NOT operation
in the if-then-else condition. This feature contributes
strongly to reducing the false positive alarm
rate of the system [9][10].
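A minimal sketch of how such fuzzy connectives are commonly evaluated, assuming the standard minimum, maximum and complement operators. The membership degrees and variable names below are made up for illustration and do not come from the paper.

```python
def fuzzy_and(*degrees):
    """Fuzzy AND: the minimum of the membership degrees."""
    return min(degrees)


def fuzzy_or(*degrees):
    """Fuzzy OR: the maximum of the membership degrees."""
    return max(degrees)


def fuzzy_not(degree):
    """Fuzzy NOT: the complement of a membership degree."""
    return 1.0 - degree


# Hypothetical degrees to which two observations belong to their fuzzy sets.
many_destinations = 0.7
high_syn_rate = 0.4

# "IF destinations is many AND SYN rate is high" fires to degree 0.4,
# rather than being simply true or false.
both = fuzzy_and(many_destinations, high_syn_rate)
either = fuzzy_or(many_destinations, high_syn_rate)
print(both, either, fuzzy_not(0.25))
```

Because the result is a degree rather than a yes/no answer, a later stage can raise an alarm only when the degree is high enough, which is one way the false alarm rate can be lowered.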
Applying fuzzy methods to the development of
IDS yields some advantages compared to the
classical approach. Fuzzy logic
techniques have therefore been employed in the computer
security field since the early 90s. Fuzzy
logic provides some flexibility for the uncertain
problem of intrusion detection and allows much
greater complexity for IDS. Most fuzzy IDS
require human experts to determine the fuzzy sets
and the set of fuzzy rules.
These tasks are time consuming. However, if the
fuzzy rules are generated automatically, less time
is needed to build or update a good intrusion
classifier. A
dynamic fuzzy boundary is developed from
labelled data for different levels of security.
4. ANOMALY DETECTION VIA FUZZY DATA MINING
Fuzzy logic is based on fuzzy set theory. In
contrast to standard set theory, in which each
element is either completely in or not in a set,
fuzzy set theory allows partial membership in
sets. This provides a powerful mechanism for
representing vague concepts. Data mining
methods are used to automatically learn patterns
from large quantities of data. The integration of
fuzzy logic with data mining methods helps to
create more abstract patterns at a higher level
than the data level. Patterns that are more
abstract and less dependent on the data are
helpful for intrusion detection.
In the intrusion detection domain, we may want
to reason about a quantity such as the number of
different destination IP addresses in the last 2
seconds. Suppose one wants to write a rule such
as "If the number of different destination
addresses during the last 2 seconds was high, then
an unusual situation exists."
Using traditional logic, one would need to decide
which values for the number of destination
addresses fall into the category high. As shown in
fig 4a, one would typically divide the range of
possible values into discrete buckets, each
representing a different set. The value 10, for
example, is a member of the set low to the degree
1 and a member of the other two sets, medium
and high, to the degree 0. In fuzzy logic, a
particular value can have a degree of membership
between 0 and 1 and can be a member of more
than one fuzzy set. In fig 4b, for example, the
value 10 is a member of the set low to the degree
0.4 and a member of the set medium to the degree
0.75. In this example, the membership functions
for the fuzzy sets are piecewise linear functions.
Using fuzzy logic terminology, the number of
destination ports is a fuzzy variable (also called a
linguistic variable), while the possible values of the
fuzzy variable are the fuzzy sets low, medium,
and high. In general, fuzzy variables correspond
to nouns and fuzzy sets correspond to adjectives.
FIG 4a. NON-FUZZY SETS
Using fuzzy logic, a rule like the one shown above
could be written as
If DP = high
Then an unusual situation exists.
where DP is a fuzzy variable and high is a fuzzy
set. The degree of membership of the number of
destination ports in the fuzzy set high determines
whether or not the rule is activated.
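The piecewise linear membership functions and the rule above can be sketched in Python as follows. The breakpoints of the three sets are assumptions, since the figure is not reproduced here; they were chosen only so that the value 10 belongs to low to the degree 0.4 and to medium to the degree 0.75, as quoted in the text. The function names are hypothetical.

```python
INF = float("inf")


def trapezoid(x, a, b, c, d):
    """Piecewise linear membership: 0 up to a, rising on a..b, 1 on b..c, falling on c..d, 0 from d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def memberships(dp):
    """Degrees to which the count dp belongs to low, medium and high (assumed breakpoints)."""
    return {
        "low": trapezoid(dp, -INF, -INF, 7, 12),    # open on the left
        "medium": trapezoid(dp, 7, 11, 18, 22),
        "high": trapezoid(dp, 18, 22, INF, INF),    # open on the right
    }


def unusual_situation(dp):
    """Activation degree of the rule 'If DP = high Then an unusual situation exists'."""
    return memberships(dp)["high"]


print(memberships(10))
print(unusual_situation(21))
```

A value can belong to two sets at once, so a count that is drifting from medium to high activates the rule gradually instead of flipping at a single bucket boundary.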
5. IDS USING GENETIC ALGORITHMS:
A genetic algorithm (GA) is a programming
technique that mimics biological evolution as a
problem-solving strategy [11]. It is based on
Darwin's principle of evolution and survival of the
fittest to optimize a population of candidate
solutions towards a predefined fitness [12][13].
A GA mimics evolution and natural selection: it
uses a chromosome-like data structure and
evolves the chromosomes using selection,
recombination, and mutation operators.
The process usually begins with a randomly
generated population of chromosomes, which
represent possible solutions to the problem and
are considered candidate solutions.
Different positions of each chromosome are
encoded as bits, characters or numbers.
These positions can be referred to as genes. An
evaluation function is used to calculate the
goodness of each chromosome according to the
desired solution; this function is known as the fitness
function. During evolution, two basic operators,
crossover and mutation, are used to simulate the
natural reproduction and mutation of species. The
selection of chromosomes for survival and
combination is biased towards the fittest
chromosomes [14][15][17].
The following figure, taken from [16], shows the
structure of a simple genetic algorithm. It starts
with the random generation of an initial population,
which is then evaluated and evolved through selection,
recombination, and mutation. Finally, the best
individual (chromosome) is picked out as the final
result once the optimization meets its target.
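The flow described above (random initial population, evaluation, selection, recombination, mutation, pick the best) can be sketched in a few lines of Python. This is a generic toy example under stated assumptions, not the algorithm of [16]: the bit-string encoding, the tournament selection, the single-point crossover, the elitism step, the parameter values and the one_max fitness are all illustrative choices.

```python
import random


def one_max(bits):
    """Toy fitness: the number of 1-bits (a stand-in for a real objective)."""
    return sum(bits)


def tournament(pop, fitness, rng):
    """Selection biased towards the fittest: the better of two random individuals wins."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(fitness, n_genes=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Randomly generated initial population of bit-string chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the current best over unchanged
        while len(nxt) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            cut = rng.randrange(1, n_genes)      # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best


best = evolve(one_max)
print(best, one_max(best))
```

Because the best individual is copied into every new generation, the fitness of the returned chromosome can never be worse than the best of the initial population.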
Many authors and researchers are strongly drawn
to genetic algorithms as a strong and efficient
method used in different fields of artificial
intelligence, noting that several AI techniques
can be combined in different ways in different
systems for several purposes.
The genetic algorithm is employed to derive a set of
classification rules from network audit data, and
the support-confidence framework is utilized as
a fitness function to judge the quality of each rule. The
generated rules are then used to detect or classify
network intrusions in a real-time environment.
5.1. A GA-BASED INTRUSION DETECTION APPROACH:
As a conclusion of what was previously presented about
AI-based IDS, the work of these systems is divided into
two main stages. First comes the training stage, which
provides the system with the necessary information
required initially; next comes the
detection stage, where the system detects
intrusions according to what was learned in the
previous stage. Applying this to a GA-based IDS, the
GA is trained with classification rules learned
from previous network audit data. The second
stage is applied in real time by classifying
the incoming network connections according to
the generated rules.
Many systems have been proposed by many
researchers, in either simple or advanced fashion,
but to give a general idea of the components of
such a system and its basic mechanism, the
following three components will be highlighted:
1-Data Representation
Genes should be represented in some format using
different data types such as byte, integer and float.
They may also have different data ranges and
other features, knowing that the genes are
generated randomly in each population-generating
iteration.
Genetic algorithms can be used to evolve rules for
the network traffic; these rules are usually in the
following form:
if {condition} then {act} [16].
A rule basically contains an if-then clause, a condition and
an act. The condition usually matches the current
network behaviour against the one stored in the IDS,
such as comparing an intruder's source IP address
and port number with those already stored in the
system. The act could be an alarm indicating that
the intruder's IP and port numbers are related to
an attacker who is previously known to the
system [16][8].
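One way to picture the if {condition} then {act} form is the small Python sketch below, where the condition is a stored source IP address and port number and the act is an alarm message. The class names, the field names, the use of None as a wildcard and the example addresses are all assumptions made for illustration; the paper does not prescribe a concrete encoding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Connection:
    src_ip: str
    src_port: int


@dataclass(frozen=True)
class Rule:
    src_ip: Optional[str]    # None means "any address" (an assumed wildcard)
    src_port: Optional[int]  # None means "any port"
    act: str                 # what to do when the condition matches

    def matches(self, conn: Connection) -> bool:
        """The condition part: compare the connection with the stored values."""
        return ((self.src_ip is None or self.src_ip == conn.src_ip) and
                (self.src_port is None or self.src_port == conn.src_port))


def apply_rules(rules: List[Rule], conn: Connection) -> List[str]:
    """Return the act of every rule whose condition matches the connection."""
    return [rule.act for rule in rules if rule.matches(conn)]


# Hypothetical rule base using documentation-only IP addresses.
rules = [
    Rule("192.0.2.10", 4444, "alarm: known attacker address and port"),
    Rule(None, 31337, "alarm: suspicious source port"),
]

print(apply_rules(rules, Connection("192.0.2.10", 4444)))
print(apply_rules(rules, Connection("198.51.100.7", 80)))
```

In a GA setting, each such rule would be one chromosome, and its fields would be the genes that crossover and mutation modify.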
2-GA Parameters
A GA has some common elements and parameters
which should be defined:
- Fitness function: according to [11], the fitness
  function is defined as a
  function which scales the value of an individual
  relative to the rest of the population. It
  identifies the best possible solutions among
  the candidates located in the
  population.
- GA operators: according to the figure
  below, selection,
  mutation and crossover are the most
  effective parts of the algorithm, as
  they participate in the generation of
  each population.
- Selection is the phase where population
  individuals with better fitness are selected;
  the others are discarded.
- Crossover is a process where each pair of
  randomly selected individuals
  exchanges parts with each
  other, until a totally new population has
  been generated.
- Mutation flips some bits in an individual,
  and since any bit could be flipped, there is a
  low probability of predicting the change.
3-Detection Algorithm Overview
In [8], a genetic algorithm has been presented
which contains a training process. This
algorithm is designed to apply a set of
classification rules according to the input data
given. It follows the simple flow of genetic
algorithms presented in the figure showing the
operation of the GA.
PROCEDURE: Rule set generation using
genetic algorithm.
INPUT: Network audit data, number of
generations, and population size.
OUTPUT: A set of classification rules
PARAMETERS:
NAD: Network Audit Data
PS: Population Size
N: Number of records in training set
PSEUDO CODE:
Ruleset (NAD, PS, N)
{
    W1 = 0.2, W2 = 0.8, T = 0.5;
    For each chromosome in PS
    Begin 1:
        A = 0; AB = 0;
        For each record in training set
        Begin 2:
            // record matches the whole rule
            If (record == chromosome)
                AB = AB + 1;
            // record matches the condition part of the rule
            If (record == condition)
                A = A + 1;
        End 2
        // weighted sum of support (AB/N) and confidence (AB/A)
        Fitness = W1 * AB / N + W2 * AB / A;
        If (Fitness > T)
            Select chromosome into new population;
    End 1
    For each chromosome in new population
    Begin 3:
        Crossover (chromosome);
        Mutation (chromosome);
    End 3
    If (Number of generations
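The fitness step of the procedure above can be sketched in Python as follows, using the same weights W1=0.2 and W2=0.8 and the same threshold T=0.5. The record layout (a feature dictionary plus a label), the helper names and the toy records are assumptions for illustration, and the guard against A=0 is an addition that the pseudocode does not contain.

```python
def condition_matches(condition, features):
    """True when every field named in the condition has the same value in the record."""
    return all(features.get(key) == value for key, value in condition.items())


def rule_fitness(records, condition, outcome, w1=0.2, w2=0.8):
    """Weighted sum of support (AB/N) and confidence (AB/A) for one rule."""
    n = len(records)
    a = 0   # records matching the condition part
    ab = 0  # records matching both the condition and the outcome
    for features, label in records:
        if condition_matches(condition, features):
            a += 1
            if label == outcome:
                ab += 1
    support = ab / n if n else 0.0
    confidence = ab / a if a else 0.0  # assumed guard: avoids division by zero
    return w1 * support + w2 * confidence


# Hypothetical training records: (connection features, label).
records = [
    ({"service": "telnet"}, "attack"),
    ({"service": "telnet"}, "attack"),
    ({"service": "telnet"}, "normal"),
    ({"service": "http"}, "normal"),
]

T = 0.5
fitness = rule_fitness(records, {"service": "telnet"}, "attack")
selected = fitness > T
print(round(fitness, 4), selected)
```

Here N=4, A=3 and AB=2, so the rule scores 0.2*0.5 + 0.8*(2/3), which is above the threshold and would be selected into the new population.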
REFERENCES:
4. Gorodetsky, V., I. Kotenko, and O. Karsaev. Multi-agent
technologies for computer network security: Attack simulation,
intrusion detection and intrusion detection learning.
5. Luan Qinglin, Lu Huibin, Research of intrusion detection
based on neural network optimized by adaptive genetic
algorithm, Computer Engineering and Design, vol. 29, no. 12,
pp. 3022-3025, 2008.
6. De Boer, P., and Martin Pels, Host-Based Intrusion
Detection System, Technical Report 1.10, Faculty of Science,
Informatics Institute, University of Amsterdam, 2005.
7. Lee, W., S. Stolfo, and K. Mok. 1998. Mining audit data to
build intrusion detection models.
8. Agrawal, R., and R. Srikant. 1994. Fast algorithms for
mining association rules.
9. Yao, J.T., S.L. Zhao, L.V. Saxton, A Study of Fuzzy Intrusion
Detection, Data Mining, Intrusion Detection, Information
Assurance, and Data Networks Security, 28 March-1 April
2005, Orlando, Florida, USA.
10. Gomez, J., and D. Dasgupta. Evolving Fuzzy Classifiers
for Intrusion Detection.
11. Bobor, V. Efficient Intrusion Detection System
Architecture Based on Neural Networks and Genetic
Algorithms.
12. Li, W., Using Genetic Algorithms for Network Intrusion
Detection.
13. Marczyk, A. Genetic Algorithms and Evolutionary
Computation Techniques, 24 April 2004.
14. Song, D., A Linear Genetic Programming
Approach to Intrusion Detection, GECCO 2003.
15. Sinclair, C., L. Pierce, S. Matzner, An Application of
Machine Learning to Network Intrusion
Detection System.
16. Li, W., Using Genetic Algorithm for Network Intrusion
Detection, Proceedings of the United States Department of
Energy Cyber Security Group 2004 Training Conference, May
24-27, 2004, Kansas City, USA.
17. Gong, R.H., M. Zulkernine, P. Abolmaesumi, A Software
Implementation of a Genetic Algorithm Based Approach to
Network Intrusion Detection System.
18. Gong, R.H., M. Zulkernine, P. Abolmaesumi, A Software
Implementation of a Genetic Algorithm Based Approach to
Network Intrusion Detection, Proceedings of the Sixth IEEE
ACIS International Conference on Software Engineering,
Artificial Intelligence, Networking, and Parallel/Distributed
Computing, May 2005, Maryland, USA.
19. Novel Attack Detection Using Fuzzy Logic and Data Mining
by Norbik Bashah Idris and Bharanidharan Shanmugam.