4 intrusion detection system using fuzzy data mining

Upload: ramana-yellapu

Post on 06-Apr-2018


  • 8/3/2019 4 Intrusion Detection System Using Fuzzy Data Mining

    1/8

    AN INTRUSION DETECTION SYSTEM USING FUZZY DATA MINING AND GENETIC ALGORITHMS

    B. SAI PRAVEEN, K. SUBRAHMANYAM

    Sai_praveen17@yahoo.com, rajaa.0321@yahoo.com

    3RD YEAR CSE,

    ANIL NEERUKONDA INSTITUTE OF TECHNOLOGY AND SCIENCES,

    SANGIVALASA, VIZAG.

    ABSTRACT :- Intrusion detection systems are increasingly a key part of systems defense. Various approaches to intrusion detection are currently in use. Artificial intelligence plays a driving role in security services. This paper presents a dynamic, intelligent intrusion detection system model based on an AI approach that uses fuzzy logic and simple data mining techniques to process network data. The system combines two distinct intrusion detection approaches: 1) anomaly-based intrusion detection using fuzzy data mining techniques, and 2) intrusion detection using genetic algorithms.

    1. INTRODUCTION:

    Information has become an organization's most precious asset. Organizations have become increasingly dependent on information, since more information is being stored and processed on network-based systems. Hacking, viruses, worms and Trojan horses are some of the major attacks. A significant challenge in providing an effective defense mechanism for a network is the ability to detect novel attacks or intrusions and to implement countermeasures. Intrusion detection is a critical component in securing information systems. Intrusion detection is implemented by an intrusion detection system (IDS), which can detect, prevent and react to attacks. Intrusion detection has become an integral part of the information security process.

    2.0 INTRUSION DETECTION SYSTEMS

    2.1 AN OVERVIEW OF CURRENT INTRUSION DETECTION SYSTEMS:

    Intrusion detection is defined [1] as the process of intelligently monitoring the events occurring in a computer system or network and analyzing them for signs of violations of the security policy. The primary aim of an IDS is to protect the availability, confidentiality and integrity of critical networked information systems. IDS are defined by both the method used to detect attacks and the placement of the IDS on the network. An IDS may perform either misuse detection or anomaly detection and may be deployed as a network-based system or a host-based system. This results in four general groups: misuse-host, misuse-network, anomaly-host and anomaly-network. Misuse detection relies on matching known patterns of hostile activity against databases of past attacks. Misuse detectors are highly effective at identifying known attacks and vulnerabilities, but rather poor at identifying new security threats. Anomaly detection searches for something rare or unusual by applying statistical measures or artificial intelligence methods to

    compare current activity against historic

    knowledge.

    Common problems with anomaly-based systems are that they often require extensive training data for artificial learning algorithms, and that they tend to be computationally expensive, because several metrics are often maintained and need to be updated on every system activity. Some IDS combine qualities from all these categories and are known as hybrid systems.

    FIG 1 [19]

    2.2 COMPUTER ATTACK CATEGORIES:

    DARPA [2] categorizes attacks into five major types based on the goals and actions of the attacker.

    DoS (denial-of-service) attacks try to restrict or deny services provided by or to computer users. For example, in a SYN-flood attack, the attacker floods the victim host with more TCP connection requests than it can handle, causing the host to be unable to respond even to valid requests. Probe attacks attempt to get information about an existing computer or network configuration.

    Remote to local (R2L) attacks are caused by an attacker who has only remote access rights. These attacks occur when the attacker tries to get local access to a computer network.

    User to root (U2R) attacks are performed by an attacker who has user-level access rights and tries to obtain superuser access.

    Probing attacks: In this type of attack, an attacker scans a network of computers to gather information or find known vulnerabilities.

    Data attacks are performed to gain access to information which the attacker is not permitted to access. Many R2L and U2R attacks have the goal of accessing secret files.

    2.3 IDS DESIGN PRINCIPLES:

    IDS are designed and implemented on modelled network systems. Several points should be defined and stated in advance in order to find a proper model for the network:

    Normal behavior of a network system is the most dominant and frequent behavior of the network in a certain time period.

    Anomaly within the network system is the least frequent, abnormal behavior of the network in a certain time period.

    Modelling a dynamic and complex system such as a network is very difficult; for this reason, abstraction and partial modelling are used as a good solution. The network components as a whole can be divided into:

    Host

    User

    Network environment

    The user can in turn be divided into legitimate user and malicious user (intruder). Many other nested divisions

    could occur according to the designer's point of view and the areas of focus.

    An intrusion detection system basically raises an alarm whenever an anomalous event occurs which could be caused by an intruder to the system. These systems do not react equally well at all times; false alarms can sometimes occur, and such an alarm is called a False Positive (FP). The lower the number of FPs, the higher the value of the IDS.[3][4]
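
    A common way to quantify this is the false positive rate, the share of normal events that wrongly raise an alarm. The following minimal Python sketch assumes that alarms have already been compared against ground-truth labels; the function name and the example counts are hypothetical and are not taken from [3][4].

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of normal (non-intrusive) events that wrongly raised an alarm."""
    normal_events = false_positives + true_negatives
    if normal_events == 0:
        raise ValueError("no normal events to evaluate")
    return false_positives / normal_events


# Hypothetical counts: 50 false alarms among 1000 normal events.
print(false_positive_rate(50, 950))
```

    An IDS with a lower rate on the same traffic wastes less analyst time on alarms that turn out to be harmless.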

    2.4 IDS DESIGN TRENDS

    There are a number of different ways to classify IDS in order to distinguish between their different types. The most generic classification found for IDS is by:

    Analysis approach

    Placement of IDS

    Under each of these categories several classifications could occur.[5]

    2.4.1. ANALYSIS APPROACH:

    In this respect IDS are usually divided into:

    SIDS: Signature-based IDS, which studies the patterns of an attack and defines a signature for it, enabling security specialists to design a defense against that attack.

    AIDS: Anomaly-based IDS, which learns the usual behavior patterns of a network and suspects an attack once an anomaly occurs.

    2.4.2. PLACEMENT OF IDS:

    Boer and Pels [6] gave three types of IDS which could be listed under this category:

    NIDS: Network-based IDS, which monitors the network for malicious traffic.

    HIDS: Host-based IDS, which monitors the activities of a single host.

    DIDS: Distributed IDS, which correlates events from different host-based or network-based IDS.

    2.5 DATA CAPTURING USING SNORT:

    Snort is primarily a Network Intrusion Detection System (NIDS); it is open source and available for a variety of Unix variants. Snort can also be used as a sniffer to troubleshoot network problems. Snort can be configured in three basic modes:

    Sniffer mode simply reads the packets off the network and displays them in a continuous stream on the console.

    Packet logger mode logs the packets to the disk.

    Network intrusion detection system mode is the most complex and configurable, allowing Snort to analyze network traffic for matches against a user-defined rule set and perform several actions based upon what it sees.

    3. DATA MINING AND FUZZY LOGIC

    3.1 DATA MINING

    Data mining methods are used to automatically discover new patterns from large amounts of data [7]. Data mining is the automated extraction of previously unrealized information from large data sources for the purpose of supporting actions. The rapid development of data mining has made available a wide variety of algorithms, drawn from the fields of statistics, pattern recognition, machine learning and databases. Specifically, data mining approaches have been proposed and used for anomaly detection.

    3.1.1. ASSOCIATION RULES

    Association rules were first developed to find correlations in transactions using retail data [8]. For example, if a customer who buys a soft drink (A) usually also buys potato chips (B), then potato chips are associated with soft drinks using the rule A -> B. Suppose that 25% of all customers buy both A and B and that 50% of the customers who buy

    A also buy B. Then the degree of support for the rule is s = 0.25 and the degree of confidence in the rule is c = 0.50. Agrawal and Srikant developed the fast Apriori algorithm for mining association rules. The Apriori algorithm requires two thresholds, minconfidence and minsupport. These two thresholds determine the degree of association that must hold before a rule will be mined.
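
    The support and confidence of a single rule can be computed directly from a list of transactions. The sketch below is only an illustration of the two measures, not the Apriori algorithm itself; the toy baskets are made up so that they reproduce s = 0.25 and c = 0.50, and the threshold values MIN_SUPPORT and MIN_CONFIDENCE are hypothetical.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions that contain the antecedent and the consequent together.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Toy data: 8 baskets, A appears in 4 of them, A and B together in 2.
baskets = [
    {"A", "B"}, {"A", "B"}, {"A"}, {"A"},
    {"B"}, {"C"}, {"C"}, {"B", "C"},
]
s, c = support_and_confidence(baskets, {"A"}, {"B"})

MIN_SUPPORT, MIN_CONFIDENCE = 0.2, 0.4  # hypothetical thresholds
rule_is_mined = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
print(s, c, rule_is_mined)
```

    A rule is kept only when both measures reach their thresholds, which is the role minsupport and minconfidence play in Apriori.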

    3.2 FUZZY LOGIC

    Fuzzy logic was introduced as a means to model the uncertainty of natural language. Due to the uncertain nature of intrusions, fuzzy sets are widely used in discovering attack events while reducing the rate of false alarms at the same time.

    Basically, intrusion detection systems distinguish between two distinct types of behavior, normal and abnormal, which creates two distinct sets of rules and information. Fuzzy logic can create sets that have in-between values where the difference between the two sets is not well defined. In this case the logic works on linguistic terms, taking the minimum or maximum of the membership degrees of a set of events instead of applying crisp AND, OR or NOT operations in the if-then-else condition. This feature contributes strongly to reducing the false positive alarm rate of the system [9][10].
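
    The minimum and maximum operations mentioned above can be written down in a few lines. This sketch assumes the standard Zadeh operators (AND as minimum, OR as maximum, NOT as the complement 1 - x); the function names and the example degrees are hypothetical.

```python
def fuzzy_and(*degrees: float) -> float:
    """Fuzzy AND: the weakest membership degree decides."""
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    """Fuzzy OR: the strongest membership degree decides."""
    return max(degrees)


def fuzzy_not(degree: float) -> float:
    """Fuzzy NOT: complement of the membership degree."""
    return 1.0 - degree


# Hypothetical degrees: traffic volume is "high" to degree 0.75,
# number of failed logins is "high" to degree 0.5.
volume_high, failures_high = 0.75, 0.5
print(fuzzy_and(volume_high, failures_high))  # 0.5
print(fuzzy_or(volume_high, failures_high))   # 0.75
print(fuzzy_not(volume_high))                 # 0.25
```

    With crisp inputs of exactly 0 or 1 these operators reduce to the ordinary Boolean AND, OR and NOT.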

    Applying fuzzy methods to the development of IDS yields some advantages compared to the classical approach. Fuzzy logic techniques have therefore been employed in the computer security field since the early 1990s. Fuzzy logic provides some flexibility for the uncertain problem of intrusion detection and allows much greater complexity for IDS. Most fuzzy IDS require human experts to determine the fuzzy sets and the set of fuzzy rules.

    These tasks are time consuming. However, if the fuzzy rules are generated automatically, less time is needed to build a good intrusion classifier, which shortens the development time of building or updating an intrusion classifier. A dynamic fuzzy boundary is developed from labelled data for different levels of security.

    4. ANOMALY DETECTION VIA FUZZY DATA MINING

    Fuzzy logic is based on fuzzy set theory. In contrast to standard set theory, in which each element is either completely in or not in a set, fuzzy set theory allows partial membership in sets. This provides a powerful mechanism for representing vague concepts. Data mining methods are used to automatically learn patterns from large quantities of data. The integration of fuzzy logic with data mining methods helps to create more abstract patterns at a higher level than the data level. Patterns that are more abstract and less dependent on data will be helpful for intrusion detection.

    In the intrusion detection domain, we may want to reason about a quantity such as the number of different destination IP addresses in the last 2 seconds. Suppose one wants to write a rule such as "If the number of different destination addresses during the last 2 seconds was high, then an unusual situation exists."

    Using traditional logic, one would need to decide which values for the number of destination addresses fall into the category high. As shown in fig 4a, one would typically divide the range of possible values into discrete buckets, each representing a different set. The value 10, for example, is a member of the set low to the degree 1 and a member of the other two sets, medium and high, to the degree 0. In fuzzy logic, a particular value can have a degree of membership between 0 and 1 and can be a member of more than one fuzzy set. In fig 4b, for example, the value 10 is a member of the set low to the degree 0.4 and a member of the set medium to the degree 0.75. In this example, the membership functions for the fuzzy sets are piecewise linear functions. Using fuzzy logic terminology, the number of destination ports is a fuzzy variable (also called a linguistic variable), while the possible values of the fuzzy variable are the fuzzy sets low, medium, and high. In general, fuzzy variables correspond to nouns and fuzzy sets correspond to adjectives.
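
    Piecewise linear membership functions of this kind are easy to sketch in code. In the example below the breakpoints (5, 15, 25) are hypothetical and are not read off fig 4b, so the resulting degrees differ from the 0.4 and 0.75 quoted above; the point is only that one value can belong to several fuzzy sets at once.

```python
INF = float("inf")


def trapezoid(x, a, b, c, d):
    """Piecewise linear membership: 1 on [b, c], 0 outside (a, d), linear ramps between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


def memberships(count):
    """Degrees to which a count belongs to the fuzzy sets low, medium and high."""
    return {
        "low": trapezoid(count, -INF, -INF, 5, 15),
        "medium": trapezoid(count, 5, 15, 15, 25),
        "high": trapezoid(count, 15, 25, INF, INF),
    }


print(memberships(10))  # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
```

    With these assumed breakpoints the value 10 is partly low and partly medium, whereas the crisp buckets of fig 4a would force it into exactly one set.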

    4a. NON FUZZY SETS

    Using fuzzy logic, a rule like the one shown above could be written as

    If DP = high

    Then an unusual situation exists.

    where DP is a fuzzy variable and high is a fuzzy set. The degree of membership of the number of destination ports in the fuzzy set high determines whether or not the rule is activated.
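
    One simple way to turn that degree into a decision is to compare it with an activation threshold. The paper does not state how activation is decided, so the threshold of 0.5 and the ramp breakpoints below are assumptions made purely for illustration.

```python
def high(dp_count, start=15.0, full=25.0):
    """Ramp membership for the fuzzy set 'high' (hypothetical breakpoints)."""
    if dp_count <= start:
        return 0.0
    if dp_count >= full:
        return 1.0
    return (dp_count - start) / (full - start)


def unusual_situation(dp_count, activation_threshold=0.5):
    """Evaluate 'If DP = high then an unusual situation exists'.

    Returns the membership degree and whether the rule fires.
    """
    degree = high(dp_count)
    return degree, degree >= activation_threshold


print(unusual_situation(20))  # (0.5, True)
print(unusual_situation(16))  # (0.1, False)
```

    Keeping the degree alongside the yes/no answer lets a later stage combine several rules by their strength instead of by hard votes.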

    5. IDS USING GENETIC ALGORITHMS:

    A genetic algorithm (GA) is a programming technique that mimics biological evolution as a problem-solving strategy [11]. It is based on Darwin's principle of evolution and survival of the fittest, and optimizes a population of candidate solutions towards a predefined fitness [12][13].

    A GA applies evolution and natural selection to a chromosome-like data structure, evolving the chromosomes using selection, recombination and mutation operators. The process usually begins with a randomly generated population of chromosomes, which represent candidate solutions to the problem. Different positions of each chromosome are encoded as bits, characters or numbers. These positions can be referred to as genes. An evaluation function is used to calculate the goodness of each chromosome with respect to the desired solution; this function is known as the fitness function. During evolution, two basic operators, crossover and mutation, are used to simulate the natural reproduction and mutation of species. The selection of chromosomes for survival and combination is biased towards the fittest chromosomes [14][15][17].

    The following figure, taken from [16], shows the structure of a simple genetic algorithm. It starts with the random generation of an initial population, which is then evaluated and evolved through selection, recombination and mutation. Finally, the best individual (chromosome) is picked out as the final result once the optimization meets its target.
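
    The loop described above can be sketched as follows. This is a generic, minimal GA and not the implementation from [16]: the bit-string encoding, tournament selection, the parameter values and the toy fitness (counting 1 bits) are all assumptions chosen to keep the example short.

```python
import random


def evolve(fitness, n_genes=16, pop_size=20, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Run a simple GA over bit-string chromosomes and return the best one found."""
    rng = random.Random(seed)
    # Random generation of the initial population.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def select():
        # Tournament selection: the fitter of two random individuals survives.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                # Recombination: one-point crossover of the two parents.
                cut = rng.randint(1, n_genes - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best


# Toy fitness: the number of 1 bits in the chromosome.
best = evolve(fitness=sum)
print(best, sum(best))
```

    In an IDS setting the chromosome would encode a detection rule and the fitness would score that rule against audit data, but the control flow stays the same.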

    Many authors and researchers are strongly drawn to genetic algorithms as a strong and efficient method used in different fields of artificial intelligence, noting that several AI techniques can be combined in different ways in different systems for several purposes.

    The genetic algorithm is employed to derive a set of classification rules from network audit data, and the support-confidence framework is utilized as

    a fitness function to judge the quality of each rule. The generated rules are then used to detect or classify network intrusions in a real-time environment.

    5.1. A GA BASED INTRUSION DETECTION APPROACH:

    To summarize what has been presented so far about AI-based IDS, the work of these systems is divided into two main stages. The first is the training stage, which provides the system with the information it initially requires; the second is the detection stage, where the system detects intrusions according to what was learned in the previous stage. Applying this to a GA-based IDS, the GA is trained with classification rules learned from previous network audit data. The second stage is applied in real time by classifying the incoming network connections according to the generated rules.

    Many systems have been proposed by researchers, in either simple or advanced fashion, but to give a general idea of the components of such a system and its basic mechanism, the following three components will be highlighted:

    1-Data Representation

    Genes should be represented in some format using different data types such as byte, integer and float. They may also have different data ranges and other features, given that the genes are generated randomly in each population-generating iteration.

    Genetic algorithms can be used to evolve rules for the network traffic; these rules are usually in the following form:

    if {condition} then {act} [16].

    A rule basically contains an if-then clause, a condition and an act. The condition usually matches the current network behaviour against the one stored in the IDS, such as comparing an intruder's source IP address and port number with ones already stored in the system. The act could be an alarm indicating that the intruder's IP address and port number are related to an attacker who is previously known to the system.[16][8]
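
    A rule of the form if {condition} then {act} can be represented as a small data structure. The sketch below is one possible encoding, assuming that a connection is a dictionary with the hypothetical keys src_ip and src_port; the addresses come from a range reserved for documentation and are not from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """if {src_ip and src_port match} then {action}."""
    src_ip: str
    src_port: int
    action: str = "alarm"

    def matches(self, connection: dict) -> bool:
        # The condition: compare the connection with the stored values.
        return (connection.get("src_ip") == self.src_ip
                and connection.get("src_port") == self.src_port)


def apply_rules(rules, connection):
    """Return the acts of all rules whose condition matches the connection."""
    return [rule.action for rule in rules if rule.matches(connection)]


known_attackers = [Rule("192.0.2.10", 4444), Rule("192.0.2.99", 31337)]
print(apply_rules(known_attackers, {"src_ip": "192.0.2.10", "src_port": 4444}))
print(apply_rules(known_attackers, {"src_ip": "192.0.2.20", "src_port": 80}))
```

    A GA would treat the fields of such a rule as genes and search for field values that separate intrusions from normal traffic.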

    2-GA Parameters

    A GA has some common elements and parameters which should be defined:

    Fitness Function: According to [11], the fitness function is defined as a function which scales the value of an individual relative to the rest of the population. It computes the best possible solutions from the candidates located in the population.

    GA Operators: According to the figure below, selection, mutation and crossover are the most effective parts of the algorithm, as they participate in the generation of each population.

    Selection is the phase where individuals of the population with better fitness are selected; the others are discarded.

    Crossover is a process in which randomly selected pairs of individuals exchange parts of their chromosomes with each other, until a totally new population has been generated.

    Mutation flips some bits in an individual, and since any bit could be flipped, there is a low probability of predicting the change.
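
    The crossover and mutation operators can be illustrated on bit-string individuals. One-point crossover and per-bit flip mutation are assumed here as common textbook variants; the paper does not say which variants it uses, and the function names are hypothetical.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length parents at a random cut point."""
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def bit_flip_mutation(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


rng = random.Random(42)
zeros, ones = [0] * 8, [1] * 8
child_a, child_b = one_point_crossover(zeros, ones, rng)
print(child_a, child_b)
print(bit_flip_mutation(child_a, rate=0.1, rng=rng))
```

    Crossover only recombines material that already exists in the parents, while mutation is what introduces bit values that neither parent carries.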

    3-Detection Algorithm Overview

    In [8], a genetic algorithm has been presented which contains a training process. This algorithm is designed to apply a set of classification rules according to the input data given. It follows the simple flow of genetic algorithms presented in the figure (the operation of GA).

    PROCEDURE: Rule set generation using genetic algorithm.

    INPUT: Network audit data, number of generations, and population size.

    OUTPUT: A set of classification rules

    PARAMETERS:

    NAD: Network Audit Data
    PS: Population Size
    N: Number of records in training set

    PSEUDO CODE:

    Ruleset (NAD, PS, N)
    {
        W1=0.2, W2=0.8, T=0.5;
        For each chromosome in PS
        Begin 1:
            A=0; AB=0;
            For each record in training set
            Begin 2:
                If(record==chromosome)
                    AB=AB+1;
                If(record==condition)
                    A=A+1;
            End 2
            Fitness=W1*AB/N+W2*AB/A;
            If(Fitness>T)
                Select chromosome into new population;
        End 1
        For each chromosome in new population
        Begin 3:
            Crossover (chromosome);
            Bmutation (chromosome);
        End 3
        If (Number of generations
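
    The fitness computation in the pseudo code above combines the support AB/N and the confidence AB/A of a rule with the weights W1 and W2. The Python sketch below covers only that step. It assumes, as an interpretation, that a chromosome encodes a condition (a dictionary of feature values) plus an outcome label, and that each training record is a (features, label) pair; the feature names and the sample records are hypothetical.

```python
W1, W2, T = 0.2, 0.8, 0.5  # weights and threshold as given in the pseudo code


def rule_fitness(condition, outcome, records):
    """Fitness = W1 * AB / N + W2 * AB / A for the rule 'if condition then outcome'."""
    n = len(records)
    a = ab = 0
    for features, label in records:
        # A counts the records that match the rule's condition.
        if all(features.get(key) == value for key, value in condition.items()):
            a += 1
            # AB counts the records that match condition and outcome together.
            if label == outcome:
                ab += 1
    support = ab / n if n else 0.0
    confidence = ab / a if a else 0.0
    return W1 * support + W2 * confidence


# Hypothetical training records: (features, label).
records = [
    ({"service": "telnet"}, "attack"),
    ({"service": "telnet"}, "attack"),
    ({"service": "telnet"}, "normal"),
    ({"service": "http"}, "normal"),
]
fitness = rule_fitness({"service": "telnet"}, "attack", records)
print(fitness, fitness > T)
```

    Guarding the division by A avoids an error for rules whose condition matches no record, a case the pseudo code leaves open.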

    4. Gorodetsky, V., I. Kotenko, and O. Karsaev, Multi-agent technologies for computer network security: Attack simulation, intrusion detection and intrusion detection learning.

    5. Luan Qinglin, Lu Huibin, Research of intrusion detection based on neural network optimized by adaptive genetic algorithm, Computer Engineering and Design, vol. 29, no. 12, pp. 3022-3025, 2008.

    6. De Boer, P., and M. Pels, Host-Based Intrusion Detection System, Technical Report 1.10, Faculty of Science, Informatics Institute, University of Amsterdam, 2005.

    7. Lee, W., S. Stolfo, and K. Mok, Mining audit data to build intrusion detection models, 1998.

    8. Agrawal, R., and R. Srikant, Fast algorithms for mining association rules, 1994.

    9. Yao, J.T., S.L. Zhao, and L.V. Saxton, A study of Fuzzy Intrusion Detection, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, 28 March-1 April 2005, Orlando, Florida, USA.

    10. Gomez, J., and D. Dasgupta, Evolving Fuzzy Classifiers for Intrusion Detection.

    11. Bobor, V., Efficient Intrusion Detection System Architecture Based on Neural Networks and Genetic Algorithms.

    12. Li, W., Using Genetic Algorithms for Network Intrusion Detection.

    13. Marczyk, A., Genetic Algorithms and Evolutionary Computation Techniques, 24 April 2004.

    14. Song, D., A Linear Genetic Programming Approach to Intrusion Detection, GECCO 2003.

    15. Sinclair, C., L. Pierce, and S. Matzner, An Application of Machine Learning to Network Intrusion Detection System.

    16. Li, W., Using Genetic Algorithm for Network Intrusion Detection, Proceedings of the United States Department of Energy Cyber Security Group 2004 Training Conference, May 24-27, 2004, Kansas City, USA.

    17. Gong, R.H., M. Zulkernine, and P. Abolmaesumi, A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection System.

    18. Gong, R.H., M. Zulkernine, and P. Abolmaesumi, A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection, Proceedings of the Sixth IEEE ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, May 2005, Maryland, USA.

    19. Idris, Norbik Bashah, and Bharanidharan Shanmugam, Novel Attack Detection Using Fuzzy Logic and Data Mining.