
A Novel Approach to Intrusion Detection Based on Fast Incremental SVM

Qi Mu, Yongjun Zhang, Qian Niu

School of Computer, Xi'an University of Science and Technology, Xi'an, China

E-mail: [email protected]

Abstract: A new incremental SVM algorithm for intrusion detection based on the cloud model is proposed to address the low efficiency of boundary-vector extraction. In this algorithm, the characteristic distance between heterogeneous samples is mapped into a membership function to extract the boundary vectors from the initial dataset, reflecting the stability and uncertainty characteristics of the cloud model. The possible changes of the support vector set after new samples are added are also analyzed, and useless samples are discarded according to the analysis results. Theoretical analysis and simulation results show that the detection speed is greatly improved while a high detection performance is maintained.

Keywords: Intrusion Detection; Support Vector Machine (SVM); Incremental Learning; Cloud Model; Boundary Vectors

    I. INTRODUCTION

Support vector machine (SVM) [1] is a machine learning algorithm based on statistical learning theory proposed by Vapnik. It is widely used in the field of intrusion detection because of its strong nonlinear processing performance and generalization capability. For intrusion detection systems, however, an overly large training set and a continuous stream of new samples inevitably lead to long training times and degrade the classification accuracy of further training. As an effective way to process continuously updated data, incremental learning retains previous results, studies only the newly added data, and forms a continuous learning process. Incremental SVM, applied in intrusion detection systems, makes full use of the results of historical training to effectively solve the memory problems caused by huge data sets. At the same time, it alleviates the problems of low classification accuracy and long training time that result from the emergence of new samples.

Syed et al. [2] first proposed an SVM-based incremental learning algorithm by analyzing the support vectors (SVs) of the sample set. Liu Ye et al. [3] proposed a DoS intrusion detection algorithm based on incremental SVM (I-SVM), which retains only the SVs and discards all non-SV samples. But as new samples are added, non-SVs and SVs can transform into each other, and discarding non-SV samples too quickly degrades the classification accuracy of incremental learning; in particular, when initial samples are scarce, subsequent learning may be unstable. Liu Yeqing et al. [4] proposed an incremental SVM learning algorithm based on the nearest border vectors (N-ISVM), which alleviates the over-discarding problem by using a small extension set (the nearest border vectors) instead of the SV set. In practical applications, however, a classifier trained only on the nearest vectors may suffer from overfitting caused by noise and overfitting samples. Clearly, extracting as few boundary samples as possible while preserving the original classification information as completely as possible is the key to incremental SVM research.

On this basis, the extraction of support vectors is effectively improved and a new incremental SVM method based on the cloud model (referred to as C-ISVM) is proposed after identifying cloud boundary areas [5]. For the initial set, the distances between each sample and all of its heterogeneous samples are first calculated in the feature space and then mapped into membership degrees to extract the boundary vectors, in accordance with the characteristics of the cloud model: uncertainty with certainty and stability with variability. For the incremental set, both the samples violating the KKT conditions and those satisfying them but lying near the classification plane are retained. Combining the two parts and training on them completes the final incremental SVM classifier.

    II. CLOUD THEORY

The cloud model [6-7], proposed by Professor Li Deyi, expresses the uncertain transformation between a qualitative concept and quantitative data in natural language, providing a powerful tool for combined qualitative and quantitative information processing.

Definition 1 Let $U$ be a quantitative domain expressed by numerical values, and $C$ a qualitative concept on $U$. If the quantitative value $x \in U$ is a random realization of $C$, and the certainty degree $\mu(x) \in [0,1]$ of $x$ to $C$ is a random number with a stable tendency,

$$\mu: U \to [0,1], \quad \forall x \in U, \quad x \to \mu(x),$$

then the distribution of $x$ over the domain $U$ is called a cloud, denoted $C(X)$, and each $x$ is called a cloud droplet.

When $\mu(x)$ follows a normal distribution, the cloud is called a normal cloud model [6-7]: a set of normal random numbers with a stable tendency, characterized by the expected value $Ex$, the entropy $En$, and the hyper-entropy $He$, as shown in Figure 1. An algorithm or a hardware device that generates cloud droplets is called a cloud generator.


Figure 1. The three digital characteristics of the cloud model

Algorithm 1 X-condition cloud generator [8]

Input: $\{Ex, En, He\}$, $n$, $x_0$ // the three digital characteristics, the number of droplets, and the given value $x_0$

Output: $\{(x_0, \mu_1), \ldots, (x_0, \mu_n)\}$

for $i = 1$ to $n$:
    $En' = \mathrm{randn}(En, He)$
    $\mu_i = \exp\left(-\dfrac{(x_0 - Ex)^2}{2(En')^2}\right)$
    $\mathrm{drop}(x_0, \mu_i)$
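For concreteness, the generator can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's code; it assumes, as is common in the cloud-model literature, that $He$ acts as the spread of the disturbed entropy $En'$, and all names are ours.

import numpy as np

def x_condition_cloud(Ex, En, He, x0, n, seed=None):
    # X-condition cloud generator: n droplets (x0, mu_i) for a given x0.
    rng = np.random.default_rng(seed)
    drops = []
    for _ in range(n):
        En_prime = rng.normal(En, He)       # En' drawn around En with spread He
        mu = np.exp(-(x0 - Ex) ** 2 / (2.0 * En_prime ** 2))
        drops.append((x0, mu))              # one cloud droplet (x0, mu_i)
    return drops

For example, x_condition_cloud(Ex=0.0, En=1.0, He=0.1, x0=0.5, n=1000) yields 1000 certainty degrees for $x_0 = 0.5$ that scatter around $e^{-0.125}$ with a stable tendency.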

    III. IDENTIFYING CLOUD BOUNDARY AREAS

Support vectors uniquely determine the classification hyperplane, so finding the support vectors is the key to SVM classification. Support vectors have a distinct geometric feature: they lie near the edge of their own class and are among the samples closest to the separating hyperplane. According to this feature, boundary vectors, i.e., the samples most likely to become SVs, can be selected in order to reduce the training set and improve training efficiency.

    A. Nearest Border Vectors

Let the training set be $T = \{(x_1, y_1), \ldots, (x_l, y_l)\}$, and let $d(x_i, x_j)$ denote the distance between $x_i$ and $x_j$. For each $x_i$, its nearest border vector is the heterogeneous sample $x_j$ minimizing $d(x_i, x_j)$. Traversing all $i$ yields the nearest-border-vector set.
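As a reference point for the cloud-based variant developed below, the extraction admits a direct sketch in Python; the names are ours, and plain Euclidean distance stands in for the feature-space distance.

import numpy as np

def nearest_border_vectors(X, y):
    # Reduce a binary training set (labels +1/-1) to its nearest border vectors.
    Xp, Xn = X[y == 1], X[y == -1]
    D = np.linalg.norm(Xp[:, None, :] - Xn[None, :, :], axis=2)  # pairwise distances
    idx_n = np.unique(D.argmin(axis=1))   # nearest negative for each positive
    idx_p = np.unique(D.argmin(axis=0))   # nearest positive for each negative
    Xb = np.vstack([Xp[idx_p], Xn[idx_n]])
    yb = np.concatenate([np.ones(len(idx_p)), -np.ones(len(idx_n))])
    return Xb, yb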

Theorem 1 If the training set $T$ is mapped to a high-dimensional feature space in which it is linearly separable, then the hyperplane defined by the border vector set of $T$ is unique and completely separates the two classes [4].

Theorem 1 shows that the nearest border vectors can replace the full sample set and that the SV set is contained in them. However, because of noise data and overfitting samples, penalty parameters are introduced to reduce the deviation of these samples, so in practice the SVs are not necessarily the points nearest to the classification plane. The method of replacing the full set with the nearest border vectors therefore needs further improvement.

    B. Cloud Boundary Vectors

An SV certainly lies in the region close to the heterogeneous class, but it is not necessarily the nearest sample; this matches the stable tendency and randomness of the cloud model. Cloud theory is therefore introduced in this paper to extract the boundary vectors. The main process is as follows.

For a linearly inseparable training set, the data are mapped from the original space to a high-dimensional feature space $H$ by a nonlinear mapping $\phi: R^n \to H$, $x \mapsto \phi(x)$.

Definition 2 (Characteristic Distance) Let $x, y$ be two vectors. Under the nonlinear mapping to the feature space $H$, their characteristic distance is defined as

$$d_H(x, y) = d(\phi(x), \phi(y)) = \sqrt{K(x, x) - 2K(x, y) + K(y, y)} \quad (1)$$

where $K(\cdot, \cdot)$ is a kernel function satisfying the Mercer condition.
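For an RBF kernel, $K(x, x) = K(y, y) = 1$, so the characteristic distance reduces to $\sqrt{2 - 2K(x, y)}$. A minimal sketch under that assumption (the names are ours):

import numpy as np

def rbf(x, y, gamma=1.0):
    # Mercer kernel K(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def characteristic_distance(x, y, gamma=1.0):
    # d_H(x, y) = sqrt(K(x,x) - 2 K(x,y) + K(y,y)), as in (1)
    return np.sqrt(rbf(x, x, gamma) - 2.0 * rbf(x, y, gamma) + rbf(y, y, gamma))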

Definition 3 (Inter-class Characteristic Distance Matrix) Using (1), calculate the distance $D_{ij}$ between sample $i$ and each heterogeneous sample $j$. Traversing all samples of the two classes defines the $C^+ \times C^-$ matrix $D$ as the inter-class characteristic distance matrix, where $C^+$ and $C^-$ are the numbers of samples in the two classes, respectively.

Let $D^i_{\min}$ be the distance between sample $i$ and its nearest heterogeneous sample, and $\bar{D}_i$ the average distance between sample $i$ and all heterogeneous samples. The distance between the nearest two vectors is then $D_{\min} = \min_{i,j}(D_{ij})$.

Definition 4 (Cloud Membership) Transform the characteristic distance $D_{ij}$ according to Algorithm 1:

$$\mu_{ij} = \exp\left(-\frac{(D_{ij} - Ex)^2}{2(En')^2}\right), \quad En' = \mathrm{normrnd}(En, He) \quad (2)$$

$\mu_{ij}$ is defined as the cloud membership of sample $i$ to sample $j$, where $Ex = D^i_{\min}$, $En = \bar{D}_i - D_{\min}$, and $He = En/c$, with $c = N/2$ taken in this paper as a control parameter and $N$ the number of homogeneous samples. $En'$ is a normal random number with expectation $En$ and variance $He$. Therefore a smaller $D_{ij}$ does not necessarily correspond to a larger $\mu_{ij}$, which reflects the uncertainty.
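Read operationally, the transform runs row-wise over the inter-class distance matrix $D$ of Definition 3. A hedged sketch in Python (the variable names are ours; $He$ is treated as the spread of the normal disturbance, matching Algorithm 1, and a tiny epsilon guards against a zero entropy):

import numpy as np

def cloud_membership(D, seed=None):
    # mu[i, j]: cloud membership of sample i to heterogeneous sample j.
    rng = np.random.default_rng(seed)
    N = D.shape[0]                   # number of homogeneous samples
    c = N / 2.0                      # control parameter c = N/2
    D_min = D.min()                  # distance of the nearest heterogeneous pair
    mu = np.empty_like(D, dtype=float)
    for i in range(N):
        Ex = D[i].min()              # Ex = D_min^i
        En = D[i].mean() - D_min     # En = mean row distance minus global minimum
        He = En / c
        En_prime = rng.normal(En, He, size=D.shape[1])   # one En' per entry
        mu[i] = np.exp(-(D[i] - Ex) ** 2 / (2.0 * En_prime ** 2 + 1e-12))
    return mu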


Definition 5 (Cloud Boundary Vectors and Cloud Boundary Areas) For each sample $j$, find the heterogeneous sample $i$ corresponding to $\max_i(\mu_{ij})$; this sample becomes a cloud boundary vector. All the cloud boundary vectors together constitute the cloud boundary areas.
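Putting Definitions 2-5 together, the extraction step might look as follows; characteristic_distance and cloud_membership are the hypothetical helpers sketched above, not functions from the paper.

import numpy as np

def extract_boundary_vectors(X, y, gamma=1.0, seed=None):
    # Reduce a binary training set (labels +1/-1) to its cloud boundary vectors.
    Xp, Xn = X[y == 1], X[y == -1]
    # Inter-class characteristic distance matrix (Definition 3)
    D = np.array([[characteristic_distance(a, b, gamma) for b in Xn] for a in Xp])
    mu_p = cloud_membership(D, seed)      # memberships of positives w.r.t. negatives
    mu_n = cloud_membership(D.T, seed)    # memberships of negatives w.r.t. positives
    # Definition 5: per sample, the heterogeneous sample of maximal membership
    # becomes a cloud boundary vector.
    idx_n = np.unique(mu_p.argmax(axis=1))
    idx_p = np.unique(mu_n.argmax(axis=1))
    Xb = np.vstack([Xp[idx_p], Xn[idx_n]])
    yb = np.concatenate([np.ones(len(idx_p)), -np.ones(len(idx_n))])
    return Xb, yb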

    C. Parameters and Performance Analysis

The nearest vectors merely have a higher probability of being boundary vectors. On the other hand, samples far from the classification hyperplane may still contain useful information and are also added to the cloud boundary areas, even though they are not SVs in the current round of learning. The cloud boundary vectors are extracted according to the characteristic distance so as to maintain the stable tendency of the samples: those close to the heterogeneous class are retained and the others eliminated. For each individual sample, however, the outcome is uncertain: the nearest vector is not necessarily selected, and the farthest may be retained.

To extract the boundary vectors based on the cloud model and meet these requirements, the parameters play a vital role. The horizontal position and the steepness of the cloud are governed by $Ex$ and $En$. $He$ is related to the dispersion of the cloud droplets: the greater $He$ is, the larger the dispersion. All cloud droplets fluctuate randomly around the expected curve, and this fluctuation is controlled by $He$.

First, $Ex = D^i_{\min}$ is chosen so that vectors near the classification surface tend to receive larger membership degrees. Setting $En = \bar{D}_i - D_{\min}$ controls the cloud coverage, so the scope of the cloud grows with the sample distance. Too large an $He$ damages the stable tendency, while too small an $He$ loses the randomness; when $He = 0$, the algorithm degenerates into the nearest-border-vector algorithm. In this paper the parameter $c = N/2$ is governed by the number of samples, so that $He$ decreases as the samples increase, in order to avoid retaining too many distant samples.

Figure 2. Extraction of cloud boundary vectors (the figure marks the hyperplane $g(x) = 0$, noise samples, overfitting samples, and the cloud boundary vectors)

As shown in Figure 2, the training set is reduced to the cloud boundary vectors marked with solid points, where the two classes of samples are depicted as rectangles and triangles, respectively. The final cloud boundary vector districts have no crisp borders, in keeping with the fuzziness of the cloud model. As a result, the full sample set is effectively reduced while the distribution characteristics of the samples themselves are largely preserved.

    IV. KKT CONDITIONS AND SAMPLE DISTRIBUTIONS

If the new samples contain classification information that the original sample set does not, then the SV set is bound to change after learning because of the new information. The impact of the added samples on the original SVs is related to the KKT conditions: new samples satisfying the KKT conditions do not change the original support vector set, while violating samples do.

Theorem 2 If any of the added samples violates the KKT conditions, a non-SV of the original sample set may become a new SV [9].

Figure 3 illustrates the impact of new samples on the original non-SVs. Rectangles and triangles represent the two classes of samples, and new samples are marked with solid points. The initial classification plane is $f(x) = 0$ and the new classification plane is $g(x) = 0$. N1 was a non-SV in the initial set; after the new samples are added, it converts into an SV.

In the simple incremental SVM method, only the original SV set and the samples violating the KKT conditions are merged into the new training set, which may lose classification information of the original samples by disregarding the original non-SVs. According to Theorem 2, these ignored non-SVs may become SVs in follow-up learning. The subsequent learning process may therefore exhibit an oscillation ("shock") phenomenon caused by the deficiency of initial samples.

Figure 3. Impact of adding samples on the classification (initial plane $f(x) = 0$, new plane $g(x) = 0$; A1 and N1 mark the samples discussed in the text)

Theorem 3 If any of the added samples violates the KKT conditions, a sample satisfying the KKT conditions may become a new SV [9].

In Figure 3, A1 satisfies the KKT conditions yet transforms into an SV after the new samples, marked with solid points, are added.

Therefore, new samples satisfying the KKT conditions should not simply be thrown away. In incremental learning, in addition to the samples violating the KKT conditions, a portion of the new samples satisfying them should also be considered, namely those distributed within a certain range outside the class margin.


V. INCREMENTAL SVM METHOD FOR INTRUSION DETECTION BASED ON CLOUD BOUNDARY VECTORS

For the initial set, the distances between each sample and all of its heterogeneous samples are first calculated in the feature space and then mapped into membership degrees to extract the boundary vectors, which are trained to obtain the initial classifier. For the incremental set, both the samples violating the KKT conditions, expressed by $y_i f(x_i) < 1$, and those satisfying them but lying near the classification plane, expressed by $1 \le y_i f(x_i) \le 1 + \varepsilon$ with $\varepsilon \in (0, 1)$, are retained. Combining all three parts and training on them completes the final incremental SVM classifier.

Let $X_0$ be the initial sample set and $X_I$ the incremental sample set at step $I$. The algorithm is as follows; a sketch in code appears after the steps.

(1) Extract the cloud boundary vectors $C_0$ of $X_0$ and train them to obtain the initial classifier SVM;

(2) If $X_I = \emptyset$, the algorithm terminates; otherwise, go to step (3);

(3) If no sample in $X_I$ violates the KKT conditions, go to step (2);

(4) Define $X_N$ as the samples in $X_I$ violating the KKT conditions, and $X_V$ as those satisfying them but lying near the classification plane;

(5) Set $C_0 \leftarrow C_0 \cup X_N \cup X_V$, train on $C_0$ to update the SVM, and go to step (2).
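A minimal sketch of steps (1)-(5) in Python, assuming labels in $\{-1, +1\}$ and using scikit-learn's SVC in place of the paper's libsvm-mat toolbox; extract_boundary_vectors is the hypothetical helper sketched in Section III.

import numpy as np
from sklearn.svm import SVC

def c_isvm(X0, y0, increments, eps=0.5, gamma=1.0, C=1.0):
    # Step (1): cloud boundary vectors of the initial set, then train.
    Xb, yb = extract_boundary_vectors(X0, y0, gamma)
    clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(Xb, yb)
    for XI, yI in increments:                       # step (2): loop over X_I
        margin = yI * clf.decision_function(XI)     # y_i * f(x_i)
        violating = margin < 1                      # X_N: KKT violators
        if not violating.any():
            continue                                # step (3): nothing new to learn
        near = (margin >= 1) & (margin <= 1 + eps)  # X_V: satisfy KKT, near plane
        keep = violating | near
        Xb = np.vstack([Xb, XI[keep]])              # step (5): merge C_0, X_N, X_V
        yb = np.concatenate([yb, yI[keep]])
        clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(Xb, yb)
    return clf

Each retraining runs on the reduced set rather than on all samples seen so far, which is where the training-time savings reported in Table I come from.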

    VI. EXPERIMENT AND ANALYSIS OF INTRUSION DETECTION

12000 independent samples from KDD CUP 1999 [10] are randomly selected and divided into four parts: one initial set and three incremental sets.

The experimental environment is a 1.99 GHz CPU with 2 GB of memory running Windows XP, using MATLAB 7.0 and the libsvm-mat-2.91-1 toolbox. The RBF function serves as the SVM kernel, and the kernel parameter g and penalty coefficient C are obtained by cross-validation. The experimental results are shown in Table I.

TABLE I. EXPERIMENTAL RESULTS
(per method: number of trained samples, training time in ms, detection rate in %)

Training set        Samples   I-SVM                    N-ISVM                   C-ISVM
                              trained  time   rate     trained  time   rate     trained  time   rate
Initial set         2999      2999     78     100      2999     78     100      2999     78     100
Incremental set 1   3001      3178     203    94.87    1989     125    93.47    1695     125    93.30
Incremental set 2   2999      3205     422    84.97    2124     313    90.62    1721     310    94.71
Incremental set 3   3001      3247     671    80.81    2200     547    92.50    1798     469    96.29

Table I shows that the C-ISVM algorithm is superior to the I-SVM and N-ISVM methods in overall performance. In terms of training samples and time, I-SVM trains on all incremental samples, resulting in the largest number of training samples and the longest training time; for the first increment, the other two methods need 38.42% less training time than I-SVM. Although C-ISVM expands the boundary vectors, the number of samples violating the KKT conditions is reduced, so its training set and time are slightly smaller than N-ISVM's. In terms of detection rate, the final rate of C-ISVM is 15.48 and 3.79 percentage points higher than those of I-SVM and N-ISVM, respectively. In terms of algorithm stability, the detection rate of I-SVM declines as the increments accumulate. N-ISVM maintains a higher detection rate, but the impact of noise data and sample overfitting limits further improvement of its classification performance. C-ISVM focuses on filtering the boundary vectors while retaining the overall distribution characteristics of the samples; as the sample set is gradually refined during follow-up learning, the detection rate maintains a steadily rising trend.

    VII. CONCLUSION

A new incremental SVM method for intrusion detection based on the cloud model has been proposed. The cloud membership is defined to replace the characteristic distance, and the KKT-based sample selection is extended. Experimental results show that the method effectively reduces the sample set and the running time while maintaining a high detection performance.

    REFERENCES

[1] Cortes C, Vapnik V. Support-vector networks [J]. Machine Learning, 1995, 20(3): 273-297.

[2] Syed N, Liu H, Sung K. Incremental learning with support vector machines [C]. Proc. Int. Joint Conf. on Artificial Intelligence, 1999.

[3] Liu Ye, Wang Zebing, Feng Yan. DoS intrusion detection based on incremental learning with support vector machines [J]. Computer Engineering, 2006, 32(4): 179-186.

[4] Liu Yeqing, Liu Sanyang, Gu Mingtao. Incremental learning algorithm of support vector machine based on nearest border vectors [J]. Mathematics in Practice and Theory, 2011, 41(2): 110-114.

[5] Chen Weimin. Research on Support Vector Machine Solving the Large-scale Data Set [D]. Nanjing University of Aeronautics and Astronautics, 2006.

[6] Li Deyi, Meng Haijun, Shi Xuemei. Membership clouds and membership cloud generators [J]. Journal of Computer Research and Development, 1995, 32(6): 15-20.

[7] Li Deyi, Liu Changyu, Du Yi, et al. Artificial intelligence with uncertainty [J]. Journal of Software, 2004, 15(11): 1583-1594.

[8] Li Xingsheng. Study on Classification and Clustering Mining Based on Cloud Model and Data Field [D]. PLA University of Science and Technology, 2003: 16-19.

[9] Wang Xiaodan, Zheng Chunying, Wu Chongming, et al. New algorithm for SVM-based incremental learning [J]. Journal of Computer Applications, 2006, 26(10): 2440-2443.

[10] KDD Cup 1999 dataset [DB/OL]. [2012-08-07]. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
