A Novel Approach to Intrusion Detection Based on
Fast Incremental SVM
Qi Mu, Yongjun Zhang, Qian Niu
School of Computer Science, Xi'an University of Science and Technology, Xi'an, China
E-mail:[email protected]
Abstract—A new incremental SVM algorithm for intrusion detection based on the cloud model is proposed to address the low efficiency of border vector extraction. In this algorithm, the characteristic distance between heterogeneous samples is mapped into a membership degree to extract the boundary vectors from the initial dataset, reflecting the stability and uncertainty characteristics of the cloud model. The possible changes of the support vector set after new samples are added are also analyzed, and useless samples are discarded according to the analysis results. Theoretical analysis and simulation results show that the detection speed is greatly improved while a high detection performance is maintained.
Keywords—Intrusion Detection; Support Vector Machine (SVM); Incremental Learning; Cloud Model; Boundary Vectors
I. INTRODUCTION
Support vector machine (SVM) [1] is a machine learning algorithm based on statistical learning theory, proposed by Vapnik. It is widely used in the field of intrusion detection because of its strong nonlinear processing performance and generalization capability. For intrusion detection systems, however, the overly large training set and the continuous arrival of new samples inevitably lead to long training times and affect the classification accuracy of further training. As an effective way to process continuously updated data, incremental learning retains the previous results, studies only the added data, and forms a continuous learning process. Incremental SVM, applied in intrusion detection systems, makes full use of the results of historical training to effectively solve the memory problems caused by huge data sets. At the same time, it can also alleviate the problems of low classification accuracy and long training time that result from the emergence of new samples.
Syed [2] first proposed an SVM-based incremental learning algorithm through analyzing the support vectors (SVs) of the sample set. Liu Ye [3] proposed a DoS intrusion detection algorithm based on incremental SVM (I-SVM), which retains only the SVs and discards all non-SV samples. But as new samples arrive, non-SVs and SVs can transform into each other, and discarding non-SV samples too quickly affects the classification accuracy of incremental learning; in particular, when initial samples are scarce, subsequent learning may be unstable. Liu Yeqing et al. [4] proposed an incremental SVM learning algorithm based on nearest border vectors (N-ISVM), which alleviates the over-discarding problem by using a small extended set (the nearest border vectors) instead of the SV set. In practical applications, however, a classifier trained on the nearest vectors may still be distorted by noisy or overfitted samples. Clearly, extracting as few boundary samples as possible while preserving the original classification information as completely as possible is the key to incremental SVM research.
On this basis, the extraction of support vectors is effectively improved, and a new incremental SVM method based on the cloud model (referred to as B-ISVM) is proposed by identifying cloud boundary areas [5]. For the initial set, the distance between each sample and all its heterogeneous samples is first calculated in the feature space and then mapped into a membership degree to effectively extract the boundary vectors, according to the characteristics of the cloud model: uncertainty with certainty, and stability with variability. For the incremental set, both the samples violating the KKT conditions and those satisfying them but lying near the classification plane are retained. The two parts are combined and trained to complete the final incremental SVM classifier.
II. CLOUD THEORY
The cloud model [6-7], proposed by Professor Li Deyi, expresses the uncertain transformation between a qualitative concept and quantitative data in natural language values, providing a powerful tool for processing qualitative and quantitative information.
Definition 1 Let U be a quantitative universe of discourse expressed by numerical values, and let C be a qualitative concept on U. If the quantitative value x ∈ U is a random realization of C, and the certainty degree μ(x) ∈ [0, 1] of x to C is a random number with a stable tendency,

μ: U → [0, 1], ∀x ∈ U, x → μ(x),

then the distribution of x in the universe U is called a cloud C(X), and each x is called a cloud droplet.
When μ(x) follows a normal distribution, the cloud is called a normal cloud model [6-7]: a set of normal random numbers with a stable tendency, characterized by the expected value Ex, entropy En, and hyper-entropy He, as shown in Figure 1. An algorithm or hardware that generates cloud droplets is called a cloud generator.
2012 2nd International Conference on Computer Science and Network Technology
978-1-4673-2964-4/12/$31.00 2012 IEEE CHANGCHUN, CHINA
Figure 1. The three digital characteristics of the cloud model
Algorithm 1 X-condition cloud generator [8]
Input: {Ex, En, He}, n, x0  // three digital characteristics, the number of droplets, and the given value x0
Output: {(x0, μ_1), ..., (x0, μ_n)}
for i = 1 to n
    En′ = randn(En, He)  // normal random number with expectation En and standard deviation He
    μ_i = exp( −(x0 − Ex)² / (2·En′²) )
    drop(x0, μ_i)
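As a concrete illustration, Algorithm 1 can be sketched in Python as follows (a minimal sketch: the function and variable names are our own, and `random.gauss` plays the role of `randn(En, He)`):

```python
import math
import random

def x_condition_cloud(ex, en, he, x0, n):
    """X-condition cloud generator: for a fixed value x0, emit n cloud
    droplets (x0, mu_i) whose certainty degrees fluctuate randomly
    around the expected Gaussian curve."""
    drops = []
    for _ in range(n):
        # En' is a normal random number with expectation En and
        # standard deviation He (stable tendency plus randomness).
        en_prime = random.gauss(en, he)
        mu = math.exp(-(x0 - ex) ** 2 / (2 * en_prime ** 2))
        drops.append((x0, mu))
    return drops
```

With He = 0 every En′ equals En, so the certainty degrees collapse onto a single deterministic Gaussian curve, which matches the degeneration discussed in Section III-C.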
III. IDENTIFYING CLOUD BOUNDARY AREAS
Support vectors uniquely determine the classification hyperplane, so finding the support vectors is the key to SVM classification. Support vectors have a distinctive geometric feature: they lie near the edge of their class and are the samples closest to the separating hyperplane. According to this feature, boundary vectors can be selected from the samples most likely to be SVs, in order to reduce the training set and improve training efficiency.
A. Nearest Border Vectors
Let the training set be T = {(x1, y1), ..., (xl, yl)}, and let d(xi, xj) be the distance between xi and xj. For each xi, the nearest border vector is the heterogeneous sample xj minimizing d(xi, xj). Traversing all i yields the nearest border vector set.
Theorem 1 Map the training set T into a high-dimensional feature space so that T is linearly separable; then the hyperplane defined by the border vector set of T is unique and completely separates the two classes [4].
Theorem 1 shows that the nearest border vectors can replace the full sample set, and that the SV set is contained in them. However, because of noisy data and overfitted samples, a penalty parameter is introduced to reduce the deviation caused by these samples, so in practice an SV is not necessarily the point nearest to the classification plane. The method of replacing the full set with the nearest border vectors therefore needs further improvement.
B. Cloud Boundary Vectors
An SV certainly lies in the area close to the heterogeneous class, but is not necessarily the nearest sample; this matches the stable tendency and randomness of the cloud model. Cloud theory is therefore introduced in this paper to extract boundary vectors. The main process is as follows.
For a linearly inseparable training set, the data are mapped from the original space into a high-dimensional feature space H by a nonlinear mapping Φ: R^n → H, x → Φ(x).
Definition 2 (Characteristic Distance) Let x, y be two vectors. Under the nonlinear mapping into the feature space H, their characteristic distance is defined as

d_H(x, y) = d(Φ(x), Φ(y)) = √( K(x, x) − 2K(x, y) + K(y, y) )    (1)

where K(·,·) is a kernel function satisfying the Mercer condition.
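Since the mapping Φ never needs to be evaluated explicitly, (1) can be computed directly from kernel values. A minimal sketch (the RBF kernel and names are our own choices for illustration):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel, which satisfies the Mercer condition."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(x, y)))

def characteristic_distance(x, y, k=rbf_kernel):
    """Characteristic distance of (1):
    d_H(x, y) = sqrt(K(x,x) - 2*K(x,y) + K(y,y))."""
    # max(..., 0.0) guards against tiny negative values from rounding
    return math.sqrt(max(k(x, x) - 2.0 * k(x, y) + k(y, y), 0.0))
```

For the RBF kernel K(x, x) = 1, so d_H(x, y) = √(2 − 2K(x, y)) and is bounded above by √2.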
Definition 3 (Inter-class Characteristic Distance Matrix) Using (1), calculate the distance D_ij between sample i and each heterogeneous sample j. Traversing the samples of both classes defines the C+ × C− matrix D as the inter-class characteristic distance matrix, where C+ and C− are the numbers of samples in the two classes.

Let Dmin_i be the distance between sample i and its nearest heterogeneous sample, and let D̄_i be the average distance between sample i and all heterogeneous samples. The distance between the nearest pair of vectors is then Dmin = min_i(Dmin_i) = min(D_ij).
Definition 4 (Cloud Membership) Transform the characteristic distance D_ij according to Algorithm 1:

μ_ij = exp( −(D_ij − Ex)² / (2·En′²) ),  En′ = normrnd(En, He)

μ_ij is defined as the cloud membership of i to j, where Ex = Dmin_i, En = D̄_i − Dmin, and He = En/c. In this paper c = N/2 is used as a control parameter, where N is the number of homogeneous samples. En′ is a normal random number with expectation En and standard deviation He. Therefore, a smaller D_ij does not necessarily correspond to a larger μ_ij, which reflects the uncertainty.
Definition 5 (Cloud Boundary Vectors and Cloud Boundary Areas) For each sample j, find the heterogeneous sample i corresponding to max(μ_ij); this sample becomes a cloud boundary vector. All the cloud boundary vectors constitute the cloud boundary areas.
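Putting Definitions 2–5 together, the extraction of cloud boundary vectors might be sketched as follows. This is our own reading of the definitions: the parameter choices Ex = Dmin_i, En = D̄_i − Dmin, He = En/c with c = N/2 follow the paper, while the helper names and the small epsilon guard are ours.

```python
import math
import random

def cloud_boundary_vectors(pos, neg, dist, c=None):
    """For each sample, compute the cloud membership of every
    heterogeneous sample and keep the one with the largest membership;
    the kept samples form the cloud boundary areas."""
    boundary = set()
    # global minimum inter-class distance Dmin
    d_min = min(dist(x, y) for x in pos for y in neg)
    for own, other in ((pos, neg), (neg, pos)):
        ctrl = c if c is not None else len(own) / 2.0     # c = N/2
        for x in own:
            d = [dist(x, y) for y in other]
            ex = min(d)                                   # Ex = Dmin_i
            en = max(sum(d) / len(d) - d_min, 1e-12)      # En = D_bar_i - Dmin
            he = en / ctrl                                # He = En / c
            best, best_mu = None, -1.0
            for y, dij in zip(other, d):
                en_p = random.gauss(en, he)               # En' = normrnd(En, He)
                num = (dij - ex) ** 2
                mu = 1.0 if num == 0.0 else math.exp(-num / (2 * en_p ** 2))
                if mu > best_mu:
                    best, best_mu = y, mu
            boundary.add(best)
    return boundary
```

Because En′ is random, the selected vector is usually but not always the nearest heterogeneous sample, mirroring the stable tendency with randomness described above.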
C. Parameters and Performance Analysis
The nearest vectors merely have a greater probability of being boundary vectors. On the other hand, samples far from the classification hyperplane may still contain useful information and are also added to the cloud boundary areas, even though they are not SVs in the current round of learning. The cloud boundary vectors are extracted according to the characteristic distance so as to maintain the stable tendency of the samples, retaining samples close to the heterogeneous class and eliminating the others. For each individual sample, however, the outcome is uncertain: the nearest vector will not necessarily be selected, and the farthest may be retained.
To extract boundary vectors based on the cloud model and meet these requirements, the parameters play a vital role. The horizontal position and steepness of the cloud model are determined by Ex and En. He is related to the dispersion of the cloud droplets: the greater He is, the more dispersed the droplets are. All cloud droplets fluctuate randomly around the expected curve, and this fluctuation is controlled by He.
First, Ex = Dmin_i is chosen so that vectors near the classification surface tend to have larger membership degrees. Setting En = D̄_i − Dmin controls the cloud coverage, so the scope of the cloud increases with the sample distance. If He is too large, the stable tendency is damaged; if it is too small, the randomness may be lost. When He = 0, the algorithm degenerates into the nearest-border-vector algorithm. In this paper the parameter c = N/2 is controlled by the number of samples, making He decrease as the samples increase, in order to avoid retaining too many distant samples.
Figure 2. Extraction of cloud boundary vectors (the figure marks the classification plane g(x) = 0, the noise samples, the overfitting samples, and the cloud boundary vectors)
As shown in Figure 2, where the two classes of samples are depicted as rectangles and triangles and the cloud boundary vectors are marked with solid points, the training set is reduced to the cloud boundary vectors. The final cloud boundary vector districts have no clear borders, in keeping with the fuzziness of the cloud model. As a result, the full sample set is effectively reduced while the distribution characteristics of the samples themselves are largely preserved.
IV. KKT CONDITIONS AND SAMPLE DISTRIBUTIONS
If the new samples contain classification information that the original sample set does not, then the SV set is bound to change after learning because of the new information. The impact of added samples on the original SVs is related to the KKT conditions: new samples satisfying the KKT conditions do not change the original support vector set, while those violating them do.
Theorem 2 If a sample violating the KKT conditions exists among the added samples, a non-SV of the original set may become a new SV [9].
Figure 3 illustrates the impact of new samples on the original non-SVs. Rectangles and triangles represent the two classes of samples, and new samples are marked with solid points. The initial classification plane is f(x) = 0 and the new classification plane is g(x) = 0. N1 was a non-SV in the initial set; after the new samples are added, it is converted into an SV.
In the simple incremental SVM method, only the original SV set and the samples violating the KKT conditions are merged into the new training set, which may lose the classification information carried by the original non-SVs. According to Theorem 2, these ignored non-SVs may become SVs in follow-up learning. The subsequent learning process may therefore exhibit a "shock" phenomenon caused by the deficiency of initial samples.
Figure 3. Impact of adding samples on the classification (the figure marks the planes f(x) = 0 and g(x) = 0 and the samples A1 and N1)
Theorem 3 If a sample violating the KKT conditions exists among the added samples, a sample satisfying the KKT conditions may become a new SV [9].

In Figure 3, A1 satisfies the KKT conditions but transforms into an SV after the new samples, marked with solid points, are added.

Therefore, new samples satisfying the KKT conditions should not simply be thrown away either. In incremental learning, in addition to the samples violating the KKT conditions, some of the new samples satisfying them should also be considered, namely those distributed outside the class interval within a certain range.
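This selection rule can be sketched as follows (a minimal sketch: f is any trained decision function, and the band width eps is our own name for the tolerance on the near-margin samples):

```python
def partition_increment(samples, labels, f, eps=0.5):
    """Split incremental samples into X_N (violating the KKT
    conditions, y*f(x) < 1) and X_V (satisfying them but near the
    classification plane, 1 <= y*f(x) <= 1 + eps)."""
    x_n, x_v = [], []
    for x, y in zip(samples, labels):
        margin = y * f(x)
        if margin < 1.0:
            x_n.append(x)          # violates KKT: forces retraining
        elif margin <= 1.0 + eps:
            x_v.append(x)          # satisfies KKT but near the margin
    return x_n, x_v
```

Samples with margins well beyond 1 + eps are the ones safely discarded.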
V. INCREMENTAL SVM METHOD FOR INTRUSION DETECTION BASED ON CLOUD BOUNDARY VECTORS
For the initial set, the distance between each sample and all its heterogeneous samples is first calculated in the feature space and then mapped into a membership degree to effectively extract the boundary vectors, which are trained to obtain the initial classifier. For the incremental set, both the samples violating the KKT conditions, expressed by y_i f(x_i) < 1, and those satisfying them but lying near the classification plane, expressed by 1 ≤ y_i f(x_i) ≤ 1 + ε with ε ∈ (0, 1), are retained. The three parts are combined and trained to complete the final incremental SVM classifier.
Let X_0 be the initial sample set and X_I the incremental sample set at time I. The algorithm is as follows.
(1) Extract the cloud boundary vectors C_0 of X_0 and train them to obtain the initial classifier SVM;
(2) If X_I = ∅, the algorithm terminates; else, go to step (3);
(3) If no sample in X_I violates the KKT conditions, go to (2);
(4) Let X_N be the samples in X_I violating the KKT conditions, and X_V the ones satisfying them but near the classification plane;
(5) C_0 ← C_0 ∪ X_N ∪ X_V; train on C_0, update the SVM, and go to (2).
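Steps (1)-(5) can be expressed as a training loop. This is only a sketch: extract_boundary and train are placeholders for the cloud-boundary extraction and the underlying SVM trainer, and eps is the tolerance of the near-plane band.

```python
def b_isvm(initial, increments, extract_boundary, train, eps=0.5):
    """Incremental SVM loop over labelled (x, y) pairs.
    extract_boundary(pairs) -> reduced pairs; train(pairs) -> decision f."""
    c0 = extract_boundary(initial)          # step (1): cloud boundary vectors
    f = train(c0)
    for batch in increments:                # step (2): stop when exhausted
        x_n = [(x, y) for x, y in batch if y * f(x) < 1.0]
        if not x_n:                         # step (3): nothing violates KKT
            continue
        x_v = [(x, y) for x, y in batch
               if 1.0 <= y * f(x) <= 1.0 + eps]   # step (4)
        c0 = c0 + x_n + x_v                 # step (5): merge and retrain
        f = train(c0)
    return f
```

Only the retained set C_0 is ever retrained, which is what keeps the per-increment training cost bounded.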
VI. EXPERIMENT AND ANALYSIS OF INTRUSION DETECTION
12000 independent samples from the KDD CUP 1999 dataset [10] are randomly selected and divided into four parts: one initial set and three incremental sets.
The experimental environment is a 1.99 GHz CPU with 2 GB of memory running Windows XP, using MATLAB 7.0 and the libsvm-mat-2.91-1 toolbox. The RBF function is used as the SVM kernel, and the kernel parameter g and penalty coefficient C are obtained by cross-validation. The experimental results are shown in Table I.
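The cross-validation parameter search mentioned above can be sketched as a simple grid search (an illustration only; cv_accuracy stands for whatever routine returns the cross-validated detection rate for a given parameter pair):

```python
from itertools import product

def grid_search(cv_accuracy, gammas, costs):
    """Return the (g, C) pair with the highest cross-validated accuracy."""
    return max(product(gammas, costs), key=lambda p: cv_accuracy(*p))
```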
TABLE I. EXPERIMENTAL RESULTS
(each method column lists: trained samples / training time (ms) / detection rate (%))

Training set        Samples   I-SVM                 N-ISVM                C-ISVM
Initial set         2999      2999 / 78 / 100       2999 / 78 / 100       2999 / 78 / 100
Incremental set 1   3001      3178 / 203 / 94.87    1989 / 125 / 93.47    1695 / 125 / 93.30
Incremental set 2   2999      3205 / 422 / 84.97    2124 / 313 / 90.62    1721 / 310 / 94.71
Incremental set 3   3001      3247 / 671 / 80.81    2200 / 547 / 92.50    1798 / 469 / 96.29
Table I shows that the C-ISVM algorithm is superior to the I-SVM and N-ISVM methods in overall performance. In terms of training samples and time, I-SVM trains all incremental samples, resulting in the largest training set and the longest time; in the first increment, for example, its training time is 38.42% longer than the others'. Although C-ISVM expands the boundary vectors, the number of samples violating the KKT conditions is reduced, so its training set and time are slightly smaller than N-ISVM's. In terms of detection rate, the final rate of C-ISVM is 15.48% and 3.79% higher than those of I-SVM and N-ISVM, respectively. In terms of algorithm stability, the detection rate of I-SVM declines as the increments accumulate. N-ISVM maintains a higher detection rate, but the impact of noisy data and sample overfitting limits further improvement of its classification performance. C-ISVM focuses on filtering the boundary vectors while retaining the overall distribution characteristics of the samples; as the samples gradually improve during follow-up learning, its detection rate maintains a steadily rising trend.
VII. CONCLUSION
A new incremental SVM method for intrusion detection based on the cloud model is proposed. The cloud membership is defined to replace the characteristic distance, and the KKT conditions are extended. Experimental results show that the method effectively reduces the sample set and running time while maintaining a high detection performance.
REFERENCES
[1] Cortes C, Vapnik V. Support-vector networks [J]. Machine Learning, 1995, 20(3): 273-297.
[2] Syed N, Liu H, Sung K. Incremental learning with support vector machines [C]. Proc. Int. Joint Conf. on Artificial Intelligence, 1999.
[3] Liu Ye, Wang Zebing, Feng Yan. DoS intrusion detection based on incremental learning with support vector machines [J]. Computer Engineering, 2006, 32(4): 179-186.
[4] Liu Yeqing, Liu Sanyang, Gu Mingtao. Incremental learning algorithm of support vector machine based on nearest border vectors [J]. Mathematics in Practice and Theory, 2011, 41(2): 110-114.
[5] Chen Weimin. Research on Support Vector Machine Solving the Large-scale Data Set [D]. Nanjing University of Aeronautics and Astronautics, 2006.
[6] Li Deyi, Meng Haijun, Shi Xuemei. Membership clouds and membership cloud generators [J]. Journal of Computer Research and Development, 1995, 32(6): 15-20.
[7] Li Deyi, Liu Changyu, Du Yi, et al. Artificial intelligence with uncertainty [J]. Journal of Software, 2004, 15(11): 1583-1594.
[8] Li Xingsheng. Study on Classification and Clustering Mining Based on Cloud Model and Data Field [D]. The PLA University of Science and Technology, 2003: 16-19.
[9] Wang Xiaodan, Zheng Chunying, Wu Chongming, et al. New algorithm for SVM-based incremental learning [J]. Journal of Computer Applications, 2006, 26(10): 2440-2443.
[10] KDD99 Cup dataset [DB/OL]. [2012-08-07]. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html