ORIGINAL ARTICLE
Enhancing SVM performance in intrusion detection using optimal feature subset selection based on genetic principal components
Iftikhar Ahmad · Muhammad Hussain · Abdullah Alghamdi · Abdulhameed Alelaiwi

Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
e-mail: [email protected]

Received: 18 September 2012 / Accepted: 25 February 2013
© Springer-Verlag London 2013
Neural Computing & Applications, doi:10.1007/s00521-013-1370-6
Abstract Intrusion detection is a very serious issue these days, because the prevention of intrusions depends on detection. Accurate detection of intrusions is therefore essential to secure information in the computer and network systems of any organization, whether private, public, or governmental. Several intrusion detection approaches are available, but their main problem is performance, which can be enhanced by increasing the detection rates and reducing false positives. This shortcoming of existing techniques is the focus of the research in this paper. The poor performance of such techniques is due to the raw dataset, whose redundant features confuse the classifier and result in inaccurate detection. Recent approaches use principal component analysis (PCA) for feature subset selection based on the highest eigenvalues, but the features corresponding to the highest eigenvalues may not have the optimal sensitivity for the classifier, because many sensitive features are ignored. Instead of using the traditional approach of selecting features with the highest eigenvalues, as in PCA, this research applies a genetic algorithm to search for the genetic principal components that offer a subset of features with optimal sensitivity and the highest discriminatory power. The support vector machine (SVM) is used for classification. This research work uses the knowledge discovery and data mining (KDD) cup dataset for experimentation. The performance of this approach was analyzed and compared with existing approaches. The results show that the proposed method enhances SVM performance in intrusion detection, outperforms the existing approaches, and has the capability to minimize the number of features and maximize the detection rates.
Keywords Intrusion detection system (IDS) · Support vector machines (SVMs) · Principal component analysis (PCA) · Genetic algorithm (GA) · Genetic principal component (GPC) · Detection rate (DR) · Dataset
1 Introduction
Currently, intrusions on network systems are key security threats; therefore, it is important to stop them, and preventing intrusions requires detecting them first. Detection is a key part of any security tool, such as an intrusion detection system (IDS), an intrusion prevention system (IPS), an adaptive security appliance (ASA), checkpoints, and firewalls [1]. Therefore, accurate detection of network attacks is necessary. Several intrusion detection techniques are available, but their leading problem is performance, which can be improved by increasing the detection rates and reducing false positives. Such drawbacks of past techniques have motivated this research.
One of the drawbacks of past intrusion detection approaches is the use of a raw dataset for classification: the classifier may get confused by redundancy and hence may not classify correctly. To overcome this issue, principal component analysis (PCA) has been applied to transform raw features into the principal feature space and to select features based on their sensitivity, which is determined by the eigenvalues [2].
The modern methods use PCA to project the feature space onto the principal feature space and select the features corresponding to the highest eigenvalues, but these features may not have the optimal
sensitivity for the classifier, because many sensitive features are ignored [3]. Instead of using the traditional approach of selecting features with the highest eigenvalues, as in PCA, this research applies a genetic algorithm (GA) to search the principal feature space for a subset of features with optimal sensitivity and the highest discriminatory power. Based on the selected features, classification is performed. The support vector machine (SVM) is used for classification due to its proven classification ability. This research work uses the knowledge discovery and data mining (KDD) cup dataset, which is considered a benchmark for evaluating security detection mechanisms.
The focus of this work is searching the PCA space using GA to select a subset of principal components, called genetic principal components. This is a novel method in intrusion detection compared to traditional methods, in which some percentage of the top principal components is selected. The method is applied and tested on intrusion detection, where it demonstrated a performance enhancement in the SVM classifier, i.e., the intrusive analysis engine.
The rest of the paper is structured as follows. Related work is described in Sect. 2. The proposed model is presented in Sect. 3. Some basic knowledge of PCA, which is applied in the proposed model, is discussed in Sect. 4. Section 5 details the GA for selecting genetic principal components from the principal space. The classification process using SVM is explained in Sect. 6. The applied methodology is described in Sect. 7. Experimental results are discussed in Sect. 8. Finally, conclusions are drawn in Sect. 9.
2 Related work
The focus of past work was on feature extraction and classification in intrusion detection, and less importance was given to the critical issue of feature selection. Feature selection is an important phase before the classification process, because the performance of the classifier depends on an optimal subset of features; yet this issue is ignored by existing approaches in the area of intrusion detection. Such approaches mostly rely on powerful classification algorithms to deal with redundant and irrelevant features [3, 4].
In [5], PCA is used for feature selection and neural networks are used for classification. This work used the first 22 features out of a 38-feature set. The principal components are selected based on the highest eigenvalues in the traditional way, which creates the possibility of missing many important features. Such features may have more discriminatory power than the selected features, so there must be a mechanism for selecting an optimal subset of features in the principal space.
In [6], the significance of a feature is decided on the basis of the accuracy and the number of false alarms of the classifier, with and without the feature. The feature selection is based on "leave-one-out": eliminate one feature from the original dataset, repeat the experiment, and then compare the new results with the original result; if one of the described cases occurs, the feature is considered important, otherwise it is regarded as insignificant. Since there are 41 features in the KDD cup, the experiment is repeated 41 times to confirm whether each feature is significant or insignificant. This technique is complicated and incurs overheads on massive datasets.
In [7], the radial basis function (RBF) network is used as a classifier, and the Elman network is applied to reinstate the memory of older actions. This work used the full-featured KDD cup dataset, which consists of 41 features. This approach performs poorly in classification and introduces overheads: the raw feature set confuses the classifier due to redundancy and results in false alarms. Further, it increases training and testing overheads, reduces the accurate detection rate, consumes more memory and computational resources, and increases architectural complexity and the chance of system malfunction.
In [8], PCA is used to decide an optimal feature set. Such a feature set improved the performance of the classifier and reduced training as well as testing overheads in the IDSs. But there remains an important issue: the performance of the intrusive analysis engine depends on the feature subset selection method, which is a compromise between training efficiency and accurate results, because few PCA components increase training efficiency but may cause false alarms, whereas a large number of PCA components increase training overheads and complexity.
In [9], the fusion of GA and SVM is described for feature selection and parameter tuning of the system. This method was capable of minimizing the number of features and maximizing the detection rates, but the problem is feature uniformity: the features in their original form are not consistent, so they must be transformed into a new feature space in order to make them more visible and well organized.
In [3], an intrusion detection mechanism was proposed using soft computing techniques: SVM, GA, and PCA. The system was implemented and tested on two cases: (1) PCA, GA, and SVM, and (2) PCA and SVM. The focus was comparing SVM performance on two feature sets: (1) 12 features obtained by PCA and GA, and (2) 22 features taken directly from the PCA output using the traditional method. This approach requires further work to test and validate it with detailed experimentation.
In [4], an initial effort was made on feature subset selection in intrusion detection. The presented technique was not explained and demonstrated thoroughly enough to verify it. The method conducted three experiments with MLP as the classifier, using 12-, 20-, and 27-feature sets. There are a number of issues in this work; for instance, it remains possible that the classifier could perform equally well on the original (raw) dataset, the transformed (PCA) dataset, or a PCA dataset obtained conventionally. The work was not sufficient to confirm the proposed approach, and further experimentation is required to verify it.
Feature selection is an important problem in intrusion detection because the performance of the classifier, or intrusive analysis engine, depends on it: a more accurate dataset yields more accuracy and better performance in intrusion detection. GAs provide a simple, general, and powerful framework for selecting good subsets of features, leading to improved detection rates [3, 4, 10].
3 Proposed model
The proposed model is shown in Fig. 1. The model has six parts. The first part is the selection of a dataset for experimentation. A dataset can be formed in different ways, but this work used the KDD cup dataset, which is a standard benchmark in the area of intrusion detection and security evaluation frameworks. The second part is feature transformation, which is very important to make the features more visible, organized, and discriminant in the principal space; PCA is used for the transformation and to overcome the issue of redundancy. The third part is feature subset selection, which differs from previous methods: GA is applied to search the PCA space and select a subset of principal components, known as genetic principal components in this work. The fourth part is classification, which is the core of the intrusion detection mechanism. SVM is used as the classifier due to its proven ability in classification problems; further, it is a good solution for two-class problems, which makes it well suited to this work with its two classes, normal and intrusive. The fifth part is the training and testing of the system: training tunes the system to find optimal parameters, and testing evaluates the trained system. The sixth part, per Fig. 1, is the presentation of results. This work is an extension of previous work [3]. The model is further explained in the next sections of the paper, as well as in the methodology section.
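To make the flow concrete, the following minimal Python sketch wires the six parts together. It is illustrative only: synthetic data stands in for the KDD cup dataset, `select_genetic_principal_components` is a placeholder for the GA search of Sect. 5, and scikit-learn's `SVC` stands in for the kernel Adatron SVM used in this work.

```python
# A minimal sketch of the six-part model (assumptions noted above).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def select_genetic_principal_components(Z, y):
    """Placeholder for the GA search over principal components (Sect. 5)."""
    mask = np.zeros(Z.shape[1], dtype=bool)
    mask[:10] = True  # stand-in: the GA would evolve this bit string
    return mask

# Part 1: dataset (synthetic stand-in for the preprocessed KDD cup data)
X = np.random.rand(1000, 38)
y = np.random.randint(0, 2, 1000)
# Part 2: feature transformation into the principal space (all 38 components kept)
Z = PCA(n_components=38).fit_transform(X)
# Part 3: optimal feature subset selection (genetic principal components)
mask = select_genetic_principal_components(Z, y)
# Parts 4 and 5: SVM classification with training and testing
Z_tr, Z_te, y_tr, y_te = train_test_split(Z[:, mask], y, test_size=0.3)
clf = SVC(kernel="rbf").fit(Z_tr, y_tr)
# Part 6: results
print("accuracy:", clf.score(Z_te, y_te))
```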
4 Principal component analysis
PCA is one of the common techniques used for dimensionality reduction. We use PCA to remove the redundancy present in the features and compute a compact representation that makes the features more visible and organized in the new space, called the principal space. It is a valuable statistical method that has several applications in fields such as face recognition and image compression, and it is a common technique for finding patterns in high-dimensional data. The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset. It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences [1–3, 9]. The whole subject of statistics is based on the idea that you have a big set of data and want to analyze it in terms of the relationships between the individual points in that set [3, 7]. So, we have a feature set, and the goal is to transform it into the principal space in order to determine the principal components. The selection of principal components is carried out through GA. The flow of PCA applied is shown in Fig. 2. The PCA algorithm applied is given below.
Fig. 1 Intrusion detection proposed model based on SVM: Dataset → Feature Transformation → Optimal Feature Subset Selection → Support Vector Machine (SVM) → Training and Testing → Results

Fig. 2 PCA algorithm flow: Input Data → Find Mean → Calculate Deviation from Mean → Find Covariance Matrix → Calculate Eigenvalues and Eigenvectors

Algorithm:
Suppose $x_1, x_2, \ldots, x_M$ are $N \times 1$ vectors.

Step 1: Compute the mean: $\bar{x} = \frac{1}{M}\sum_{i=1}^{M} x_i$
Step 2: Subtract the mean: $\Phi_i = x_i - \bar{x}$
Step 3: Form the $N \times M$ matrix $A = [\Phi_1, \Phi_2, \Phi_3, \ldots, \Phi_M]$, then compute $C = \frac{1}{M}\sum_{n=1}^{M} \Phi_n \Phi_n^{T} = \frac{1}{M} A A^{T}$
Step 4: Compute the eigenvalues of $C$: $\lambda_1 > \lambda_2 > \cdots > \lambda_N$
Step 5: Compute the eigenvectors of $C$: $u_1, u_2, \ldots, u_N$. Since $C$ is symmetric, $u_1, u_2, \ldots, u_N$ form a basis, i.e., any vector $x$, or actually $(x - \bar{x})$, can be written as a linear combination of the eigenvectors: $(x - \bar{x}) = b_1 u_1 + b_2 u_2 + \cdots + b_N u_N = \sum_{i=1}^{N} b_i u_i$
Step 6: The dimensionality reduction step (based on the largest eigenvalues) is skipped, as we select principal components using GA.
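For illustration, Steps 1–5 translate directly into a few lines of NumPy. This is a sketch under the assumption that the samples are the rows of a matrix X, not the implementation used in the experiments.

```python
import numpy as np

def pca_principal_space(X):
    """Steps 1-5 above: mean, deviations, covariance, eigendecomposition.
    Step 6 (truncation by largest eigenvalues) is left to the GA."""
    M, N = X.shape                     # M samples of N features each
    x_bar = X.mean(axis=0)             # Step 1: mean vector
    Phi = X - x_bar                    # Step 2: subtract the mean
    C = (Phi.T @ Phi) / M              # Step 3: covariance C = (1/M) A A^T
    lam, U = np.linalg.eigh(C)         # Steps 4-5: eigenvalues and eigenvectors
    order = np.argsort(lam)[::-1]      # sort as lambda_1 > lambda_2 > ...
    lam, U = lam[order], U[:, order]
    return lam, U, Phi @ U             # all N components are kept for the GA
```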
5 Genetic algorithm
GA is based on the theory of evolution [3–12]. It is mostly used to solve optimization problems. It starts with an initial random population of solutions, where each solution is represented by a chromosome. A new generation of solutions is produced from the preceding generation based on specific criteria, and this procedure is repeated until certain criteria are met. Each solution has a fixed-length chromosome of bits, where every bit corresponds to a feature in a feature vector. In a chromosome, a 1 bit means that the corresponding feature will be selected, and a 0 bit means that it will not. GA provides a simple, general, and powerful framework for feature selection, so we used it in this work [1–3]. Further, in the feature reduction process using PCA, the principal components are traditionally selected based on the highest eigenvalues, which creates the possibility of missing many important features. Therefore, in order to overcome these issues, we applied GA to search the principal components space so that an optimal subset of features is selected. This is our main contribution, and it positively impacts the performance of the intrusion detection analysis engine. GA operates iteratively on a population of solutions. A randomly generated set of binary strings (1s and 0s) forms the initial population from which the GA starts its search. GA has three basic genetic operators that guide this search: selection, crossover, and mutation [1–3, 10]. The genetic search process is iterative: evaluating, selecting, and recombining strings in the population during each iteration until some termination condition is reached. The flow of GA applied is shown in Fig. 3. The basic algorithm, where P(g) is the population of strings at generation g, is given below.
Fig. 3 GA algorithm flow: Create initial population → Evaluate population → Selection, Crossover, Mutation → repeat until the end of evaluation is reached → Best individuals → Results

Algorithm:
g = 0
Initialize P(g)
Evaluate P(g)
while (termination condition is not satisfied) do
Begin
  Select P(g + 1) from P(g)
  Recombine P(g + 1)
  Evaluate P(g + 1)
  g = g + 1
End
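A minimal Python rendering of this loop is sketched below; the selection, crossover, and mutation shown here are simplified stand-ins for the operators detailed in Sects. 5.4–5.6, and the population size and generation count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_ga(fitness, chrom_len=38, pop_size=100, generations=100):
    """Generational GA matching the loop above; the genetic operators
    here are simple stand-ins for Sects. 5.4-5.6."""
    P = rng.integers(0, 2, size=(pop_size, chrom_len))        # initialize P(g)
    for g in range(generations):                              # termination test
        scores = np.array([fitness(ind) for ind in P])        # evaluate P(g)
        elite = P[np.argsort(scores)[::-1][: pop_size // 10]] # keep top 10 %
        children = []
        while len(children) < pop_size:                       # recombine P(g+1)
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, chrom_len)                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(chrom_len) < 0.01               # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        P = np.array(children)
    return max(P, key=fitness)                                # best individual
```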
Assessment of each string is based on a fitness function that is problem dependent; it decides which of the candidate solutions are better. This corresponds to the environmental determination of survivability in natural selection. Selection of a string, which represents a point in the search space, depends on the string's fitness relative to those of the other strings in the population. It probabilistically removes from the population those points that have relatively low fitness. Mutation, as in natural systems, is a very low probability operator that just flips a specific bit; it plays the role of restoring lost genetic material. Crossover, in contrast, is applied with high probability. It is a randomized yet structured operator that allows information exchange between points; its goal is to preserve the fittest individuals without introducing any new value [1–3, 11]. In brief, selection probabilistically filters out solutions that perform poorly and chooses high-performance solutions to concentrate on, or exploit. Crossover and mutation, through string operations, generate new solutions for exploration. Given an initial population of elements, GAs use the feedback from the evaluation process to select fitter solutions, eventually converging to a population of high-performance solutions. GAs do not guarantee a global optimum solution, but they have the ability to search very large search spaces and arrive at nearly optimal solutions quickly [10].
5.1 Feature selection
We used a simple encoding scheme where the chromosome
is a bit string whose length is determined by the number of
principal components. Each principal component, computed using PCA, is associated with one bit in the string. If the ith bit is 1, then the ith principal component is selected; otherwise, that component is ignored. Each chromosome thus represents a different subset of principal components [3, 4].
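In code, decoding a chromosome into a feature subset is a single masking step over the projected data. The sketch below assumes Z holds the PCA-projected data with one principal component per column.

```python
import numpy as np

def decode(chromosome, Z):
    """Keep the principal components whose bit is 1 (Sect. 5.1)."""
    mask = np.asarray(chromosome, dtype=bool)
    return Z[:, mask]  # columns of Z = principal components
```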
5.2 Feature subset fitness evaluation
The key aim of feature subset selection is to use fewer features to achieve the same or better performance. Therefore, the fitness evaluation contains two terms: (1) accuracy and (2) the number of features selected. The performance of SVM is estimated using a validation dataset, which guides the GA search. Each feature subset contains a certain number of principal components. If two subsets achieve the same performance while containing different numbers of principal components, the subset with fewer principal components is preferred [1–3, 10]. Between accuracy and feature subset size, accuracy is our major concern. We used the fitness function shown below to combine the two terms:
$\mathrm{fitness} = 10^4 \times \mathrm{Accuracy} + 0.5 \times \mathrm{Zeros} \qquad (1)$
where Accuracy corresponds to the classification accuracy on a validation set for a particular subset of principal components, and Zeros corresponds to the number of principal components not selected (i.e., zeros in the chromosome). The accuracy term ranges roughly from 0.50 to 0.99; thus, the first term assumes values from 5,000 to 9,900. The zeros term ranges from 0 to L - 1, where L is the length of the chromosome; thus, the second term assumes values from 0 to 37 (L = 38). Based on the weights that we have assigned to each term, the accuracy term dominates the fitness value. This implies that individuals with higher accuracy will outweigh individuals with lower accuracy, no matter how many features they contain. On the whole, the higher the accuracy, the higher the fitness [10]; also, the fewer the features, the higher the fitness. Selecting the weights for the two terms of the fitness function is more objective dependent than application dependent. When we build an intrusion classification system, among many factors, we need to find the best balance between model compactness and performance accuracy. Under some scenarios, we prefer the best performance, no matter what the cost might be; in that case, the weight associated with the accuracy term should be very high. Under different circumstances, we might favor more compact models over accuracy, as long as the accuracy is within a satisfactory range; in that case, we should choose a higher weight for the zeros term. Thus, we performed four different experiments using GA and the SVM classifier. For example, the fitness of a 10-feature subset (28 zeros) with 0.99 accuracy is calculated as follows:
$\mathrm{fitness} = 10^4(0.99) + 0.5(28) = 9{,}900 + 14 = 9{,}914 \qquad (2)$
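Since Eq. (1) is simple, it can be expressed directly in code; the following sketch (illustrative Python, not the experiment code) reproduces the arithmetic of Eq. (2):

```python
def fitness(chromosome, accuracy):
    """Eq. (1): fitness = 10^4 * Accuracy + 0.5 * Zeros, where Zeros is
    the number of principal components NOT selected."""
    zeros = sum(1 for bit in chromosome if bit == 0)
    return 1e4 * accuracy + 0.5 * zeros

# Worked example of Eq. (2): 10 selected features, 28 zeros (L = 38),
# 0.99 validation accuracy.
print(fitness([1] * 10 + [0] * 28, 0.99))  # -> 9914.0
```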
5.3 Initial population
The initial population is mostly produced randomly. This yields a population where each individual comprises almost the same number of 1s and 0s on average. To search subsets of different numbers of features, the number of 1s for each individual is generated randomly. Then, the
1s are randomly dispersed in the chromosome. In all experiments, we used a population size of 5,000 and 100 generations. Generally, the GA converged in fewer than 100 generations [1, 3, 4].
5.4 Selection
Selection is a genetic operator that picks chromosomes from the population of the current generation to include in the population of the next generation. The selected chromosomes undergo crossover and mutation and then form the population of the subsequent generation. There are five selection operators: roulette, tournament, top percent, best, and random [1–3, 10].
5.4.1 Roulette
The probability of selecting a chromosome is proportional to its fitness or rank [2, 3, 10]. This concept is inspired by the theory of survival of the fittest; the selection of a chromosome can be based on either fitness or rank.
5.4.2 Tournament
A subset of chromosomes is produced by applying roulette selection N times ("the tournament size"), and the best chromosome of this subset is selected. This extension applies additional selective pressure over the basic roulette selection method. Selection of a chromosome can be based on fitness or rank [1, 10].
5.4.3 Best
The best chromosome is selected based on the lowest cost in the training phase. If two or more chromosomes share the same best cost, one of them is chosen randomly [2, 10].
5.4.4 Random
A chromosome is selected randomly from the population of
a generation.
5.4.5 Top percent
A chromosome is selected randomly from the top N percent ("the percentage") of the population [1–3]. We used the top percent selection method in our experiments because it gives better performance than the other selection operators. Our selection strategy was thus GA generational: assuming a population of size N, the offspring double the size of the population, and we select the best top 10 % of individuals from the combined parent-offspring population.
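A sketch of this generational top-percent strategy is given below, assuming parents and offspring are arrays of chromosomes and fitness is the function of Eq. (1); it mirrors the description above rather than any published implementation.

```python
import numpy as np

def top_percent_selection(parents, offspring, fitness, top=0.10):
    """Combine parents and offspring, then keep the best top fraction,
    as in the generational strategy described above (a sketch)."""
    pool = np.vstack([parents, offspring])  # population size doubles
    scores = np.array([fitness(ind) for ind in pool])
    keep = np.argsort(scores)[::-1][: max(1, int(len(pool) * top))]
    return pool[keep]
```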
5.5 Crossover
There are three fundamental types of crossover: one-point crossover, two-point crossover, and uniform crossover [1, 2]. For one-point crossover, the parent chromosomes are divided at a common point chosen randomly, and the resulting sub-chromosomes are swapped. For two-point crossover, the chromosomes are thought of as rings with the first and last gene connected (i.e., a wrap-around structure); the rings are divided at two common points chosen randomly, and the resulting sub-rings are swapped. Uniform crossover is different from the above two schemes: each gene of the offspring is selected randomly from the corresponding genes of the parents. For simplicity, we used one-point crossover here. The crossover probability used in all of our experiments was 0.9.
5.6 Mutation
Mutation is a genetic operator that alters one or more gene values in a chromosome from their initial state [1, 4]. This can introduce entirely new gene values into the gene pool, with which the GA may be able to arrive at a better solution than was previously possible. Mutation is an important part of the genetic search, as it helps to prevent the population from stagnating at a local optimum. Mutation occurs during evolution according to a defined probability, which should usually be set fairly low; if it is set too high, the search turns into a primitive random search [6]. We use the traditional mutation operator, which just flips a specific bit with a very low probability. The mutation probability used in all of our experiments was 0.01.
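A corresponding sketch of the bit-flip mutation operator, reading the description above as an independent flip of each bit with probability 0.01, is:

```python
import numpy as np

rng = np.random.default_rng()

def mutate(chromosome, p_mut=0.01):
    """Flip each bit independently with a low probability (0.01 here)."""
    flips = rng.random(len(chromosome)) < p_mut
    return np.where(flips, 1 - chromosome, chromosome)
```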
6 Support vector machines
False-positive reduction and the discrimination between normal and intrusive connections are both classification problems. In a classification problem, an unknown pattern is assigned to a predefined class according to the characteristics of the pattern, presented in the form of a feature vector. Numerous classification techniques exist; we used SVM for the classification of intrusions. In our case, we are dealing with a binary classification problem, where a connection is to be classified as either normal or intrusive. SVM classifiers [11, 13] are among the most advanced and are generally designed to solve binary classification problems, so they perfectly suit our requirements.
SVM finds an optimal hyperplane that separates the data
belonging to different classes with large margins in a high-
dimensional space [14]. The margin is defined as the sum
of distances to the decision boundary (hyperplane) from the
nearest points (support vectors) of the two classes. SVM
formulation is based on statistical learning theory and has
attractive generalization capabilities in linear as well as
nonlinear decision problems [13, 15]. SVM uses structural
risk minimization as opposed to empirical risk minimiza-
tion [11, 13] by reducing the probability of misclassifying
an unknown pattern drawn randomly from a fixed but
unknown distribution. When the data are linearly separa-
ble, SVM computes the hyperplane that maximizes the
margin between the training examples and the class
boundary. When the data are not linearly separable, the
examples are mapped to a high-dimensional space where
such a separating hyperplane can be found. The mechanism
that defines this mapping process is called the kernel
function.
SVMs are powerful classifiers with good performance in the domain of intrusion detection. They can be applied to data with a great number of features, but it has been shown that their performance increases when the number of features is reduced. The key characteristics of SVM are its mathematical tractability and geometric interpretation. This has facilitated a rapid growth of interest in SVMs over the last few years, with remarkable success demonstrated in several fields [6, 8, 10, 11, 16–18].
Assume there are $l$ examples from two classes:

$(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l), \quad x_i \in \mathbb{R}^N, \; y_i \in \{-1, +1\} \qquad (3)$
Finding the optimal hyperplane implies solving a constrained optimization problem using quadratic programming; the optimization criterion is the width of the margin between the classes. The discriminant hyperplane is defined as:

$f(x) = \sum_{i=1}^{l} y_i \alpha_i k(x, x_i) + b \qquad (4)$
where $k(x, x_i)$ is a kernel function and the sign of $f(x)$ indicates the membership of $x$. Constructing the optimal hyperplane is equivalent to finding all the nonzero $\alpha_i$; any data point $x_i$ corresponding to a nonzero $\alpha_i$ is a support vector of the optimal hyperplane. Suitable kernel functions can be expressed as a dot product in some space and satisfy Mercer's condition. By using different kernels, SVMs implement a variety of learning machines (e.g., a sigmoidal kernel corresponds to a two-layer sigmoidal neural network, while a Gaussian kernel corresponds to an RBF neural network). The Gaussian radial basis kernel is given by
radial basis kernel is given by
kðx; xiÞ ¼ exp � jjx� xijj2
2r2
!
ð5Þ
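Eq. (5) maps directly to code; a short NumPy sketch (sigma is a free parameter here) is:

```python
import numpy as np

def gaussian_kernel(x, xi, sigma=1.0):
    """Eq. (5): k(x, xi) = exp(-||x - xi||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))
```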
The Gaussian kernel is used in this work. Our experiments
have shown that the Gaussian kernel outperforms other
kernels in the context of our applications. The SVM is
implemented using the kernel Adatron algorithm. The
kernel Adatron maps inputs to a high-dimensional feature
space and then optimally separates data into their
respective classes by isolating those inputs which fall
close to the data boundaries. Therefore, the kernel Adatron
is especially effective in separating sets of data which
share complex boundaries. SVMs can only be used for
classification, not for function approximation [6, 8, 16].
The architecture of the SVM applied in this work is shown in Fig. 4.

Fig. 4 Structure of the applied SVM as intrusion analysis engine: an input layer (inputs 1–10), a processing layer of Gaussian kernels centered at the inputs with multipliers α1, α2, …, α10, and an output layer summing them with bias b to give g(x) and the decision f(x) ∈ {1, −1}
The kernel Adatron algorithm used in SVM is given
below.
Algorithm:
Step 1: Initialize $\alpha_i = 1$.
Step 2: Starting from pattern $i = 1$, for labeled points $(x_i, y_i)$, calculate $z_i = \sum_{j=1}^{p} \alpha_j y_j k(x_i, x_j)$.
Step 3: For all patterns $i$, calculate $\gamma_i = y_i z_i$, and execute steps 4–5 below.
Step 4: Let $\delta\alpha_i = \eta(1 - \gamma_i)$ be the proposed change to the multiplier $\alpha_i$.
Step 5.1: If $(\alpha_i + \delta\alpha_i) \le 0$, the proposed change would result in a negative $\alpha_i$; to avoid this, set $\alpha_i = 0$.
Step 5.2: If $(\alpha_i + \delta\alpha_i) > 0$, update the multiplier through the addition of $\delta\alpha_i$, i.e., $\alpha_i \leftarrow \alpha_i + \delta\alpha_i$.
Step 6: Calculate the bias $b$ from $b = \frac{1}{2}\left(\min(z_i^{+}) + \max(z_i^{-})\right)$, where $z_i^{+}$ are those patterns $i$ with class label $+1$ and $z_i^{-}$ are those with class label $-1$.
If a maximum number of presentations of the pattern set has been exceeded, stop; otherwise, return to Step 2.
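Read as code, the steps above become the following sketch; the learning rate eta and the pass limit are illustrative assumptions, not values reported in the paper, and the labels are taken as ±1.

```python
import numpy as np

def kernel_adatron(X, y, kernel, eta=0.1, max_passes=100):
    """A sketch of kernel Adatron Steps 1-6; eta and max_passes are
    illustrative, not values from the paper. Labels y are in {-1, +1}."""
    p = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(p)] for i in range(p)])
    alpha = np.ones(p)                              # Step 1: alpha_i = 1
    for _ in range(max_passes):                     # stop after max presentations
        z = K @ (alpha * y)                         # Step 2: z_i
        gamma = y * z                               # Step 3: gamma_i = y_i z_i
        alpha = np.maximum(alpha + eta * (1.0 - gamma), 0.0)  # Steps 4-5
    z = K @ (alpha * y)
    b = 0.5 * (z[y == 1].min() + z[y == -1].max())  # Step 6: bias
    return alpha, b
```

A new connection x would then be classified by the sign of Eq. (4), $f(x) = \sum_i \alpha_i y_i k(x, x_i) + b$, using the returned multipliers and bias.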
7 Methodology
Any unauthorized user who can access computer and network resources and wreak havoc is called an intruder [2, 19]. A system that detects such illegal users is called an IDS. Several IDSs are available, and they all require suitable recognition of the attack [20]. A methodology is designed to improve the recognition ability of such systems. It consists of five phases: selection of the dataset, pre-processing of the dataset, the classification approach, training the system, and testing the system. The adopted methodology is shown in Fig. 5.
7.1 Selection of dataset
This research work used the KDD cup 99 dataset for experiments. This dataset was selected because of its standardization and content richness, and because it allows results to be compared with existing research in the area of intrusion detection [3, 4]. The raw dataset consists of 41 features, represented as:

$x_1, x_2, \ldots, x_n \qquad (6)$

where $n = 41$ is the number of features.
7.2 Pre-processing of dataset
After selection of the dataset, the raw dataset is pre-processed so that it can be given to the selected classifier, SVM. The raw dataset is pre-processed in three steps: (1) discarding symbolic values, (2) feature transformation using PCA, and (3) optimal feature subset selection using GA.
7.2.1 Discarding symbolic values
In the first step of pre-processing, three symbolic features (e.g., udp, private, and SF) are discarded from the 41 features of the dataset. The resulting features are:

$x_1, x_2, \ldots, x_m \qquad (7)$

where $m = 38$ is the size of the resulting feature set.
7.2.2 Feature transformation
In the second step of pre-processing, PCA is applied to the 38 features of the dataset. PCA is mostly used for data reduction, but here it is used to transform the features into the principal feature space:

$pc_1, pc_2, pc_3, \ldots, pc_l \qquad (8)$

where $l = 38$ is the number of principal components.
Fig. 5 Methodology used for intrusion detection: Selection of Dataset → Pre-processing of Dataset → Classification Approach → Training the System → Testing the System
7.2.3 Optimal feature subset selection
In the third step of pre-processing, GA is applied for optimal feature subset selection from the principal components search space. Four different experiments were performed, and a subset of 10 features was selected, as it showed better performance than the others.
7.3 Classification approach
The architecture used for classification is SVM, implemented using the kernel Adatron algorithm. The kernel Adatron maps inputs to a high-dimensional feature space and then optimally separates the data into their respective classes, normal and intrusive, by isolating those inputs that fall close to the data boundaries [16]. Therefore, the kernel Adatron is especially effective in separating sets of data that share complex boundaries. The structure of the implemented SVM is shown in Fig. 6.
7.4 Training the system
In the training phase, we have both input patterns and the desired output for each input vector. The aim of training is to minimize the error between the output produced by the SVM and the desired output [2]. To achieve this goal, the weights are updated by carrying out certain steps known as training. The parametric specification used for the SVM architecture during the training phase is given in Table 1.
7.5 Testing the system
Once the system is well trained, the weights of the system are frozen and its performance is evaluated. Testing of the trained system involves two steps: (1) a verification step and (2) a generalization step.
In the verification step, the trained system is tested against the data used in training. The purpose of this step is to investigate how well the trained system learned the training patterns in the training dataset. If the system was trained successfully, the outputs produced by the system will be similar to the real outputs. In this research work, 30 % of the training dataset of 5,000 connections, that is, 1,500 connections, is used for verification.
In the generalization step, testing is conducted with data not used in training. The purpose of this step is to measure the generalization ability of the trained network. After training, the system only involves computation of the feed-forward phase. For this purpose, a production dataset is used that has input data but no desired output data. This work used a dataset of fifteen thousand (15,000) connections as the production dataset. Further, the system performance is also tested on the total dataset (20,000 connections), which consists of both the training dataset and the production dataset. The parametric specification used for the SVM architecture during the testing phase is given in Table 2.
8 Experimental results
The system is evaluated on different feature subsets obtained from the genetic principal components. This section presents the results and their sensitivity analysis in different scenarios. First of all, the system is tested on the original dataset of 38 features, without using PCA and GA. Five thousand exemplars, or input samples, are randomly selected from 20,000 connections as the training dataset. These exemplars contain two types of connections, normal and intrusive, of which 3,223 are normal and 1,777 are intrusive. This set is further divided into three subsets: a training set (2,500), a cross-validation set (1,000), and a testing set (1,500). The remaining fifteen thousand exemplars are used to test the generalization ability of the trained system. The sensitivity analysis is presented in terms of true positives, false positives, false negatives, and true negatives.
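For reference, the sensitivity terms used below can be computed from a set of predictions as in this sketch; detection rate and false alarm are taken here as the true-positive and false-positive rates, which is one common reading of these terms.

```python
def sensitivity_analysis(y_true, y_pred):
    """Confusion-matrix terms (intrusive = 1, normal = 0) and the derived
    detection rate and false alarm rate, as reported in the tables below."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    detection_rate = 100.0 * tp / (tp + fn)  # intrusions correctly detected
    false_alarm = 100.0 * fp / (fp + tn)     # normal traffic flagged as attack
    return tp, fp, fn, tn, detection_rate, false_alarm
```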
8.1 Testing phase analysis
The purpose of the testing phase is to observe how well the system "learned" the training dataset during the training process. The sensitivity analysis of the confusion matrix for the testing phase is shown in Table 3. The overall performance of the testing phase is presented in Table 4.
Fig. 6 Implemented SVM architecture with RBF kernel function and large margin classifier
8.2 Verification phase analysis
In the verification phase, the trained system with different feature sets is tested on the production dataset, which is not part of the training set, in order to observe the generalization performance of the trained system. The overall performance of the system during the verification phase with different feature sets is shown in Table 5. Table 6 shows a comparative analysis among the various feature sets. The results indicate that the SVM-based system performs better on the feature set based on genetic principal components than on the other feature sets.
The results in Table 6 show that the intrusion detection mechanism using GA to search the PCA feature space for genetic principal components provides optimal performance compared to the traditional way of selecting features from the PCA search space. The key focus of the research was to select sensitive features, minimize the number of features, and increase the accuracy of the system. The research achieved this objective by using GA and PCA, which made the SVM classifier simpler as well as more efficient. Hence, the proposed method provides an SVM-based intrusion detection mechanism that outperforms the existing approaches.
8.3 Comparison with other approaches
The experimental results are compared with the results
presented in related work. Table 7 shows comparative
Table 1 SVM parameters during training phase

| S. no. | Parameter name | Value |
|---|---|---|
| 1 | Architecture | SVM |
| 2 | Layers | 03 (input, Gaussian, and output) |
| 3 | Input sample features | 38 (original), 22 (PCA), and 10 (GA) |
| 4 | PEs in input layer | Depends on the feature subset selected, e.g., 38, 22, or 10 |
| 5 | SVM input synapse | If inputs are 10, then its outputs are 2,500 |
| 6 | PEs in Gaussian layer | If the number of features is 10, then the Gaussian layer has 2,500 PEs |
| 7 | SVM output synapse | Inputs 2,500 and output 1 |
| 8 | SVM step size | 0.01 |
| 9 | Weight decay | 0.01 |
| 10 | Epochs | 1,000 |
| 11 | PE in output layer | One, with value 0 or 1 |
| 12 | Activation function | Gaussian |
| 13 | Training algorithm | Backpropagation (RBF) and kernel Adatron (SVM) |
| 14 | Training dataset | 5,000 connections, of which 20 % for cross-validation and 30 % for testing |
Table 2 SVM parameters during testing phase

| S. no. | Parameter name | Value |
|---|---|---|
| 1 | Architecture | SVM |
| 2 | Layers | 03 (input, Gaussian, and output) |
| 3 | Input sample features | 38 (original), 22 (PCA), and 10 (GA) |
| 4 | PEs in input layer | Depends on the feature subset selected |
| 5 | SVM input synapse | If inputs are 10, then its outputs are 2,500 |
| 6 | PEs in Gaussian layer | If the number of features is 10, then the Gaussian layer has 2,500 PEs |
| 7 | SVM output synapse | Inputs 2,500 and output 1 |
| 8 | SVM step size | 0.01 |
| 9 | Weight decay | 0.01 |
| 10 | Epochs | 1 |
| 11 | PE in output layer | One, with value 0 or 1 |
| 12 | Activation function | Gaussian |
| 15 | Testing dataset | 3,000 connections for testing and 2,000 for cross-validation |
| 16 | Production dataset | 20,000 connections |
Table 3 Sensitivity analysis of training, cross-validation, and testing datasets

| Feature set | Dataset | True positive (%) | False positive (%) | False negative (%) | True negative (%) |
|---|---|---|---|---|---|
| Raw-38 | Training | 100 | 0.0 | 0.0 | 100 |
| Raw-38 | Cross-validation | 93.65 | 6.34 | 2.47 | 97.52 |
| Raw-38 | Testing | 93.65 | 6.34 | 2.47 | 97.52 |
| PCA-38 | Training | 100 | 0.0 | 0.0 | 100 |
| PCA-38 | Cross-validation | 99.07 | 0.93 | 0.58 | 99.42 |
| PCA-38 | Testing | 98.66 | 1.33 | 0.759 | 99.24 |
| PCA-22 | Training | 99.37 | 0.63 | 0.56 | 99.44 |
| PCA-22 | Cross-validation | 99.50 | 0.46 | 0.85 | 99.14 |
| PCA-22 | Testing | 99.48 | 0.51 | 0.95 | 99.05 |
| GPC-12 | Training | 98.30 | 1.70 | 0.0 | 100 |
| GPC-12 | Cross-validation | 100 | 0.0 | 0.0 | 100 |
| GPC-12 | Testing | 99.79 | 0.21 | 0.76 | 99.24 |
| GPC-10 | Training | 99.38 | 0.61 | 0.0 | 100 |
| GPC-10 | Cross-validation | 99.38 | 0.61 | 0.0 | 100 |
| GPC-10 | Testing | 99.89 | 0.10 | 0.50 | 99.50 |
Table 4 Overall performance of testing phase

| Feature set | Training time (H:M:S) | Training epochs | Detection rate (%) | False alarm (%) |
|---|---|---|---|---|
| Raw-38 | 2:21:17 | 1,000 | 95.58 | 4.42 |
| PCA-38 | 2:39:04 | 1,000 | 98.95 | 1.05 |
| PCA-22 | 2:08:18 | 1,000 | 99.26 | 0.74 |
| GPC-12 | 0:53:28 | 1,000 | 99.47 | 0.53 |
| GPC-10 | 0:16:14 | 1,000 | 99.51 | 0.49 |
Table 6 SVM performance on different feature sets

| Measure | GPC-10 | GPC-12 | PC-22 | PC-38 | Raw-38 |
|---|---|---|---|---|---|
| False alarm | 07 | 11 | 24 | 79 | 11,455 |
| Epochs | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
| Time | 01:16:14 | 01:36:01 | 02:08:18 | 02:39:04 | 02:21:17 |
| Feature size | 564 KB | 2.17 MB | 5.15 MB | 8.37 MB | 8.37 MB |
| False positive | 0 | 0 | 24 | 79 | 11,455 |
| False negative | 07 | 11 | 0 | 0 | 0 |
| True positive | 12,807 | 12,811 | 12,776 | 12,721 | 1,345 |
| True negative | 7,193 | 7,189 | 7,224 | 7,279 | 18,655 |
Table 7 Performance comparison with other approaches

| Approach | Detection rate (%) |
|---|---|
| SVM + GPC-10 [our approach] | 99.96 |
| SVM + GPC-12 [our approach] | 99.94 |
| PCA + GA + SVM [3] | 99.60 |
| MLP + PCA [1] | 98.57 |
| GA + SVM [16] | 98 |
| SVM [21] | 83.2 |
| MLP [21] | 82.5 |
| PCA + NN [22] | 92.2 |
| RBF/Elman [7] | 93 |
| ART1, ART2, SOM [23] | 97.42, 97.19, 95.74 |
Table 5 Overall performance of verification phase

| Feature set | Features | True positive | True negative | Normal (64 %) | Intrusive (36 %) | False alarm |
|---|---|---|---|---|---|---|
| Raw-38 | 760,000 | 1,345 | 18,655 | 6.72 | 93.27 | 11,455 |
| PCA-38 | 760,000 | 12,721 | 7,279 | 63.605 | 36.395 | 79 |
| PCA-22 | 440,000 | 12,776 | 7,224 | 63.88 | 36.12 | 24 |
| GPC-12 | 240,000 | 12,811 | 7,189 | 64.055 | 35.945 | 11 |
| GPC-10 | 200,000 | 12,807 | 7,193 | 64.035 | 35.965 | 07 |
analysis of the applied approach with other approaches. The results show that our method enhances SVM performance in intrusion detection, outperforms the existing approaches, and has the capability to minimize the number of features (to 10) and maximize the detection rate (up to 99.96 %). Therefore, adopting SVM based on genetic principal components is a feasible solution that achieves optimal performance.
9 Conclusion
In this article, the performance of intrusion detection is improved based on optimal feature subset selection obtained from PCA and GA. Selecting an appropriate number of principal components is a critical problem in subset selection. Therefore, GA is applied to search for the genetic principal components that offer a subset of features with optimal sensitivity and the highest discriminatory power. The KDD cup dataset, a benchmark for evaluating security detection mechanisms, was used, and SVM was used for classification. The performance of the applied approach was analyzed, and a comparative analysis was made with existing approaches. Consequently, this method provides optimal performance in intrusion detection and is capable of minimizing the number of features while maximizing the detection rates.
Acknowledgment The authors extend their appreciation to the
College of Computer & Information Sciences Research Center,
Deanship of Scientific Research, King Saud University, Saudi Arabia
for funding this research work. The authors are grateful for this
support.
References
1. Ahmad I (2011) Feature subset selection in intrusion detection using soft computing techniques. PhD thesis, Universiti Teknologi Petronas (UTP), Perak, Malaysia
2. Ahmad I (2012) Feature subset selection in intrusion detection. LAP Lambert Academic Publishing AG & Co, Germany
3. Ahmad I, Abdullah A, Alghamdi A, Hussain M (2011) Optimized intrusion detection mechanism using soft computing techniques. Telecommun Syst J. doi:10.1007/s11235-011-9541-1
4. Ahmad I, Abdullah A, Alghamdi A, Hussain M, Nafjan K (2011) Intrusion detection using feature subset selection based on MLP. Sci Res Essays 6(34):6804–6810
5. Liu G, Yi Z, Yang S (2007) A hierarchical intrusion detection model based on the PCA neural networks. Neurocomputing 70(7–9):1561–1568
6. Horng S, Ming-Yang S, Yuan-Hsin C, Tzong-Wann K, Rong-Jian C, Jui-Lin L, Citra Dwi P (2011) A novel intrusion detection system based on hierarchical clustering and support vector machines. Expert Syst Appl 38(1):306–313
7. Tong X, Wang Z, Haining Y (2009) A research using hybrid RBF/Elman neural networks for intrusion detection system secure model. Comput Phys Commun 180(10):1795–1801
8. Eid HF, Darwish A, Hassanien AE, Abraham A (2010) Principle components analysis and support vector machine based intrusion detection system. In: 10th international conference on intelligent systems design and applications (ISDA), Cairo, Egypt, pp 363–367
9. Cao LJ, Chua KS, Chong WK, Lee HP, Gu QM (2003) A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 55(1–2):321–336
10. Sun Z, Bebis B, Miller R (2004) Object detection using feature subset selection. Pattern Recognit 37(11):2165–2176
11. Hussain M, Wajid SK, Elzaart A, Berbar M (2011) A comparison of SVM kernel functions for breast cancer detection. In: 8th IEEE international conference on computer graphics, imaging and visualization (CGIV), pp 145–150
12. Yang S, Bebis G, Hussain M, Muhammad G, Mirza A (2013) Unsupervised discovery of visual face categories. Int J Artif Intell Tools 22(01):1250029-1–1250029-30. doi:10.1142/S0218213012500297
13. Vapnik V (1995) Statistical learning theory. Springer, New York
14. Boser BE, Guyon IM, Vapnik V (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the 5th annual workshop on computational learning theory, pp 144–152
15. Burges C (1998) Tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):955–974
16. Kim D, Nguyen H, Syng-Yup O, Jong SP (2005) Fusions of GA and SVM for anomaly detection in intrusion detection system. In: Advances in neural networks, vol 3498. Lecture Notes in Computer Science, pp 415–420
17. Gao M, Tian J, Xia M (2009) Intrusion detection method based on classify support vector machine. In: Proceedings of the second international conference on intelligent computation technology and automation. IEEE Computer Society, Washington, DC, pp 391–394
18. Ahmad I, Abdullah A, Alghamdi A, Hussain M (2011) Denial of service attack detection using support vector machine. J Inf Tokyo 14(1):127–134
19. Ahmad I, Abdullah A, Alghamdi A (2009) Application of artificial neural network in detection of DOS attacks. In: Proceedings of the 2nd international conference on security of information and networks (SIN '09), Famagusta, North Cyprus. ACM, New York, pp 229–234
20. Zargar G, Kabiri P (2010) Selection of effective network parameters in attacks for intrusion detection. In: Advances in data mining: applications and theoretical aspects, vol 6171. Lecture Notes in Computer Science, pp 643–652
21. Osareh A, Shadgar B (2008) Intrusion detection in computer networks based on machine learning algorithms. Int J Comput Sci Netw Secur (IJCSNS) 8(11):15–23
22. Lakhina S, Joseph S, Verma B (2010) Feature reduction using principal component analysis for effective anomaly-based intrusion detection on NSL-KDD. Int J Eng Sci Technol 2(6):1790–1799
23. Amini M, Jalili R, Shahriari H (2006) RT-UNNID: a practical solution to real-time network-based intrusion detection using unsupervised neural networks. Comput Appl Secur 25(6):459–468