ORIGINAL ARTICLE
Enhancing SVM performance in intrusion detection using optimal feature subset selection based on genetic principal components
Iftikhar Ahmad · Muhammad Hussain · Abdullah Alghamdi · Abdulhameed Alelaiwi

Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
e-mail: [email protected]

Received: 18 September 2012 / Accepted: 25 February 2013
© Springer-Verlag London 2013
Neural Computing & Applications, doi:10.1007/s00521-013-1370-6
Abstract Intrusion detection is a very serious issue these days, because the prevention of intrusions depends on detection. Accurate detection of intrusions is therefore essential to secure information in the computer and network systems of any organization, whether private, public, or governmental. Several intrusion detection approaches are available, but their main problem is performance, which can be enhanced by increasing the detection rates and reducing false positives. This shortcoming of existing techniques is the focus of the research in this paper. The poor performance of such techniques is due to the raw dataset, whose redundant features confuse the classifier and result in inaccurate detection. Recent approaches use principal component analysis (PCA) for feature subset selection based on the highest eigenvalues, but the features corresponding to the highest eigenvalues may not have the optimal sensitivity for the classifier, because many sensitive features are ignored. Instead of using the traditional approach of selecting features with the highest eigenvalues, as in PCA, this research applies a genetic algorithm to search for the genetic principal components that offer a subset of features with optimal sensitivity and the highest discriminatory power. The support vector machine (SVM) is used for classification. This research work uses the knowledge discovery and data mining (KDD) cup dataset for experimentation. The performance of this approach was analyzed and compared with existing approaches. The results show that the proposed method enhances SVM performance in intrusion detection, outperforms the existing approaches, and has the capability to minimize the number of features and maximize the detection rates.
Keywords Intrusion detection system (IDS) · Support vector machines (SVMs) · Principal component analysis (PCA) · Genetic algorithm (GA) · Genetic principal component (GPC) · Detection rate (DR) · Dataset
1 Introduction
Currently, intrusions on network systems are key security threats; therefore, it is important to stop them, and preventing intrusions requires detecting them first. Detection is a key part of any security tool, such as an intrusion detection system (IDS), an intrusion prevention system (IPS), an adaptive security appliance (ASA), checkpoints, and firewalls [1]. Therefore, accurate detection of network attacks is necessary. Several intrusion detection techniques are available, but their leading problem is performance, which can be improved by increasing the detection rates and reducing false positives. Such drawbacks of past techniques have motivated this research.
One of the drawbacks of past intrusion detection approaches is the use of a raw dataset for classification: the classifier may get confused by redundancy and hence may not classify correctly. To overcome this issue, principal component analysis (PCA) has been applied to transform raw features into the principal feature space and to select features based on their sensitivity, which is determined by the eigenvalues [2].
The modern methods use PCA to project the feature space onto the principal feature space and select the features corresponding to the highest eigenvalues, but these features may not have the optimal
sensitivity for the classifier, because many sensitive features are ignored [3]. Instead of using the traditional approach of selecting features with the highest eigenvalues, as in PCA, this research applies a genetic algorithm (GA) to search the principal feature space for a subset of features with optimal sensitivity and the highest discriminatory power. Based on the selected features, classification is performed. The support vector machine (SVM) is used for classification due to its proven classification ability. This research work uses the knowledge discovery and data mining (KDD) cup dataset, which is considered a benchmark for evaluating security detection mechanisms.
The focus of this work is searching the PCA space using GA to select a subset of principal components, called genetic principal components. This is a novel method in intrusion detection compared to traditional methods, in which some percentage of the top principal components is selected. The method is applied and tested on intrusion detection, where it demonstrated a performance enhancement in the SVM classifier, i.e., the intrusive analysis engine.
The rest of the paper is structured as follows. Related work is described in Sect. 2. The proposed model is presented in Sect. 3. Some basic knowledge of PCA, which is applied in the proposed model, is discussed in Sect. 4. Section 5 details the GA for selecting genetic principal components from the principal space. The classification process using SVM is explained in Sect. 6. The applied methodology is described in Sect. 7. Experimental results are discussed in Sect. 8. Finally, conclusions are drawn in Sect. 9.
2 Related work
The focus of past work was on feature extraction and classification in intrusion detection, and less importance was given to the critical issue of feature selection. Feature selection is an important phase before the classification process, because the performance of the classifier depends on an optimal subset of features; yet this issue is ignored by existing approaches in the area of intrusion detection. Such approaches mostly rely on powerful classification algorithms to deal with redundant and irrelevant features [3, 4].
In [5], PCA is used for feature selection and neural networks are used for classification. This work used the first 22 features out of a 38-feature set. The principal components are selected based on the highest eigenvalues in the traditional way, which creates the possibility of missing many important features. Such features may have more discriminatory power than the selected features, so there must be a mechanism for selecting an optimal subset of features in the principal space.
In [6], the significance of a feature is decided on the basis of the accuracy and the number of false alarms of the classifier, with and without the feature. The feature selection is based on "leave-one-out": eliminate one feature from the original dataset, repeat the experiment, and then compare the new results with the original result; if one of the described cases occurs, the feature is considered important, otherwise it is regarded as insignificant. Since there are 41 features in the KDD cup, the experiment is repeated 41 times to confirm whether each feature is significant or insignificant. This technique is complicated and incurs overheads on massive datasets.
In [7], the radial basis function (RBF) network is used as a classifier, and the Elman network is applied to reinstate the memory of older actions. This work used the full-featured KDD cup dataset, which consists of 41 features. This approach performs poorly in classification and introduces overheads: the raw feature set confuses the classifier due to redundancy and results in false alarms. Further, it increases training and testing overheads, reduces the accurate detection rate, consumes more memory and computational resources, and increases architectural complexity and the chance of system malfunction.
In [8], PCA is used to decide an optimal feature set. Such a feature set improved the performance of the classifier and reduced training as well as testing overheads in the IDSs. But there remains an important issue: the performance of the intrusive analysis engine depends on the feature subset selection method, which is a compromise between training efficiency and accurate results, because few PCA components increase training efficiency but may cause false alarms, whereas a large number of PCA components increase training overheads and complexity.
In [9], the fusion of GA and SVM is described for feature selection and parameter tuning of the system. This method was capable of minimizing the number of features and maximizing the detection rates, but the problem is feature uniformity: the features in their original form are not consistent, so they must be transformed into a new feature space in order to make them more visible and well organized.
In [3], an intrusion detection mechanism was proposed using soft computing techniques: SVM, GA, and PCA. The system was implemented and tested on two cases: (1) PCA, GA, and SVM, and (2) PCA and SVM. The focus was comparing SVM performance on two feature sets: (1) 12 features obtained by PCA and GA, and (2) 22 features taken directly from the PCA output using the traditional method. This approach requires further work to test and validate it with detailed experimentation.
In [4], an initial effort was made on feature subset selection in intrusion detection. The presented technique was not explained and demonstrated thoroughly enough to verify it. The method conducted three experiments with MLP as the classifier, using 12-, 20-, and 27-feature sets. There are a number of issues in this work; for instance, it remains possible that the classifier could perform equally well on the original (raw) dataset, the transformed (PCA) dataset, or a PCA dataset obtained conventionally. The work was not sufficient to confirm the proposed approach, and further experimentation is required to verify it.
Feature selection is an important problem in intrusion detection because the performance of the classifier, or intrusive analysis engine, depends on it: a more accurate dataset yields more accuracy and better performance in intrusion detection. GAs provide a simple, general, and powerful framework for selecting good subsets of features, leading to improved detection rates [3, 4, 10].
3 Proposed model
The proposed model is shown in Fig. 1. The model has six parts. The first part is the selection of a dataset for experimentation. A dataset can be formed in different ways, but this work used the KDD cup dataset, which is a standard benchmark in the area of intrusion detection and security evaluation frameworks. The second part is feature transformation, which is very important to make the features more visible, organized, and discriminant in the principal space; PCA is used for the transformation and to overcome the issue of redundancy. The third part is feature subset selection, which differs from previous methods: GA is applied to search the PCA space and select a subset of principal components, known as genetic principal components in this work. The fourth part is classification, which is the core of the intrusion detection mechanism. SVM is used as the classifier due to its proven ability in classification problems; further, it is a good solution for two-class problems, which makes it well suited to this work with its two classes, normal and intrusive. The fifth part is the training and testing of the system: training tunes the system to find optimal parameters, and testing evaluates the trained system. The sixth part, per Fig. 1, is the presentation of results. This work is an extension of previous work [3]. The model is further explained in the next sections of the paper, as well as in the methodology section.
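To make the flow concrete, the following minimal Python sketch wires the six parts together. It is illustrative only: synthetic data stands in for the KDD cup dataset, `select_genetic_principal_components` is a placeholder for the GA search of Sect. 5, and scikit-learn's `SVC` stands in for the kernel Adatron SVM used in this work.

```python
# A minimal sketch of the six-part model (assumptions noted above).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def select_genetic_principal_components(Z, y):
    """Placeholder for the GA search over principal components (Sect. 5)."""
    mask = np.zeros(Z.shape[1], dtype=bool)
    mask[:10] = True  # stand-in: the GA would evolve this bit string
    return mask

# Part 1: dataset (synthetic stand-in for the preprocessed KDD cup data)
X = np.random.rand(1000, 38)
y = np.random.randint(0, 2, 1000)
# Part 2: feature transformation into the principal space (all 38 components kept)
Z = PCA(n_components=38).fit_transform(X)
# Part 3: optimal feature subset selection (genetic principal components)
mask = select_genetic_principal_components(Z, y)
# Parts 4 and 5: SVM classification with training and testing
Z_tr, Z_te, y_tr, y_te = train_test_split(Z[:, mask], y, test_size=0.3)
clf = SVC(kernel="rbf").fit(Z_tr, y_tr)
# Part 6: results
print("accuracy:", clf.score(Z_te, y_te))
```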
4 Principal component analysis
PCA is one of the common techniques used for dimensionality reduction. We use PCA to remove the redundancy present in the features and compute a compact representation that makes the features more visible and organized in the new space, called the principal space. It is a valuable statistical method that has several applications in fields such as face recognition and image compression, and it is a common technique for finding patterns in high-dimensional data. The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset. It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences [1–3, 9]. The whole subject of statistics is based on the idea that you have a big set of data and want to analyze it in terms of the relationships between the individual points in that set [3, 7]. So, we have a feature set, and the goal is to transform it into the principal space in order to determine the principal components. The selection of principal components is carried out through GA. The flow of PCA applied is shown in Fig. 2. The PCA algorithm applied is given below.
Fig. 1 Intrusion detection proposed model based on SVM: Dataset → Feature Transformation → Optimal Feature Subset Selection → Support Vector Machine (SVM) → Training and Testing → Results

Fig. 2 PCA algorithm flow: Input Data → Find Mean → Calculate Deviation from Mean → Find Covariance Matrix → Calculate Eigenvalues and Eigenvectors

Algorithm:
Suppose $x_1, x_2, \ldots, x_M$ are $N \times 1$ vectors.

Step 1: Compute the mean: $\bar{x} = \frac{1}{M}\sum_{i=1}^{M} x_i$
Step 2: Subtract the mean: $\Phi_i = x_i - \bar{x}$
Step 3: Form the $N \times M$ matrix $A = [\Phi_1, \Phi_2, \Phi_3, \ldots, \Phi_M]$, then compute $C = \frac{1}{M}\sum_{n=1}^{M} \Phi_n \Phi_n^{T} = \frac{1}{M} A A^{T}$
Step 4: Compute the eigenvalues of $C$: $\lambda_1 > \lambda_2 > \cdots > \lambda_N$
Step 5: Compute the eigenvectors of $C$: $u_1, u_2, \ldots, u_N$. Since $C$ is symmetric, $u_1, u_2, \ldots, u_N$ form a basis, i.e., any vector $x$, or actually $(x - \bar{x})$, can be written as a linear combination of the eigenvectors: $(x - \bar{x}) = b_1 u_1 + b_2 u_2 + \cdots + b_N u_N = \sum_{i=1}^{N} b_i u_i$
Step 6: The dimensionality reduction step (based on the largest eigenvalues) is skipped, as we select principal components using GA.
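For illustration, Steps 1–5 translate directly into a few lines of NumPy. This is a sketch under the assumption that the samples are the rows of a matrix X, not the implementation used in the experiments.

```python
import numpy as np

def pca_principal_space(X):
    """Steps 1-5 above: mean, deviations, covariance, eigendecomposition.
    Step 6 (truncation by largest eigenvalues) is left to the GA."""
    M, N = X.shape                     # M samples of N features each
    x_bar = X.mean(axis=0)             # Step 1: mean vector
    Phi = X - x_bar                    # Step 2: subtract the mean
    C = (Phi.T @ Phi) / M              # Step 3: covariance C = (1/M) A A^T
    lam, U = np.linalg.eigh(C)         # Steps 4-5: eigenvalues and eigenvectors
    order = np.argsort(lam)[::-1]      # sort as lambda_1 > lambda_2 > ...
    lam, U = lam[order], U[:, order]
    return lam, U, Phi @ U             # all N components are kept for the GA
```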
5 Genetic algorithm
GA is based on the theory of evolution [3–12]. It is mostly used to solve optimization problems. It starts with an initial random population of solutions, where each solution is represented by a chromosome. A new generation of solutions is produced from the preceding generation based on specific criteria, and this procedure is repeated until certain criteria are met. Each solution has a fixed-length chromosome of bits, where every bit corresponds to a feature in a feature vector. In a chromosome, a 1 bit means that the corresponding feature will be selected, and a 0 bit means that it will not. GA provides a simple, general, and powerful framework for feature selection, so we used it in this work [1–3]. Further, in the feature reduction process using PCA, the principal components are traditionally selected based on the highest eigenvalues, which creates the possibility of missing many important features. Therefore, in order to overcome these issues, we applied GA to search the principal components space so that an optimal subset of features is selected. This is our main contribution, and it positively impacts the performance of the intrusion detection analysis engine. GA operates iteratively on a population of solutions. A randomly generated set of binary strings (1s and 0s) forms the initial population from which the GA starts its search. GA has three basic genetic operators that guide this search: selection, crossover, and mutation [1–3, 10]. The genetic search process is iterative: evaluating, selecting, and recombining strings in the population during each iteration until some termination condition is reached. The flow of GA applied is shown in Fig. 3. The basic algorithm, where P(g) is the population of strings at generation g, is given below.
Fig. 3 GA algorithm flow: Create initial population → Evaluate population → Selection, Crossover, Mutation → repeat until the end of evaluation is reached → Best individuals → Results

Algorithm:
g = 0
Initialize P(g)
Evaluate P(g)
while (termination condition is not satisfied) do
Begin
  Select P(g + 1) from P(g)
  Recombine P(g + 1)
  Evaluate P(g + 1)
  g = g + 1
End
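A minimal Python rendering of this loop is sketched below; the selection, crossover, and mutation shown here are simplified stand-ins for the operators detailed in Sects. 5.4–5.6, and the population size and generation count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_ga(fitness, chrom_len=38, pop_size=100, generations=100):
    """Generational GA matching the loop above; the genetic operators
    here are simple stand-ins for Sects. 5.4-5.6."""
    P = rng.integers(0, 2, size=(pop_size, chrom_len))        # initialize P(g)
    for g in range(generations):                              # termination test
        scores = np.array([fitness(ind) for ind in P])        # evaluate P(g)
        elite = P[np.argsort(scores)[::-1][: pop_size // 10]] # keep top 10 %
        children = []
        while len(children) < pop_size:                       # recombine P(g+1)
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, chrom_len)                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(chrom_len) < 0.01               # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        P = np.array(children)
    return max(P, key=fitness)                                # best individual
```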
Assessment of each string is based on a fitness function that is problem dependent; it decides which of the candidate solutions are better. This corresponds to the environmental determination of survivability in natural selection. Selection of a string, which represents a point in the search space, depends on the string's fitness relative to those of the other strings in the population. It probabilistically removes from the population those points that have relatively low fitness. Mutation, as in natural systems, is a very low probability operator that just flips a specific bit; it plays the role of restoring lost genetic material. Crossover, in contrast, is applied with high probability. It is a randomized yet structured operator that allows information exchange between points; its goal is to preserve the fittest individuals without introducing any new value [1–3, 11]. In brief, selection probabilistically filters out solutions that perform poorly and chooses high-performance solutions to concentrate on, or exploit. Crossover and mutation, through string operations, generate new solutions for exploration. Given an initial population of elements, GAs use the feedback from the evaluation process to select fitter solutions, eventually converging to a population of high-performance solutions. GAs do not guarantee a global optimum solution, but they have the ability to search very large search spaces and arrive at nearly optimal solutions quickly [10].
5.1 Feature selection
We used a simple encoding scheme where the chromosome
is a bit string whose length is determined by the number of
principal components. Each principal component, computed using PCA, is associated with one bit in the string. If the ith bit is 1, then the ith principal component is selected; otherwise, that component is ignored. Each chromosome thus represents a different subset of principal components [3, 4].
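In code, decoding a chromosome into a feature subset is a single masking step over the projected data. The sketch below assumes Z holds the PCA-projected data with one principal component per column.

```python
import numpy as np

def decode(chromosome, Z):
    """Keep the principal components whose bit is 1 (Sect. 5.1)."""
    mask = np.asarray(chromosome, dtype=bool)
    return Z[:, mask]  # columns of Z = principal components
```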
5.2 Feature subset fitness evaluation
The key aim of feature subset selection is to use fewer features to achieve the same or better performance. Therefore, the fitness evaluation contains two terms: (1) accuracy and (2) the number of features selected. The performance of SVM is estimated using a validation dataset, which guides the GA search. Each feature subset contains a certain number of principal components. If two subsets achieve the same performance while containing different numbers of principal components, the subset with fewer principal components is preferred [1–3, 10]. Between accuracy and feature subset size, accuracy is our major concern. We used the fitness function shown below to combine the two terms:
$\mathrm{fitness} = 10^4 \times \mathrm{Accuracy} + 0.5 \times \mathrm{Zeros} \qquad (1)$
where Accuracy corresponds to the classification accuracy on a validation set for a particular subset of principal components, and Zeros corresponds to the number of principal components not selected (i.e., zeros in the chromosome). The accuracy term ranges roughly from 0.50 to 0.99; thus, the first term assumes values from 5,000 to 9,900. The zeros term ranges from 0 to L - 1, where L is the length of the chromosome; thus, the second term assumes values from 0 to 37 (L = 38). Based on the weights that we have assigned to each term, the accuracy term dominates the fitness value. This implies that individuals with higher accuracy will outweigh individuals with lower accuracy, no matter how many features they contain. On the whole, the higher the accuracy, the higher the fitness [10]; also, the fewer the features, the higher the fitness. Selecting the weights for the two terms of the fitness function is more objective dependent than application dependent. When we build an intrusion classification system, among many factors, we need to find the best balance between model compactness and performance accuracy. Under some scenarios, we prefer the best performance, no matter what the cost might be; in that case, the weight associated with the accuracy term should be very high. Under different circumstances, we might favor more compact models over accuracy, as long as the accuracy is within a satisfactory range; in that case, we should choose a higher weight for the zeros term. Thus, we performed four different experiments using GA and the SVM classifier. For example, the fitness of a 10-feature subset (28 zeros) with 0.99 accuracy is calculated as follows:
$\mathrm{fitness} = 10^4(0.99) + 0.5(28) = 9{,}900 + 14 = 9{,}914 \qquad (2)$
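Since Eq. (1) is simple, it can be expressed directly in code; the following sketch (illustrative Python, not the experiment code) reproduces the arithmetic of Eq. (2):

```python
def fitness(chromosome, accuracy):
    """Eq. (1): fitness = 10^4 * Accuracy + 0.5 * Zeros, where Zeros is
    the number of principal components NOT selected."""
    zeros = sum(1 for bit in chromosome if bit == 0)
    return 1e4 * accuracy + 0.5 * zeros

# Worked example of Eq. (2): 10 selected features, 28 zeros (L = 38),
# 0.99 validation accuracy.
print(fitness([1] * 10 + [0] * 28, 0.99))  # -> 9914.0
```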
5.3 Initial population
The initial population is mostly produced randomly. This yields a population where each individual comprises almost the same number of 1s and 0s on average. To search subsets of different numbers of features, the number of 1s for each individual is generated randomly. Then, the
1s are randomly dispersed in the chromosome. In all experiments, we used a population size of 5,000 and 100 generations. Generally, the GA converged in fewer than 100 generations [1, 3, 4].
5.4 Selection
Selection is a genetic operator that picks chromosomes from the population of the current generation to include in the population of the next generation. The selected chromosomes undergo crossover and mutation and then form the population of the subsequent generation. There are five selection operators: roulette, tournament, top percent, best, and random [1–3, 10].
5.4.1 Roulette
The probability of selecting a chromosome is proportional to its fitness or rank [2, 3, 10]. This concept is inspired by the theory of survival of the fittest; the selection of a chromosome can be based on either fitness or rank.
5.4.2 Tournament
A subset of chromosomes is produced by applying roulette selection N times ("the tournament size"), and the best chromosome of this subset is selected. This extension applies additional selective pressure over the basic roulette selection method. Selection of a chromosome can be based on fitness or rank [1, 10].
5.4.3 Best
The best chromosome is selected based on the lowest cost in the training phase. If two or more chromosomes share the same best cost, one of them is chosen randomly [2, 10].
5.4.4 Random
A chromosome is selected randomly from the population of
a generation.
5.4.5 Top percent
A chromosome is selected randomly from the top N percent ("the percentage") of the population [1–3]. We used the top percent selection method in our experiments because it gives better performance than the other selection operators. Our selection strategy was thus GA generational: assuming a population of size N, the offspring double the size of the population, and we select the best top 10 % of individuals from the combined parent-offspring population.
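A sketch of this generational top-percent strategy is given below, assuming parents and offspring are arrays of chromosomes and fitness is the function of Eq. (1); it mirrors the description above rather than any published implementation.

```python
import numpy as np

def top_percent_selection(parents, offspring, fitness, top=0.10):
    """Combine parents and offspring, then keep the best top fraction,
    as in the generational strategy described above (a sketch)."""
    pool = np.vstack([parents, offspring])  # population size doubles
    scores = np.array([fitness(ind) for ind in pool])
    keep = np.argsort(scores)[::-1][: max(1, int(len(pool) * top))]
    return pool[keep]
```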
5.5 Crossover
There are three fundamental types of crossover: one-point crossover, two-point crossover, and uniform crossover [1, 2]. For one-point crossover, the parent chromosomes are divided at a common point chosen randomly, and the resulting sub-chromosomes are swapped. For two-point crossover, the chromosomes are thought of as rings with the first and last gene connected (i.e., a wrap-around structure); the rings are divided at two common points chosen randomly, and the resulting sub-rings are swapped. Uniform crossover is different from the above two schemes: each gene of the offspring is selected randomly from the corresponding genes of the parents. For simplicity, we used one-point crossover here. The crossover probability used in all of our experiments was 0.9.
5.6 Mutation
Mutation is a genetic operator that alters one or more gene values in a chromosome from their initial state [1, 4]. This can introduce entirely new gene values into the gene pool, with which the GA may be able to arrive at a better solution than was previously possible. Mutation is an important part of the genetic search, as it helps to prevent the population from stagnating at a local optimum. Mutation occurs during evolution according to a defined probability, which should usually be set fairly low; if it is set too high, the search turns into a primitive random search [6]. We use the traditional mutation operator, which just flips a specific bit with a very low probability. The mutation probability used in all of our experiments was 0.01.
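A corresponding sketch of the bit-flip mutation operator, reading the description above as an independent flip of each bit with probability 0.01, is:

```python
import numpy as np

rng = np.random.default_rng()

def mutate(chromosome, p_mut=0.01):
    """Flip each bit independently with a low probability (0.01 here)."""
    flips = rng.random(len(chromosome)) < p_mut
    return np.where(flips, 1 - chromosome, chromosome)
```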
6 Support vector machines
False-positive reduction and the discrimination between normal and intrusive connections are both classification problems. In a classification problem, an unknown pattern is assigned to a predefined class according to the characteristics of the pattern, presented in the form of a feature vector. Numerous classification techniques exist; we used SVM for the classification of intrusions. In our case, we are dealing with a binary classification problem, where a connection is to be classified as either normal or intrusive. SVM classifiers [11, 13] are among the most advanced and are generally designed to solve binary classification problems, so they perfectly suit our requirements.
SVM finds an optimal hyperplane that separates the data
belonging to different classes with large margins in a high-
dimensional space [14]. The margin is defined as the sum
of distances to the decision boundary (hyperplane) from the
nearest points (support vectors) of the two classes. SVM
formulation is based on statistical learning theory and has
attractive generalization capabilities in linear as well as
nonlinear decision problems [13, 15]. SVM uses structural
risk minimization as opposed to empirical risk minimiza-
tion [11, 13] by reducing the probability of misclassifying
an unknown pattern drawn randomly from a fixed but
unknown distribution. When the data are linearly separa-
ble, SVM computes the hyperplane that maximizes the
margin between the training examples and the class
boundary. When the data are not linearly separable, the
examples are mapped to a high-dimensional space where
such a separating hyperplane can be found. The mechanism
that defines this mapping process is called the kernel
function.
SVMs are powerful classifiers with good performance in the domain of intrusion detection. They can be applied to data with a great number of features, but it has been shown that their performance increases when the number of features is reduced. The key characteristics of SVM are its mathematical tractability and geometric interpretation. This has facilitated a rapid growth of interest in SVMs over the last few years, with remarkable success demonstrated in several fields [6, 8, 10, 11, 16–18].
Assume there are $l$ examples from two classes:

$(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l), \quad x_i \in \mathbb{R}^N, \; y_i \in \{-1, +1\} \qquad (3)$
Finding the optimal hyperplane implies solving a constrained optimization problem using quadratic programming; the optimization criterion is the width of the margin between the classes. The discriminant hyperplane is defined as:

$f(x) = \sum_{i=1}^{l} y_i \alpha_i k(x, x_i) + b \qquad (4)$
where $k(x, x_i)$ is a kernel function and the sign of $f(x)$ indicates the membership of $x$. Constructing the optimal hyperplane is equivalent to finding all the nonzero $\alpha_i$; any data point $x_i$ corresponding to a nonzero $\alpha_i$ is a support vector of the optimal hyperplane. Suitable kernel functions can be expressed as a dot product in some space and satisfy Mercer's condition. By using different kernels, SVMs implement a variety of learning machines (e.g., a sigmoidal kernel corresponds to a two-layer sigmoidal neural network, while a Gaussian kernel corresponds to an RBF neural network). The Gaussian radial basis kernel is given by
radial basis kernel is given by
kðx; xiÞ ¼ exp � jjx� xijj2
2r2
!
ð5Þ
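Eq. (5) maps directly to code; a short NumPy sketch (sigma is a free parameter here) is:

```python
import numpy as np

def gaussian_kernel(x, xi, sigma=1.0):
    """Eq. (5): k(x, xi) = exp(-||x - xi||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))
```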
The Gaussian kernel is used in this work. Our experiments
have shown that the Gaussian kernel outperforms other
kernels in the context of our applications. The SVM is
implemented using the kernel Adatron algorithm. The
kernel Adatron maps inputs to a high-dimensional feature
space and then optimally separates data into their
respective classes by isolating those inputs which fall
close to the data boundaries. Therefore, the kernel Adatron
is especially effective in separating sets of data which
share complex boundaries. SVMs can only be used for
classification, not for function approximation [6, 8, 16].
The architecture of the SVM applied in this work is shown in Fig. 4.

Fig. 4 Structure of the applied SVM as intrusion analysis engine: an input layer (inputs 1–10), a processing layer of Gaussian kernels centered at the inputs with multipliers α1, α2, …, α10, and an output layer summing them with bias b to give g(x) and the decision f(x) ∈ {1, −1}
The kernel Adatron algorithm used in SVM is given
below.
Algorithm:
Step 1: Initialize $\alpha_i = 1$.
Step 2: Starting from pattern $i = 1$, for labeled points $(x_i, y_i)$, calculate $z_i = \sum_{j=1}^{p} \alpha_j y_j k(x_i, x_j)$.
Step 3: For all patterns $i$, calculate $\gamma_i = y_i z_i$, and execute steps 4–5 below.
Step 4: Let $\delta\alpha_i = \eta(1 - \gamma_i)$ be the proposed change to the multiplier $\alpha_i$.
Step 5.1: If $(\alpha_i + \delta\alpha_i) \le 0$, the proposed change would result in a negative $\alpha_i$; to avoid this, set $\alpha_i = 0$.
Step 5.2: If $(\alpha_i + \delta\alpha_i) > 0$, update the multiplier through the addition of $\delta\alpha_i$, i.e., $\alpha_i \leftarrow \alpha_i + \delta\alpha_i$.
Step 6: Calculate the bias $b$ from $b = \frac{1}{2}\left(\min(z_i^{+}) + \max(z_i^{-})\right)$, where $z_i^{+}$ are those patterns $i$ with class label $+1$ and $z_i^{-}$ are those with class label $-1$.
If a maximum number of presentations of the pattern set has been exceeded, stop; otherwise, return to Step 2.
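Read as code, the steps above become the following sketch; the learning rate eta and the pass limit are illustrative assumptions, not values reported in the paper, and the labels are taken as ±1.

```python
import numpy as np

def kernel_adatron(X, y, kernel, eta=0.1, max_passes=100):
    """A sketch of kernel Adatron Steps 1-6; eta and max_passes are
    illustrative, not values from the paper. Labels y are in {-1, +1}."""
    p = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(p)] for i in range(p)])
    alpha = np.ones(p)                              # Step 1: alpha_i = 1
    for _ in range(max_passes):                     # stop after max presentations
        z = K @ (alpha * y)                         # Step 2: z_i
        gamma = y * z                               # Step 3: gamma_i = y_i z_i
        alpha = np.maximum(alpha + eta * (1.0 - gamma), 0.0)  # Steps 4-5
    z = K @ (alpha * y)
    b = 0.5 * (z[y == 1].min() + z[y == -1].max())  # Step 6: bias
    return alpha, b
```

A new connection x would then be classified by the sign of Eq. (4), $f(x) = \sum_i \alpha_i y_i k(x, x_i) + b$, using the returned multipliers and bias.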
7 Methodology
Any unauthorized user who can access computer and network resources and wreak havoc is called an intruder [2, 19]. A system that detects such illegal users is called an IDS. Several IDSs are available, and they all require suitable recognition of the attack [20]. A methodology is designed to improve the recognition ability of such systems. It consists of five phases: selection of the dataset, pre-processing of the dataset, the classification approach, training the system, and testing the system. The adopted methodology is shown in Fig. 5.
7.1 Selection of dataset
This research work used the KDD cup 99 dataset for experiments. This dataset was selected because of its standardization and content richness, and because it allows results to be compared with existing research in the area of intrusion detection [3, 4]. The raw dataset consists of 41 features, represented as:

$x_1, x_2, \ldots, x_n \qquad (6)$

where $n = 41$ is the number of features.
7.2 Pre-processing of dataset
After selection of the dataset, the raw dataset is pre-processed so that it can be given to the selected classifier, SVM. The raw dataset is pre-processed in three steps: (1) discarding symbolic values, (2) feature transformation using PCA, and (3) optimal feature subset selection using GA.
7.2.1 Discarding symbolic values
In the first step of pre-processing, three symbolic features (e.g., udp, private, and SF) are discarded from the 41 features of the dataset. The resulting features are:

$x_1, x_2, \ldots, x_m \qquad (7)$

where $m = 38$ is the size of the resulting feature set.
7.2.2 Feature transformation
In the second step of pre-processing, PCA is applied to the 38 features of the dataset. PCA is mostly used for data reduction, but here it is used to transform the features into the principal feature space:

$pc_1, pc_2, pc_3, \ldots, pc_l \qquad (8)$

where $l = 38$ is the number of principal components.
Fig. 5 Methodology used for intrusion detection: Selection of Dataset → Pre-processing of Dataset → Classification Approach → Training the System → Testing the System
7.2.3 Optimal feature subset selection
In the third step of pre-processing, GA is applied for optimal feature subset selection from the principal components search space. Four different experiments were performed, and a subset of 10 features was selected, as it showed better performance than the others.
7.3 Classification approach
The architecture used for classification is SVM, implemented using the kernel Adatron algorithm. The kernel Adatron maps inputs to a high-dimensional feature space and then optimally separates the data into their respective classes, normal and intrusive, by isolating those inputs that fall close to the data boundaries [16]. Therefore, the kernel Adatron is especially effective in separating sets of data that share complex boundaries. The structure of the implemented SVM is shown in Fig. 6.
7.4 Training the system
In the training phase, we have both input patterns and the desired output for each input vector. The aim of training is to minimize the error between the output produced by the SVM and the desired output [2]. To achieve this goal, the weights are updated by carrying out certain steps known as training. The parametric specification used for the SVM architecture during the training phase is given in Table 1.
7.5 Testing the system
Once the system is well trained, the weights of the system are frozen and its performance is evaluated. Testing of the trained system involves two steps: (1) a verification step and (2) a generalization step.
In the verification step, the trained system is tested against the data used in training. The purpose of this step is to investigate how well the trained system learned the training patterns in the training dataset. If the system was trained successfully, the outputs produced by the system will be similar to the real outputs. In this research work, 30 % of the training dataset of 5,000 connections, that is, 1,500 connections, is used for verification.
In the generalization step, testing is conducted with data not used in training. The purpose of this step is to measure the generalization ability of the trained network. After training, the system only involves computation of the feed-forward phase. For this purpose, a production dataset is used that has input data but no desired output data. This work used a dataset of fifteen thousand (15,000) connections as the production dataset. Further, the system performance is also tested on the total dataset (20,000 connections), which consists of both the training dataset and the production dataset. The parametric specification used for the SVM architecture during the testing phase is given in Table 2.
8 Experimental results
The system is evaluated on different feature subsets obtained from the genetic principal components. This section presents the results and their sensitivity analysis in different scenarios. First of all, the system is tested on the original dataset of 38 features, without using PCA and GA. Five thousand exemplars, or input samples, are randomly selected from 20,000 connections as the training dataset. These exemplars contain two types of connections, normal and intrusive, of which 3,223 are normal and 1,777 are intrusive. This set is further divided into three subsets: a training set (2,500), a cross-validation set (1,000), and a testing set (1,500). The remaining fifteen thousand exemplars are used to test the generalization ability of the trained system. The sensitivity analysis is presented in terms of true positives, false positives, false negatives, and true negatives.
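For reference, the sensitivity terms used below can be computed from a set of predictions as in this sketch; detection rate and false alarm are taken here as the true-positive and false-positive rates, which is one common reading of these terms.

```python
def sensitivity_analysis(y_true, y_pred):
    """Confusion-matrix terms (intrusive = 1, normal = 0) and the derived
    detection rate and false alarm rate, as reported in the tables below."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    detection_rate = 100.0 * tp / (tp + fn)  # intrusions correctly detected
    false_alarm = 100.0 * fp / (fp + tn)     # normal traffic flagged as attack
    return tp, fp, fn, tn, detection_rate, false_alarm
```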
8.1 Testing phase analysis
The purpose of the testing phase is to observe how well the system "learned" the training dataset during the training process. The sensitivity analysis of the confusion matrix for the testing phase is shown in Table 3. The overall performance of the testing phase is presented in Table 4.
Fig. 6 Implemented SVM architecture with RBF kernel function and large margin classifier
8.2 Verification phase analysis
In the verification phase, the trained system with different feature sets is tested on the production dataset, which is not part of the training set, in order to observe the generalization performance of the trained system. The overall performance of the system during the verification phase with different feature sets is shown in Table 5. Table 6 shows a comparative analysis among the various feature sets. The results indicate that the SVM-based system performs better on the feature set based on genetic principal components than on the other feature sets.
The results in Table 6 show that the intrusion detection mechanism using GA to search the PCA feature space for genetic principal components provides optimal performance compared to the traditional way of selecting features from the PCA search space. The key focus of the research was to select sensitive features, minimize the number of features, and increase the accuracy of the system. The research achieved this objective by using GA and PCA, which made the SVM classifier simpler as well as more efficient. Hence, the proposed method provides an SVM-based intrusion detection mechanism that outperforms the existing approaches.
8.3 Comparison with other approaches
The experimental results are compared with the results
presented in related work. Table 7 shows comparative
Table 1 SVM parameters during training phase

| S. no. | Parameter name | Value |
|---|---|---|
| 1 | Architecture | SVM |
| 2 | Layers | 03 (input, Gaussian, and output) |
| 3 | Input sample features | 38 (original), 22 (PCA), and 10 (GA) |
| 4 | PEs in input layer | Depends on the feature subset selected, e.g., 38, 22, or 10 |
| 5 | SVM input synapse | If inputs are 10, then its outputs are 2,500 |
| 6 | PEs in Gaussian layer | If the number of features is 10, then the Gaussian layer has 2,500 PEs |
| 7 | SVM output synapse | Inputs 2,500 and output 1 |
| 8 | SVM step size | 0.01 |
| 9 | Weight decay | 0.01 |
| 10 | Epochs | 1,000 |
| 11 | PE in output layer | One, with value 0 or 1 |
| 12 | Activation function | Gaussian |
| 13 | Training algorithm | Backpropagation (RBF) and kernel Adatron (SVM) |
| 14 | Training dataset | 5,000 connections, of which 20 % for cross-validation and 30 % for testing |
Table 2 SVM parameters during testing phase

| S. no. | Parameter name | Value |
|---|---|---|
| 1 | Architecture | SVM |
| 2 | Layers | 03 (input, Gaussian, and output) |
| 3 | Input sample features | 38 (original), 22 (PCA), and 10 (GA) |
| 4 | PEs in input layer | Depends on the feature subset selected |
| 5 | SVM input synapse | If inputs are 10, then its outputs are 2,500 |
| 6 | PEs in Gaussian layer | If the number of features is 10, then the Gaussian layer has 2,500 PEs |
| 7 | SVM output synapse | Inputs 2,500 and output 1 |
| 8 | SVM step size | 0.01 |
| 9 | Weight decay | 0.01 |
| 10 | Epochs | 1 |
| 11 | PE in output layer | One, with value 0 or 1 |
| 12 | Activation function | Gaussian |
| 15 | Testing dataset | 3,000 connections for testing and 2,000 for cross-validation |
| 16 | Production dataset | 20,000 connections |
Table 3 Sensitivity analysis of training, cross-validation, and testing datasets

| Feature set | Dataset | True positive (%) | False positive (%) | False negative (%) | True negative (%) |
|---|---|---|---|---|---|
| Raw-38 | Training | 100 | 0.0 | 0.0 | 100 |
| Raw-38 | Cross-validation | 93.65 | 6.34 | 2.47 | 97.52 |
| Raw-38 | Testing | 93.65 | 6.34 | 2.47 | 97.52 |
| PCA-38 | Training | 100 | 0.0 | 0.0 | 100 |
| PCA-38 | Cross-validation | 99.07 | 0.93 | 0.58 | 99.42 |
| PCA-38 | Testing | 98.66 | 1.33 | 0.759 | 99.24 |
| PCA-22 | Training | 99.37 | 0.63 | 0.56 | 99.44 |
| PCA-22 | Cross-validation | 99.50 | 0.46 | 0.85 | 99.14 |
| PCA-22 | Testing | 99.48 | 0.51 | 0.95 | 99.05 |
| GPC-12 | Training | 98.30 | 1.70 | 0.0 | 100 |
| GPC-12 | Cross-validation | 100 | 0.0 | 0.0 | 100 |
| GPC-12 | Testing | 99.79 | 0.21 | 0.76 | 99.24 |
| GPC-10 | Training | 99.38 | 0.61 | 0.0 | 100 |
| GPC-10 | Cross-validation | 99.38 | 0.61 | 0.0 | 100 |
| GPC-10 | Testing | 99.89 | 0.10 | 0.50 | 99.50 |
Table 4 Overall performance of testing phase

| Feature set | Training time (H:M:S) | Training epochs | Detection rate (%) | False alarm (%) |
|---|---|---|---|---|
| Raw-38 | 2:21:17 | 1,000 | 95.58 | 4.42 |
| PCA-38 | 2:39:04 | 1,000 | 98.95 | 1.05 |
| PCA-22 | 2:08:18 | 1,000 | 99.26 | 0.74 |
| GPC-12 | 0:53:28 | 1,000 | 99.47 | 0.53 |
| GPC-10 | 0:16:14 | 1,000 | 99.51 | 0.49 |
Table 6 SVM performance on different feature sets

| Measure | GPC-10 | GPC-12 | PC-22 | PC-38 | Raw-38 |
|---|---|---|---|---|---|
| False alarm | 07 | 11 | 24 | 79 | 11,455 |
| Epochs | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
| Time | 01:16:14 | 01:36:01 | 02:08:18 | 02:39:04 | 02:21:17 |
| Feature size | 564 KB | 2.17 MB | 5.15 MB | 8.37 MB | 8.37 MB |
| False positive | 0 | 0 | 24 | 79 | 11,455 |
| False negative | 07 | 11 | 0 | 0 | 0 |
| True positive | 12,807 | 12,811 | 12,776 | 12,721 | 1,345 |
| True negative | 7,193 | 7,189 | 7,224 | 7,279 | 18,655 |
Table 7 Performance comparison with other approaches

| Approach | Detection rate (%) |
|---|---|
| SVM + GPC-10 [our approach] | 99.96 |
| SVM + GPC-12 [our approach] | 99.94 |
| PCA + GA + SVM [3] | 99.60 |
| MLP + PCA [1] | 98.57 |
| GA + SVM [16] | 98 |
| SVM [21] | 83.2 |
| MLP [21] | 82.5 |
| PCA + NN [22] | 92.2 |
| RBF/Elman [7] | 93 |
| ART1, ART2, SOM [23] | 97.42, 97.19, 95.74 |
Table 5 Overall performance of verification phase

| Feature set | Features | True positive | True negative | Normal (64 %) | Intrusive (36 %) | False alarm |
|---|---|---|---|---|---|---|
| Raw-38 | 760,000 | 1,345 | 18,655 | 6.72 | 93.27 | 11,455 |
| PCA-38 | 760,000 | 12,721 | 7,279 | 63.605 | 36.395 | 79 |
| PCA-22 | 440,000 | 12,776 | 7,224 | 63.88 | 36.12 | 24 |
| GPC-12 | 240,000 | 12,811 | 7,189 | 64.055 | 35.945 | 11 |
| GPC-10 | 200,000 | 12,807 | 7,193 | 64.035 | 35.965 | 07 |
analysis of the applied approach with other approaches. The results show that our method enhances SVM performance in intrusion detection, outperforms the existing approaches, and has the capability to minimize the number of features (to 10) and maximize the detection rate (up to 99.96 %). Therefore, adopting SVM based on genetic principal components is a feasible solution that achieves optimal performance.
9 Conclusion
In this article, the performance of intrusion detection is improved based on optimal feature subset selection obtained from PCA and GA. Selecting an appropriate number of principal components is a critical problem in subset selection. Therefore, GA is applied to search for the genetic principal components that offer a subset of features with optimal sensitivity and the highest discriminatory power. The KDD cup dataset, a benchmark for evaluating security detection mechanisms, was used, and SVM was used for classification. The performance of the applied approach was analyzed, and a comparative analysis was made with existing approaches. Consequently, this method provides optimal performance in intrusion detection and is capable of minimizing the number of features while maximizing the detection rates.
Acknowledgment The authors extend their appreciation to the
College of Computer & Information Sciences Research Center,
Deanship of Scientific Research, King Saud University, Saudi Arabia
for funding this research work. The authors are grateful for this
support.
References
1. Ahmad I (2011) Feature subset selection in intrusion detection using soft computing techniques. PhD thesis, Universiti Teknologi Petronas (UTP), Perak, Malaysia
2. Ahmad I (2012) Feature subset selection in intrusion detection. LAP Lambert Academic Publishing AG & Co, Germany
3. Ahmad I, Abdullah A, Alghamdi A, Hussain M (2011) Optimized intrusion detection mechanism using soft computing techniques. Telecommun Syst J. doi:10.1007/s11235-011-9541-1
4. Ahmad I, Abdullah A, Alghamdi A, Hussain M, Nafjan K (2011) Intrusion detection using feature subset selection based on MLP. Sci Res Essays 6(34):6804–6810
5. Liu G, Yi Z, Yang S (2007) A hierarchical intrusion detection model based on the PCA neural networks. Neurocomputing 70(7–9):1561–1568
6. Horng S, Ming-Yang S, Yuan-Hsin C, Tzong-Wann K, Rong-Jian C, Jui-Lin L, Citra Dwi P (2011) A novel intrusion detection system based on hierarchical clustering and support vector machines. Expert Syst Appl 38(1):306–313
7. Tong X, Wang Z, Haining Y (2009) A research using hybrid RBF/Elman neural networks for intrusion detection system secure model. Comput Phys Commun 180(10):1795–1801
8. Eid HF, Darwish A, Hassanien AE, Abraham A (2010) Principle components analysis and support vector machine based intrusion detection system. In: 10th international conference on intelligent systems design and applications (ISDA), Cairo, Egypt, pp 363–367
9. Cao LJ, Chua KS, Chong WK, Lee HP, Gu QM (2003) A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 55(1–2):321–336
10. Sun Z, Bebis B, Miller R (2004) Object detection using feature subset selection. Pattern Recognit 37(11):2165–2176
11. Hussain M, Wajid SK, Elzaart A, Berbar M (2011) A comparison of SVM kernel functions for breast cancer detection. In: 8th IEEE international conference on computer graphics, imaging and visualization (CGIV), pp 145–150
12. Yang S, Bebis G, Hussain M, Muhammad G, Mirza A (2013) Unsupervised discovery of visual face categories. Int J Artif Intell Tools 22(01):1250029-1–1250029-30. doi:10.1142/S0218213012500297
13. Vapnik V (1995) Statistical learning theory. Springer, New York
14. Boser BE, Guyon IM, Vapnik V (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the 5th annual workshop on computational learning theory, pp 144–152
15. Burges C (1998) Tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):955–974
16. Kim D, Nguyen H, Syng-Yup O, Jong SP (2005) Fusions of GA and SVM for anomaly detection in intrusion detection system. In: Advances in neural networks, vol 3498. Lecture Notes in Computer Science, pp 415–420
17. Gao M, Tian J, Xia M (2009) Intrusion detection method based on classify support vector machine. In: Proceedings of the second international conference on intelligent computation technology and automation. IEEE Computer Society, Washington, DC, pp 391–394
18. Ahmad I, Abdullah A, Alghamdi A, Hussain M (2011) Denial of service attack detection using support vector machine. J Inf Tokyo 14(1):127–134
19. Ahmad I, Abdullah A, Alghamdi A (2009) Application of artificial neural network in detection of DOS attacks. In: Proceedings of the 2nd international conference on security of information and networks (SIN '09), Famagusta, North Cyprus. ACM, New York, pp 229–234
20. Zargar G, Kabiri P (2010) Selection of effective network parameters in attacks for intrusion detection. In: Advances in data mining: applications and theoretical aspects, vol 6171. Lecture Notes in Computer Science, pp 643–652
21. Osareh A, Shadgar B (2008) Intrusion detection in computer networks based on machine learning algorithms. Int J Comput Sci Netw Secur (IJCSNS) 8(11):15–23
22. Lakhina S, Joseph S, Verma B (2010) Feature reduction using principal component analysis for effective anomaly-based intrusion detection on NSL-KDD. Int J Eng Sci Technol 2(6):1790–1799
23. Amini M, Jalili R, Shahriari H (2006) RT-UNNID: a practical solution to real-time network-based intrusion detection using unsupervised neural networks. Comput Appl Secur 25(6):459–468