Page 1: [IEEE 2009 International Conference on Advances in Recent Technologies in Communication and Computing - Kottayam, Kerala, India (2009.10.27-2009.10.28)] 2009 International Conference

ENSEMBLING RULE BASED CLASSIFIERS FOR DETECTING NETWORK INTRUSIONS

Mrutyunjaya Panda

Department of ECE, Gandhi Institute of Engineering and Technology,

Gunupur, Orissa-765022, India [email protected]

Manas Ranjan Patra

Department of Computer Science, Berhampur University, Orissa, India

[email protected]

Abstract— An intrusion is a violation of the security policy of a system; intrusion detection therefore refers to the mechanisms developed to detect such violations. Recently, data mining techniques have gained importance because they provide valuable information that can improve decisions when identifying intrusions (attacks). In this paper, we evaluate the performance of several rule based classifiers, namely JRip, RIDOR, NNge and Decision Table, using an ensemble approach in order to build an efficient network intrusion detection system. We use KDDCup’99, an intrusion detection benchmark dataset that is part of the DARPA evaluation program, for our experiments. The results show that the proposed approach is accurate in detecting network intrusions, provides a low false positive rate, and is simple, reliable and fast in building an efficient network intrusion detection system.

Keywords— Intrusion Detection, Rule Based Classifiers, Ensemble approach, Accuracy.

I. INTRODUCTION

With the development of the Internet, information security threats have become one of the most crucial problems. Reliable connections, information integrity and privacy are demanded more intensively nowadays than ever before. One possible precaution is the use of an Intrusion Detection System (IDS). An IDS is an effective security technology that can detect, prevent and possibly react to an attack [1]. It monitors target sources of activity, such as audit and network traffic data in computer or network systems, and deploys various techniques in order to provide security services. The main objective of an IDS is therefore to detect all intrusions in an efficient manner [2].

Approaches to intrusion detection can be divided into two types: anomaly detection and misuse detection. A misuse detection system detects known types of attacks by looking for predefined attack patterns in system audit traffic. In general, misuse detection has higher detection accuracy than anomaly detection; its main weakness is that it has difficulty finding new or unknown attack types. Anomaly detection is based on the assumption that intrusive behaviors deviate greatly from normal system usage. In general, anomaly detection systems first learn a profile of normal system activity and then flag all system events that statistically deviate from the established profile. The advantage of anomaly detection is its ability to identify novel or “unforeseen” attack types; its main weakness is a high false positive rate.

Data mining is a relatively new approach to intrusion detection. It was first applied to intrusion detection in Mining Audit Data for Automated Models for Intrusion Detection [3]. The raw data is first converted into ASCII network packet information, which in turn is converted into connection level information. These connection level records contain within-connection features such as service, duration and protocol. Data mining algorithms are applied to this data to create models that detect intrusions.

The rest of the paper is organized as follows. The literature review is presented in Section II, followed by a short theoretical background on the rule based classifiers in Section III. The ensemble of classifiers used in this research is presented in Section IV. Experimental results and analysis are presented in Section V, with a brief description of the data set used, followed by the conclusion and future scope of research in Section VI.

II. RELATED RESEARCH

In [4], the authors propose a neuro-fuzzy technique (NEFCLASS) to reduce false alerts in IDS. They use SNORT and a JRip rule set to compare the effectiveness of their approach. They conclude that the neuro-fuzzy approach reduces the false positive rate, but that its detection rate is low in comparison to the rule based classifier. In [5], the authors present a hybrid statistical approach that uses data mining and decision tree classification to identify false alarms. They conclude that their strategy can be used to evaluate and enhance the capability of an IDS to detect, and at the same time respond to, threats and benign traffic in critical network applications. In [6], the authors propose a support vector classifier approach to classify network requests. The authors of [7] present two hybrid approaches for modeling IDS: decision trees and SVM combined as a hierarchical hybrid intelligent system model (DT-SVM), and an ensemble approach combining the base classifiers. They conclude that the proposed approaches provide more accurate intrusion detection systems. Intrusion detection using an ensemble of intelligent paradigms is proposed in [8], where the authors show that an ensemble of ANNs, SVMs and MARS is superior to the individual approaches in terms of classification accuracy.

2009 International Conference on Advances in Recent Technologies in Communication and Computing

978-0-7695-3845-7/09 $25.00 © 2009 IEEE

DOI 10.1109/ARTCom.2009.121

19

III. RULE BASED CLASSIFIERS

In this section, we focus on some important and yet novel rule based classification algorithms, namely NNge, JRip, RIDOR and Decision Table (DT), which to the best of our knowledge have not yet been explored by intrusion detection researchers.

A. NNge (Non-Nested Generalized Exemplars)

NNge is an algorithm that generalizes exemplars without nesting or overlap. It is an extension of NGE [9], which performs generalization by merging exemplars, forming hyperrectangles in feature space that represent conjunctive rules with internal disjunction. NNge forms a generalization each time a new example is added to the database, by joining it to its nearest neighbour of the same class. Unlike NGE, it does not allow hyperrectangles to nest or overlap. This is prevented by testing each prospective new generalization to ensure that it does not cover any negative examples, and by modifying any generalizations that are later found to do so. NNge adopts a heuristic that performs this post-processing in a uniform fashion. More details about this algorithm can be found in [10].

B. JRip (Extended Repeated Incremental Pruning)

JRip implements a propositional rule learner, “Repeated Incremental Pruning to Produce Error Reduction” (RIPPER), as proposed in [11]. JRip is similar in principle to the commercial rule learner RIPPER.

C. RIDOR (Ripple-Down Rules)

RIDOR first generates a default rule and then the exceptions to the default rule with the least (weighted) error rate. It then generates the best exception rules for each exception and iterates until no exceptions are left. Thus, it performs a tree-like expansion of exceptions, in which each leaf has only a default rule and no exceptions. The exceptions are a set of rules that predict the instances the default rules handle improperly [12].

Ripple-down rules were initially developed for knowledge acquisition and maintenance of rule-based systems. In knowledge acquisition and incremental rule learning, it is often hard to add to the existing rules and certify that adding a rule will not make the rule base inconsistent, causing the existing rules to perform badly in new classification tasks. In contrast to standard classification rules, which are induced by a covering algorithm for rule set construction, ripple-down rules create exceptions to existing rules, so that changes are confined to the context of the rule and do not affect other rules. Ripple-down rules resemble decision lists that induce rules of the form “if-then-else”, as new ripple-down rules are added by creating except or else branches on the existing rules. If a rule fires but produces an incorrect conclusion, an “except” branch is created for the new rule. If no rule fires, an “else” branch is created for the new rule.

D. Decision Tables

Decision tables are one of the simplest possible hypothesis spaces, and they are usually easy to understand. A decision table is an organizational or programming tool for the representation of discrete functions. It can be viewed as a matrix in which the upper rows specify sets of conditions and the lower rows specify sets of actions to be taken when the corresponding conditions are satisfied; thus each column, called a rule, describes a procedure of the type “if conditions, then actions”. Given an unlabelled instance, a decision table classifier searches for exact matches in the decision table using only the features in the schema (note that there may be many matching instances in the table). If no instances are found, the majority class of the decision table is returned; otherwise, the majority class of all matching instances is returned. To build a decision table, the induction algorithm must decide which features to include in the schema and which instances to store in the body. More details can be found in [13].
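The lookup behaviour of a decision table classifier described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the class name `DecisionTableMajority`, the toy connection records and the hand-picked schema are all hypothetical, and the feature selection step performed by the induction algorithm is assumed to have been done already.

```python
from collections import Counter, defaultdict


class DecisionTableMajority:
    """Toy decision table: exact-match lookup restricted to the schema features."""

    def __init__(self, schema):
        # schema: indices of the features kept in the table. Choosing them is
        # the job of the induction algorithm and is NOT shown here (assumption).
        self.schema = tuple(schema)
        self.table = defaultdict(Counter)  # schema values -> class counts
        self.default = None                # majority class of the whole table

    def _key(self, instance):
        return tuple(instance[i] for i in self.schema)

    def fit(self, instances, labels):
        for x, y in zip(instances, labels):
            self.table[self._key(x)][y] += 1
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instance):
        matches = self.table.get(self._key(instance))
        if not matches:
            # No exact match: return the majority class of the table.
            return self.default
        # Otherwise: majority class among all matching instances.
        return matches.most_common(1)[0][0]


# Hypothetical records: (protocol type, service, flag).
train_x = [
    ("tcp", "http", "SF"), ("tcp", "http", "SF"), ("tcp", "http", "REJ"),
    ("tcp", "http", "S0"), ("icmp", "ecr_i", "SF"), ("icmp", "ecr_i", "SF"),
]
train_y = ["normal", "normal", "normal", "probe", "dos", "dos"]

# The schema keeps only protocol type and service (an arbitrary choice here).
clf = DecisionTableMajority(schema=(0, 1)).fit(train_x, train_y)
print(clf.predict(("tcp", "http", "RSTO")))   # matches four rows, majority wins
print(clf.predict(("udp", "domain", "SF")))   # no match, falls back to default
```

Storing class counts per schema key, rather than the raw instances, keeps the lookup constant-time while still returning the majority class of all matching instances.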

IV. ENSEMBLE OF CLASSIFIERS

A. AdaBoost

Boosting is a general method for improving the accuracy of any given learning algorithm. It refers to a general and provably effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb. Boosting has its roots in a theoretical framework for studying machine learning called the “PAC” learning model, due to Valiant [14]; see Kearns and Vazirani [15] for a good introduction to this model. Kearns and Valiant were the first to pose the question of whether a “weak” learning algorithm, which performs just slightly better than guessing in the PAC model, can be “boosted” into an arbitrarily accurate “strong” learning algorithm. Finally, the AdaBoost algorithm, introduced by Freund and Schapire [16], solved many of the practical difficulties of the earlier boosting algorithms, and is the focus of this paper.

B. Proposed Approach

In intrusion detection, our task is to design a classifier that gives the best accuracy for each category of attack pattern. The first step is to carefully construct the different connectional models to achieve the best generalization performance of the classifiers. We use various base classifiers in the ensemble approach to build a strong classifier and hence an efficient network intrusion detection model. Test data is then passed through these models and the corresponding outputs are recorded. The approach is presented in Figure 1.
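The boosting idea behind the proposed approach can be sketched in a few lines of Python. This is an illustrative AdaBoost.M1-style loop under stated assumptions, not the experimental setup of this paper: the weak learner `fit_one_rule` (a single-feature lookup rule) merely stands in for base classifiers such as JRip or Decision Table, and the three binary indicators and their labels are synthetic.

```python
import math
from collections import Counter, defaultdict


def fit_one_rule(X, y, w):
    """Weak learner (assumed): best single-feature lookup rule under weights w."""
    best = None
    for f in range(len(X[0])):
        votes = defaultdict(Counter)
        for xi, yi, wi in zip(X, y, w):
            votes[xi[f]][yi] += wi
        rule = {value: c.most_common(1)[0][0] for value, c in votes.items()}
        err = sum(wi for xi, yi, wi in zip(X, y, w) if rule[xi[f]] != yi)
        if best is None or err < best[0]:
            best = (err, f, rule)
    err, f, rule = best
    default = Counter(y).most_common(1)[0][0]  # fallback for unseen values
    return err, (f, rule, default)


def rule_predict(model, x):
    f, rule, default = model
    return rule.get(x[f], default)


def adaboost_m1(X, y, rounds=10):
    """Return a list of (vote weight, weak model) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, model = fit_one_rule(X, y, w)
        if err >= 0.5:                 # weak learner no better than guessing
            break
        clipped = max(err, 1e-10)      # avoid log(0) for a perfect rule
        ensemble.append((math.log((1.0 - clipped) / clipped), model))
        if err < 1e-12:                # perfect rule: nothing left to boost
            break
        beta = err / (1.0 - err)
        # Down-weight correctly classified records, then renormalise.
        w = [wi * beta if rule_predict(model, xi) == yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    if not ensemble:
        raise ValueError("weak learner failed in the first round")
    return ensemble


def ensemble_predict(ensemble, x):
    votes = Counter()
    for alpha, model in ensemble:
        votes[rule_predict(model, x)] += alpha
    return votes.most_common(1)[0][0]


def accuracy(ensemble, X, y):
    return sum(ensemble_predict(ensemble, xi) == yi for xi, yi in zip(X, y)) / len(X)


# Synthetic data: a record is an "attack" when at least two of three
# hypothetical binary indicators fire.
X = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0),
     (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
y = ["attack" if sum(x) >= 2 else "normal" for x in X]

single = adaboost_m1(X, y, rounds=1)
boosted = adaboost_m1(X, y, rounds=3)
print(accuracy(single, X, y), accuracy(boosted, X, y))
```

On this toy data no single-feature rule classifies more than six of the eight records correctly, while three boosting rounds combine three such rules into a weighted vote that fits all eight, which is the weak-to-strong effect described above.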


Figure 1. Proposed Ensemble approach with various base classifiers for NIDS

[Figure 1: block diagram with the components Data Pre-processor, Base Classifiers (JRip, RIDOR, NNge, Decision Table), Ensemble method (AdaBoost) and Knowledge Extraction.]

V. EXPERIMENTAL RESULTS AND ANALYSIS

The data for our experiments were prepared by the 1999 DARPA intrusion detection evaluation program at MIT Lincoln Laboratory. The KDDCup 1999 dataset used in this paper is a part of that evaluation program. The data set contains 24 attack types that can be classified into four main categories, namely Probing, Denial of Service (DoS), User to Root (U2R) and Remote to Local (R2L). The original data comprise 744 MB with 4,940,000 records. The data set has 41 attributes for each connection record plus one class label.

We carried out all our experiments on a Pentium-4 IBM PC with a 2.8 GHz CPU, 40 GB HDD and 512 MB RAM. The data for our experiments contain 1000 randomly drawn records with six attributes: protocol type, service, flag, source bytes, destination bytes and attack class. All the rare records that fall under the U2R and R2L categories are used. This data set has five classes, namely Normal, DoS, Probe, U2R and R2L, so we perform five-class classification in this paper. Training is done on the full data set in order to build the model, while 10-fold cross validation is used to test the effectiveness of the model built in the training phase. The performance measures for building an efficient NIDS with rule based classification algorithms are given in Table 1.

Table 1. Comparison of Rule based classification methods (DT = Decision Table)

                 Single classifier                           Ensemble approach with base classifier
Measure  Class   JRip       RIDOR      NNge       DT         JRip       RIDOR      NNge       DT
DR       Normal  0.9835     0.9859     0.9835     0.9859     0.9929     0.986      0.9812     0.993
         Probe   0.5625     0.75       0.6562     0.4375     0.6875     0.375      0.6875     0.75
         DoS     0.998      1.0        1.0        0.9684     0.998      1.0        0.998      1.0
         U2R     0.25       0.0        0.75       0.4444     0.75       0.5        0.75       0.5
         R2L     0.353      0.4706     0.647      0.353      0.353      0.412      0.647      0.42
RR       Normal  0.9698     0.979      0.9721     0.9188     0.9635     0.9813     0.972      0.979
         Probe   0.9        0.75       0.7241     0.8235     0.88       0.8        0.7333     0.857
         DoS     0.9656     0.99       0.998      0.9723     0.9883     0.9512     0.996      0.9826
         U2R     0.8333     0.8333     0.7273     1.0        0.8571     0.8        0.8        1.0
         R2L     0.6666     0.6666     0.9166     0.75       0.75       0.7        0.846      0.78
FPR      Normal  0.0126     0.0107     0.0126     0.0124     5.56×10⁻³  0.01       0.014      5.52×10⁻³
         Probe   0.0142     8.35×10⁻³  0.0114     0.0182     0.01       0.02       0.01       8.28×10⁻³
         DoS     2.23×10⁻³  0.0        0.0        0.033      2.09×10⁻³  0.0        2.08×10⁻³  0.0
         U2R     4×10⁻³     4×10⁻³     1.02×10⁻³  5×10⁻³     3.04×10⁻³  5×10⁻³     1.02×10⁻³  3.03×10⁻³
         R2L     0.0111     9.15×10⁻³  6.1×10⁻³   0.011      0.011      0.01       6.11×10⁻³  0.01
FNR      Normal  0.0302     0.021      0.0279     0.081      0.0365     0.0187     0.028      0.021
         Probe   0.1        0.25       0.2759     0.1765     0.12       0.2        0.0266     0.143
         DoS     0.0343     9.76×10⁻³  1.97×10⁻³  0.0277     0.0117     0.0487     3.94×10⁻³  0.017
         U2R     0.1666     0.1666     0.2727     0.0        0.1428     0.2        0.2        0.0
         R2L     0.3333     0.3333     0.0833     0.25       0.25       0.3        0.154      0.222
F-Value  Normal  0.9766     0.9824     0.9777     0.9512     0.978      0.984      0.9766     0.986
         Probe   0.6923     0.75       0.6885     0.5714     0.772      0.51       0.71       0.8
         DoS     0.9815     0.9951     0.999      0.9703     0.993      0.975      0.997      0.9912
         U2R     0.6666     0.6666     0.8        0.615      0.75       0.572      0.842      0.803
         R2L     0.4616     0.5517     0.7586     0.48       0.48       0.52       0.733      0.546
Kappa            0.9308     0.9477     0.9495     0.8915     0.947      0.9477     0.9442     0.9489
Time (s)         0.49       0.72       0.23       0.34       2.66       3.28       0.94       1.53
RMSE             0.0601     0.0537     0.0546     0.0919     0.0502     0.0482     0.0512     0.0481

From Table 1, it is evident that, among the rule based classifiers discussed in this paper, NNge provides the best result in detecting the rare attacks that fall under the U2R and R2L categories. However, Decision Table is found to be efficient in detecting Normal, Probe and DoS traffic. The ensemble approach enhances the performance of the JRip, RIDOR and Decision Table classifiers while making no change in the case of the NNge based classifier. It can also be observed that NNge has a high kappa value and builds the model faster, with a somewhat higher root mean square error (RMSE) than the Decision Table based classifier.
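For readers who wish to reproduce measures of this kind, the following Python sketch shows how per-class rates and the kappa statistic are commonly derived from a multi-class confusion matrix. The formulas used in this paper are not stated in this section, so the definitions below are assumptions: detection rate is taken as TP/(TP+FN), false positive rate as FP/(FP+TN), and the F-value as the harmonic mean of detection rate and precision. The confusion matrix itself is made up for illustration and is not experimental data.

```python
def per_class_measures(labels, cm):
    """cm[i][j] = number of records of actual class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    out = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                    # actual k, predicted otherwise
        fp = sum(row[k] for row in cm) - tp     # predicted k, actually otherwise
        tn = total - tp - fn - fp
        dr = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        f_value = (2 * dr * precision / (dr + precision)) if dr + precision else 0.0
        out[name] = {"dr": dr, "precision": precision, "fpr": fpr, "f": f_value}
    return out


def cohen_kappa(cm):
    """Agreement between actual and predicted labels, corrected for chance."""
    total = sum(sum(row) for row in cm)
    k_range = range(len(cm))
    observed = sum(cm[k][k] for k in k_range) / total
    expected = sum(sum(cm[k]) * sum(row[k] for row in cm) for k in k_range) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical three-class confusion matrix (rows: actual, columns: predicted).
labels = ["normal", "dos", "probe"]
cm = [
    [45, 3, 2],
    [1, 38, 1],
    [4, 0, 6],
]

measures = per_class_measures(labels, cm)
kappa = cohen_kappa(cm)
for name in labels:
    print(name, {key: round(value, 4) for key, value in measures[name].items()})
print("kappa", round(kappa, 4))
```

Treating each class one-vs-rest in this way makes it visible why a rare class such as the hypothetical "probe" row can have a low detection rate even when the overall agreement is high.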

Table 2. Performance Comparison with existing classification approaches

Algorithm                         Normal (%)  Probe (%)  DoS (%)  U2R (%)  R2L (%)
Hybrid DT + SVM [7]               99.7        98.57      99.92    48.0     37.8
BBN + GLS [20]                    98.8        62.5       99.2     78.0     53.0
MultiBoost + SMO [21]             97.88       71.0       99.0     67.0     30.0
ANTIDS-a [18]                     69.4        60.07      84.31    47.62    87.63
HGMM [17]                         88.14       99.33      99.78    96.01    82.66
LAMSTAR [19]                      99.4        95.6       98.6     31.6     37.7
AdaBoost + NNge (ours)            98.12       68.75      99.8     75.0     64.7
AdaBoost + Decision Table (ours)  99.3        75.0       100.0    50.0     42.0

It can also be observed from Table 2 that our proposed ensemble approach with NNge as the base classifier performs very well in comparison to almost all the existing classifiers presented. The detection accuracy of HGMM is higher than that of the proposed classifier, but at the cost of a higher false positive rate. There is therefore a trade-off between detection rate and false positive rate, and care should be taken to achieve high detection accuracy while maintaining a low false positive rate when building an IDS.

VI. CONCLUSION AND FUTURE SCOPE

In this article, we presented an ensemble approach to rule based classifiers in order to build an efficient network intrusion detection model, by combining AdaBoost with various base learners. We also compared the performance of the proposed approach with other existing classification algorithms in order to assess the efficacy of our proposed model. The results show that our model provides a low false positive rate and a fair detection rate for all types of attack. More data mining techniques should be investigated and their efficiency as intrusion detection models should be evaluated. To this end, we propose the use of hybrid algorithms that combine different data mining algorithms as a future direction of research.

REFERENCES

[1] Rebecca Bace and Peter Mell. NIST special publication on intrusion detection systems. Infidel, Inc., Scotts Valley, CA, National Institute of Standards and Technology, 2001.

[2] V. Gowadia, C. Farkas and M. Valtorta. PAID: A probabilistic agent-based IDS. Journal of Computers and Security, 2005.

[3] MIT Lincoln Laboratory. http://www.ll.mit.edu/IST/ideval/

[4] Pravesh Gaonjur, N. Z. Tarapore and S. G. Pukale. Using neuro-fuzzy techniques to reduce false alerts in intrusion detection system. In: Proc. of International Conference on Computer Networks and Security, VIT University, India, 2008.

[5] N. B. Anuar, H. Sallehudin, A. Gani and O. Zakaria. Identifying false alarm for network intrusion detection system using hybrid data mining and decision tree. Malaysian Journal of Computer Science, Vol. 21, No. 2, 2008, pp. 101-115.

[6] John Mill and Atsushi Inoue. Support vector classifiers and network intrusion detection. In: Proc. of 2004 IEEE Intl. Conf. on Fuzzy Systems, WA, 2004, Vol. 1, pp. 407-410. ISSN: 1098-7584.

[7] S. Peddabachigari, A. Abraham, C. Grosan and J. Thomas. Modelling IDS using hybrid intelligent systems. Journal of Network and Computer Applications, Vol. 30, No. 1, 2007, pp. 114-132. Elsevier.

[8] S. Mukkamala, A. H. Sung and A. Abraham. Intrusion detection using an ensemble of intelligent paradigms. Journal of Network and Computer Applications, Vol. 28, 2005, pp. 167-182. Elsevier.

[9] S. Salzberg. A nearest hyperrectangle learning method. Machine Learning, Vol. 6, pp. 277-309, 1991.

[10] Sylvain Roy. Nearest neighbour with generalization. Christchurch, NZ, 2002.

[11] William W. Cohen. Fast effective rule induction. In: 12th Intl. Conf. on Machine Learning, pp. 115-123, 1995.

[12] Brian R. Gaines and Paul Compton. Induction of Ripple-Down rules applied to modelling large databases. Journal of Intelligent Information Systems, Vol. 5, No. 3, pp. 221-228, 1995.

[13] Ron Kohavi. The power of decision tables. In: 8th European Conference on Machine Learning, pp. 174-189, 1995.

[14] L. G. Valiant. A theory of the learnable. Communications of the ACM, Vol. 27, No. 11, pp. 1134-1142, 1984.

[15] Michael J. Kearns and Umesh V. Vazirani. An introduction to computational learning theory. MIT Press, 1994.

[16] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, Vol. 55, No. 1, Academic Press Inc., FL, pp. 119-139, 1997. DOI: 10.1006/jcss.1997.1504.

[17] M. Bahrololum and M. Khaleghi. Anomaly IDS using Hierarchical Gaussian Mixture model. International Journal of Computer Science and Network Security (IJCSNS), Vol. 8, August 2008, pp. 264-271.

[18] V. Ramos and A. Abraham. ANTIDS: Self organised ANT based clustering for intrusion detection system. pp. 1-10, 2004. http://www.softcomputing.net/wstst-ra.pdf

[19] V. Venkatachalam and S. Selvan. Performance comparison of intrusion detection system classifiers using various feature reduction techniques. International Journal of Simulation, Vol. 9, No. 1, 2008, pp. 30-39. ISSN: 1473-8031.

[20] M. Panda and M. R. Patra. Bayesian Belief Network using genetic local search for intrusion detection. International Journal of Secure Digital Information Age (IJSDIA), Vol. 1, No. 1, pp. 30-37, 2009.

[21] M. Panda and M. R. Patra. Boosting support vector classifiers for intrusion detection. In: Proc. of 2009 IEEE International Advance Computing Conference (IACC-09), Patiala, India, pp. 926-931, 2009. IEEE Press, USA.
