dataset generation and network intrusion detection...
TRANSCRIPT
DATASET GENERATION AND NETWORK INTRUSION DETECTION BASED
ON FLOW-LEVEL INFORMATION
AHMED ABDALLA MOHAMEDALI ABDALLA
A thesis submitted in fulfilment of the
requirements for the award of the degree of
Doctor of Philosophy (Electrical Engineering)
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
17th
SEPTEMBER 2015
iii
To My Mother for all what she gave to me and for what she is still giving.
I dedicated this humble work to you.
iv
ACKNOWLEDGEMENT
First of all, I would like to express my deepest thanks to ALLAH (SWT),
who guided, helped, and supported me.
I wish to send my serious gratefulness and thankfulness to my research
supervisors, Associate Professor Dr. Sulaiman Mohd Nor and Dr. Shaikh Nasir bin
Shaikh Husin for encouragement, motivation, full support in academic, social, and
technical issues. Truly, I have been overwhelmed by their patience, personal
kindness, valuable comments, and advices during my study. Without their
continuous guidance and help, this thesis would not have been completed.
I have greatly appreciated the opportunity from UTM to purse my PhD, I
thank very much all UTM community including each people at (faculties, library,
staff, employees, students and labourers). Special thankfulness to every member of
people in FKE, CICT, SPS.
Finally, I would take pleasure in gratitude my mother, father and all the
members of my family, friends and relatives for their constant dear one, help and
support which motivate me to face the life difficulties.
Ahmed Abdalla
v
ABSTRACT
The growth of the Internet and networking has made securing networks
against attacks a very challenging task. For high-speed networks, flow meta-data
inspection can replace conventional Deep Packet Inspection but with the cost of low
precision of identifying attacks since the former deals with an aggregated version of
the traffic. The first part of this research addresses the problem of the lack in
benchmarking datasets for developing new Network Intrusion Detection Systems
(NIDSs) or comparing existing NIDSs. The aim in the second part is to design a near
real-time NIDS without degrading the detection accuracy when compared to
conventional misuse packet-based approaches. To achieve the first objective, a NIDS
dataset creation framework had been developed. Based on that framework, a flow-
level NIDS dataset had been created. The traces were collected from campus main
routers in NetFlow format while malicious traffic of different attack scenarios was
generated by Nmap and BoNesi tools. In the second part a flow-based software-
based system were developed to detect and identify network volume-level attacks in
near real-time. Attack detection is based on statistical time-aggregated features of the
exported NetFlow version of the traffic to detect several scan and Denial-of-Service
(DoS) attacks. A validation for the designed system is done using Defense Advanced
Research Projects Agency (DARPA) datasets. The timeline performance
outperformed all relevant software-based systems and suited for up to one gigabit per
second links with an average detection delay of less than one minute. The proposed
method achieved 95% True Positive Rate (TPR) and almost zero False Positive Rate
(FPR). Compared to relevant methods when operated in the same conditions, the
proposed method improved the TPR by 4% and improved FPR by 1%. In addition,
the capability of flow-based approach in detecting packet-level attacks was
experimentally demonstrated. The results against Snort were benchmarked and 75%
TPR and almost zero FPR were achieved.
vi
ABSTRAK
Pertumbuhan Internet dan rangkaian telah menjadikan keselamatan rangkaian
terhadap serangan satu tugas yang sangat mencabar. Untuk rangkaian kelajuan
tinggi, penelitian meta-data boleh mengganti penelitian tahap paket konvensional
tetapi dengan kos ketepatan rendah semasa mengenalpasti serangan memandangkan
tahap-aliran berkait dengan agregasi trafik. Bahagian pertama kajian ini bertujuan
menyumbang dalam menyelesaikan masalah keperluan set data penandaan aras yang
piawai untuk membangun Sistem Pengesan Penceroboh Rangkaian (NIDS) baru atau
membanding dengan yang sedia ada. Tujuan di bahagian kedua adalah untuk
merekabentuk NIDS masa nyata berhampiran tanpa merendahkan ketepatan
pengesanan apabila dibandingkan dengan pendekatan konvensional penyalahgunaan
berasaskan paket. Bagi mencapai objektif pertama satu rangka kerja bagi
mewujudkan dataset NIDS telah dibangunkan. Berdasarkan rangka kerja tersebut
dataset tahap-aliran NIDS telah dibangunkan. Surih dikutip dari penghala utama
kampus dalam format NetFlow manakala trafik hasad dari senario serangan berbeza
telah dihasilkan oleh peralatan Nmap dan BoNesi. Dalam bahagian kedua, satu sistem
berasaskan perisian tahap-aliran telah dibangunkan bagi mengesan dan
mengenalpasti serangan rangkaian tahap-jumlah dalam masa nyata berhampiran.
Pengesanan serangan adalah berdasarkan kepada ciri-ciri statistik masa terkumpul
trafik NetFlow versi yang dieksport bagi mengesan beberapa imbasan dan serangan
Penafian Perkhidmatan (DoS). Pengesahan sistem yang direkabentuk dibuat
menggunakan set data Agensi Projek Penyelidikan Termaju Pertahanan (DARPA).
Prestasi had masa sistem telah mengatasi sistem lain berasaskan perisian dan hanya
sesuai sehingga kelajuan satu gigabit sesaat dengan purata lengah kurang dari satu
minit. Kaedah ini berjaya mencapai 95% Kadar Positif Benar (TPR) dan hampir sifar
Kadar Positif Palsu (FPR). Berbanding dengan kaedah berkaitan apabila beroperasi
dalam keadaan yang sama, kaedah ini telah memperbaiki TPR sebanyak 4% dan
memperbaiki FPR sebanyak 1%. Sebagai tambahan, kemampuan kebolehan
pendekatan berasaskan aliran di dalam mengesan serangan tahap paket telah dapat
ditunjukkan secara eksperimen. Keputusan berbanding dengan Snort telah ditanda
aras dan telah mendapat 75% TPR dan hampir sifar FPR.
vii
TABLE OF CONTENTS
CHAPTER TITLE PAGE
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENT iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF TABLES xiii
LIST OF FIGURES xv
1 INTRODUCTION 1
1.1 Overview and Motivation 1
1.2 Background 3
1.2.1 A Simple NIDS Architecture 3
1.2.2 IDS Taxonomy 4
1.2.3 NIDS Required Characteristics 5
1.3 Problem Statement 6
1.4 Research Objectives 8
1.5 Research Scope 9
1.6 Significance of the Research 10
1.7 Thesis Organization 11
2 LITERATURE REVIEW 12
2.1 Introduction 12
2.2 Network Attacks Classification 12
2.2.1 Classification According to Attack Type 13
viii
2.2.1.1 Denial of Service (DoS) 14
2.2.1.2 Probing 15
2.2.1.3 Compromises 16
2.2.1.4 Replicators (Viruses/Worms/Trojans) 16
2.2.2 Classification According to Attack Approach 17
2.2.2.1 Brute-force Attacks 18
2.2.2.2 Packet Payload Attacks 18
2.3 Network Intrusion Detection Systems 19
2.3.1 Packet-level Versus Flow-level NIDS 19
2.3.1.1 Packet-level NIDS 20
2.3.1.2 Flow-level NIDS 21
2.4 IP Flows and Evolution of Flow-level Intrusion
Detection
23
2.4.1 Network Flow 23
2.4.2 Flow Export Protocols 25
2.5 Overview of Flow-level NIDS 26
2.6 Overview of Flow-level NIDS Datasets 30
2.6.1 Knowledge Acquisition in NIDSs 33
2.6.2 Classification and Categories of NIDS Datasets 34
2.6.3 Required Characteristics and Challenges of
Ideal NIDS Datasets
36
2.6.3.1 Labeled Dataset 36
2.6.3.2 Realistic Dataset 37
2.6.3.3 Sizable Dataset 39
2.6.3.4 Scalable Dataset 40
2.6.3.5 Complete Capture Traces Dataset 40
2.6.3.6 Publicy Available Dataset 41
2.6.4 Significant NIDS Datasets and Traffic
Captures
42
2.6.4.1 DARPA Family 42
2.6.4.2 KDDcup99 42
2.6.4.3 NSL-KDD 43
2.6.4.4 DEFCON 43
ix
2.6.4.5 CAIDA 43
2.6.4.6 LBNL 44
2.6.4.7 TUIDS 44
2.6.4.8 Summary of NIDS Datasets 45
2.7 Network Traffic Generating, Capturing and
Classification Tools
46
2.7.1 Traffic Capturing and Packet Analyzing Tools 46
2.7.1.1 Tcpdump 46
2.7.1.2 Wireshark 46
2.7.1.3 Gulp 47
2.7.2 Netflow Processing Tools 47
2.7.2.1 Nfsen 47
2.7.2.2 Softflowd 47
2.7.2.3 Nfdump 48
2.7.3 Attack Tools 48
2.7.3.1 Nmap 48
2.7.3.2 Bonesi 48
2.8 Evaluation Measures 49
2.8.1 Accuracy 49
2.8.1.1 Sensetivity and Specificity 50
2.8.1.2 Confusion Matrix 51
2.8.2 Timeliness 54
2.8.3 Attack Identification 54
2.9 Machine Learning in Intrusion Detection 55
2.10 Chapter Summary 57
3 RESEARCH METHODOLOGY 59
3.1 Introduction 59
3.2 Overall Research Methodology Framework 59
3.3 Dataset Creation Framework 63
3.4 Creation of Brute-force Attacks Dataset 62
3.4.1 Specifications of the Dataset 63
3.4.1.1 Malicious Traffic and Attacks in the Dataset 63
x
3.4.2 Testbed and Capturing Plan Setup 65
3.4.3 Features Selection 67
3.5 Flow-level Detection of Brute-force Attacks 68
3.5.1 Design Framework of the Flow-level Brute-
force Attack Detector
68
3.5.2 Time-based Aggregation of Traffic 70
3.5.3 Source-based and Destination-based
Aggregation of Traffic
72
3.5.4 Host Destination-based Aggregation of Traffic 74
3.5.5 Labeling TAT Records 75
3.5.6 Detection Model Design and Profiling 75
3.5.7 Testing and Validating Detection Model 77
3.5.8 Flow-level Detection of Packet Content-based
Attacks
78
3.5.9 Detection of Packet Content-based Attacks
From Flow-level
79
3.5.10 General Framework and Settings 81
3.5.11 Dataset Creation and Labeling 82
3.5.12 Evaluation Efficiency Using Machine Learning
Algorithms
82
3.6 Chapter Summary 83
4 CREATION OF FLOW-LEVEL DATASET FOR
BRUTE-FORCE ATTACKS
84
4.1 Introduction 84
4.2 Creation of Flow-level Dataset 84
4.2.1 Dataset Prerequisites Preparation 85
4.2.1.1 Dataset Characteristics 86
4.2.1.2 Testbed Setup 86
4.2.1.3 Setting Attack Tools and
Attack Host
89
4.2.2 Getting Network Traces 93
4.2.2.1 Flow Exporting 93
xi
4.2.2.2 Flow Data Capturing and Decoding 94
4.2.3 Dataset Labeling 95
4.3 Dataset Statistics 96
4.3.1 Attack-free Dataset Statistics 96
4.3.2 Brute-force Attack-based Dataset Statistics 101
4.4 Dataset Assessment 104
4.4 Summary 107
5 FLOW-LEVEL DETECTION OF NETWORK BRUTE-
FORCE ATTACKS
108
5.1 Introduction 108
5.2 Offline Phase of the Flow-level Brute-force Attack
Detector
108
5.2.1 Time-based Aggregation of Traffic 109
5.2.2 Labeling TAT Records 111
5.2.3 Detection Model Design and Profiling 113
5.2.4 Selection of Significant TAT Attributes 114
5.2.5 TAT Statistics and Profiling 117
5.2.6 Threshold Values and Detection Rules 121
5.4 Validation of the Created Detection Model 127
5.5 Summary 130
6 FLOW-LEVEL DETECTION OF PACKET CONTENT-
BASED ATTACKS
131
6.1 Introduction 131
6.2 General Framework and Settings 131
6.3 Alert Analysis 133
6.4 Dataset Creation and Labeling 136
6.5 Feature Selection 138
6.6 Results and Discusion 139
6.6.1 Full Set Attributes Classification Results 140
6.6.2 Minimum Effective Attributes 141
6.6.3 Most Significant Attributes 142
6.7 Chapter Summary 143
xii
7 CONCLUSION 144
6.1 Introduction 144
6.2 Conclusion 144
6.3 Research Contributions 146
6.4 Suggestions for Future Work 147
REFERENCES 149
xiii
LIST OF TABLES
TABLE NO. TITLE PAGE
2.1 Summarization of works in flow-level network intrusion
detection.
31
2.2 A summary of NIDS dataset. 45
2.3 The four possible outcomes of a classifier. 50
3.1 Specifications of the brute-force attack dataset. 64
3.2 Attack types and attack generating tools. 67
3.3 Version 5 header format. 69
3.4 Version 5 flow record format. 69
3.5 Typical extracted attributes of time window NetFlow
records.
72
3.6 Single detection possibilities. 82
4.1 Characteristics of the brute-force attack dataset. 87
4.2 Capturing periods. 93
4.3 Final ten attributes selected for datasets. 95
4.4 Three digit labeling method of malicious flows. 96
4.5 Attack-free dataset overall and layer 4 statistics. 97
4.6 Brute-force attack-based dataset overall and layer 4
statistics.
102
4.7 Attacks types in the brute-force attack-based dataset. 102
4.8 Characteristics summary of available NIDS datasets and
our dataset.
105
4.9 Superiority of our dataset over other datasets. 106
5.1 Extracted features of time window NetFlow records. 111
5.2 Malicious TAT records statistics. 113
xiv
5.3 1-to-2p flows statistics in our dataset. 116
5.4 TAT features used to detect and identify brute-force
attacks.
117
5.5 Normal TAT records statistics. 118
5.6 Normal TAT records statistics. 121
5.7 Offline testing confusion matrix. 124
5.8 Accuracy results of brute force attack types. 125
5.9 Online testing confusion matrix. 126
5.10 Accuracy results of online testing. 126
5.11 Timeline performance of the online testing. 126
5.12 Information summary of DARPA99. 127
5.13 DARPA99 brute-force detection confusion matrix. 129
5.14 Detection results of DARPA99 brute-force. 129
6.1 First variant Snort alerts analysis. 133
6.2 Second variant Snort alerts analysis. 136
6.3 Final ten attributes selected for datasets. 139
6.4 Dataset derived attributes. 139
6.5 Minimum effective attributes. 142
xv
LIST OF FIGURES
FIGURE NO. TITLE PAGE
1. 1 Simple architecture of NIDS. 3
1. 2 NIDS classification aspects. 5
2. 1 Classification of network attacks. 13
2. 2 Evolution of flow data in network applications. 24
2. 3 Knowledge acquisition process in NIDS. 33
2. 4 Classifications of NIDS datasets. 34
2. 5 Confusion matrix for intrusion detection. 52
2. 6 3-array confusion matrix. 52
2. 7 Transformation from n-array into binary confusion matrix. 53
3. 1 Overall research methodology framework. 60
3. 2 General dataset creation framework. 62
3. 3 Network architecture and testbed for attacks. 66
3. 4 Design framework of the flow-level brute-force detector. 70
3. 5 Aggregation of traffic of different time windows. 73
3. 6 Time-based aggregation of traffic. 76
3. 7 Framework of validating detection model. 77
3. 8 Common attacks detectable by both packet-level and flow-
level inspection.
80
3. 9 General experiment framework. 81
4. 1 Dataset creation implementation flow. 85
4. 2 Network architecture. 88
4. 3 Classes of brute-force attacks in the dataset. 90
4. 4 (a) Part of a script file contains attack commands,
(b) Part of the log file created by the script file.
91
xvi
4. 5 (a) Flow distribution of the attack-free dataset over layer-4
protocol, (b) Byte distribution of the attack-free dataset over
Layer-4 protocols.
98
4. 6 (a) Attack-free dataset flow distribution over layer-5
protocols, (b) Attack-free dataset byte distribution over
significant layer-5 protocols.
99
4. 7 5-min average number of flows of day2 attack-free traffic. 100
4. 8 5-min average number of flows of the 7-day attack-free
traffic.
100
4. 9 (a) Brute-force dataset flow distribution over layer-4
protocols, (b) Brute-force dataset byte distribution over
layer-4 protocols.
101
4. 10 (a) Brute-force dataset flow distribution over layer-5
protocols, (b) Brute-force dataset byte distribution over
layer-5 protocols.
103
4. 11 Percentages of each attack category in brute-force attack
dataset.
104
5. 1 Designing and testing of the flow-level brute-force detector. 109
5. 2 Time-based aggregation of traffic. 110
5. 3 Flow chart of traffic aggregation of traffic. 112
5. 4 Flow chart traffic aggregation of traffic and TAT record
labeling.
114
5. 5 Distribution of malicious TAT records over attack classes. 115
5. 6 Distribution of malicious 60-sec TAT records over attack
classes.
115
5. 7 TAT attributes for normal and network scan TAT (average
values).
119
5. 8 TAT attributes for normal and host scan TAT (average
values).
120
5. 9 TAT attributes for normal and DoS flood TAT (average
values).
120
5. 10 Average values TAT attributes for normal and DDoS flood
TAT.
121
xvii
5. 11 Brute-force attack detecting rules. 122
5. 12 Pseudo code of the brute-force attack detection rules. 123
5. 13 Offline testing of brute-force attack detection model. 124
5. 14 Offline testing confusion matrix. 124
5. 15 Online testing process. 125
5. 16 Online testing confusion matrix. 126
5. 17 Validation of the detection method. 128
5. 18 Normal and malicious TAT records of DARPA99
validating dataset.
129
5. 19 DARPA99 brute-force detection confusion matrix. 129
6. 1 General experiment framework. 132
6. 2 (a) Alert distribution over Snort class type attacks for
dataset1,
(b) Alert distribution over layer 4 protocols for dataset1.
134
6. 3 (a) Alert distribution over Snort class type attacks for
dataset2,
(b) Alert distribution over layer-4 and layer-5 protocols for
dataset2.
135
6. 4 Flow distribution of captured traces over transport layer
protocols.
137
6. 5 Malicious flow distribution over layer-4 and layer-5 in
dataset2.
138
6. 6 Full set attribute ML classification results on dataset1. 140
6. 7 Full set attribute ML classification results on dataset2. 141
6. 8 Most three significant classification results on dataset2. 143
6. 9 Individual impact of each attribute on classification. 143
CHAPTER 1
INTRODUCTION
1.1 Overview and Motivation
The global growth of the Internet and networking has made securing
networks and information one of the most challenging tasks in the field of network
communication. Today, intrusion attacks are generating significant worldwide
epidemic to network security environment and bad impact involving financial loss.
Intruders have the capability to infect thousands of hosts and networks within few
minutes before human action takes place.
Studies on the field by Dell™ SonicWALL™ Threat Research Team
demonstrated that network attacks are increasing exponentially every year. For
example, over 20.1 million unique malware samples were collected in 2013,
compared to 16 million in 2012 (DellSonicWALL, 2013). In the year 2013 more than
1.06 trillion intrusion related incidents had been detected and prevented and more
than 1.78 billion malwares downloads had been blocked. According to the same
report, there were approximately 4,429 new vulnerabilities reported from Common
Vulnerabilities and Exposures (CVE) and 3,644 were related with network attacks in
2013 (DellSonicWALL, 2013). Even a small security breach on commercial dealings
such as online e-payment and money transactions may cause huge unrecoverable
losses to companies and individuals. Successful attacks to hijack network-based
sensitive services in hospitals or airports may lead to disastrous results.
2
In response to those increasing cyber-attacks, the cost of securing networks
information is also increasing. According to the industry survey on cyber security
conducted by New York State Department of Financial Services (Andrew M.
Cuomo, 2014), 77% of institutions involved in the survey experienced an increase in
their total information security budget in the past three years. Almost no institutions
reported a reduction in spending in the past three years.
In order to cope with the enormous increasing security threads that face
network communication, various third party techniques or middle-boxes techniques
have evolved such as firewalls, Network Intrusion Detection Systems (NIDSs).
Network Intrusion Detection Systems (NIDS) proved to be an efficient technique that
can process large volume of networks traffic and detect intrusions in their early
stages in order to limit their catastrophic damages.
Intrusion can be defined as any activity that violates confidentiality,
authority, integrity or availability of information system (Lazarevic et al., 2005).
Although in the field of information security, intrusion and attack are used
interchangeably, there is a little difference between the two terms. An attack is an
intrusion attempt, and an intrusion results from an attack that has been (at least
partially) successful (Barrus and Rowe, 1998). Intrusion Detection is defined by The
National Institute of Standards and Technology (Bace and Mell, 2001) as “the
process of monitoring the events occurring in a computer or network and analyzing
them for signs of intrusions”. An Intrusion Detection System (IDS) can be defined as
a combination of software or/and hardware components that monitors computer and
network systems and raises an alarm when intrusion happens (Lazarevic et al., 2005).
There are two main categories of IDSs; host-based IDSs (HIDS) which are
designed for single host, and network-based IDSs (NIDS) which monitor the whole
traffic of a network. Generally, the concepts of IDS is more related to NIDS than
HIDS since the latter can be considered as an ordinary antivirus system limited to
one host whilst the prior is an entity that can monitor a whole enterprise network or
even an ISP network.
3
1.2 Background
1.2.1. A Simple NIDS Architecture
In general, the components which make up any NIDS can be viewed as
shown in Figure 1.1 (Lazarevic et al., 2005).
Data Acquisition Module – Monitored System
Feature Generation
Module
Incident Detection Module Response Module
Reference Data Module
Event
Raw Data
Actions
Alarms
Figure 1.1 Simple architecture of NIDS (Lazarevic et al., 2005)
The purpose of each module is as follows:
i. Data Acquisition Module: Responsible for collecting network traffic data.
ii. Feature Generation Module: Responsible for extracting a set of selected
traffic features (packet header features, payload features, flow features)
iii. Incident Detection Module: Responsible for identifying intrusion and
generating alarms by comparing the data generated from the feature
generation module with that of the reference module. Generally there are
two methods of this detection; misuse-based detection and anomaly-based
detection.
iv. Response Module: Once an alert is received, this module is responsible for
initiating actions in response to a possible intrusion.
4
v. Reference Data Module: This module contains the reference data that is to be
used by the incident detector to compare with. If the detection method is
misuse-based then this data would be based on the description of all known
intrusion (intrusion database), and if the detection method is anomaly-based
then it would be based on description of normal attack free network
operation (network profile).
1.2.2. IDS Taxonomy
Several taxonomies had been proposed for categorizing NIDS, but the most
common and acceptable one is that one used by Axelsson (2000) and Debar et al.
(1999). This classification categorizes IDSs according to the following aspects:
i. Information source: NIDS can be packet-level based or flow-level. Packet-
level NIDS can be further divided to packet header-based or payload-based.
ii. Analysis Strategy (Detection Method): This categorization distinguishes
NIDS according to the nature of reference data used for identifying
intrusions which is either misuse-based or anomaly-based.
iii. Architecture (Locus of Detection): NIDS can be centralized when it is placed
in single position or distributed when there are several points for
monitoring.
iv. Response to intrusion: NIDS can be either passive when it is just to raise an
alarm on detecting intrusion or active when a further action is to be done in
response to that intrusion.
v. Time Aspects: NIDS can work in real-time and detect intrusions while they
are taking place or by batch (non real-time).
5
Information Source
Analysis Strategy
Time Aspects
Architecture
Response
NIDS
Packet Level
Flow Level
Packet Header
Packet Payload
Anomaly Detection
Misuse Detection
Real-time Prediction
Off-line Prediction
Centralized
Distributed
Passive Response
Active Response
Figure 1.2 NIDS classification aspects
1.2.3. NIDS Required Characteristics
The required NIDS characteristics had been specified by many researchers
(Lazarevic et al., 2005, Catania and Garino, 2012). These characteristics are required
from NIDS to achieve security goals. We can summarize these characteristics as
follows:
i. High Detection Performance: Detection of all attacks without false alarms
ii. Low Processing time: Attack is to be detected as soon as possible
iii. Adaptability: NIDS should be able to readapt itself to deal with novel attacks
and changing environments
iv. Fault tolerance: robustness, resistance to attacks, quick recovery from
successful attacks
v. Minimum Resource consumption: storage resources and processing
capabilities needed.
6
1.3 Problem Statement
As stated in the previous section, NIDS have a wide range of approaches to
be categorized accordingly (Figure 1.2). However, there are two main different
approaches now for NIDS. The conventional and most common implemented
approach nowadays is the misuse-based NIDS with deep packet inspection (DPI),
which tries to detect attacks based on previously prepared database of all known
intrusion signatures. This approach proved to be efficient and accurate for detecting
known intrusions but cannot detect novel and new attacks or even new variants of
known attacks. Another drawback of misuse-based NIDS is that it uses DPI. This
approach of auditing data by inspection inspects every incoming packet in the traffic.
Hence, it cannot cope with high speed networks. Besides, it seems that researches on
misuse-based NIDS have reached a saturation point making it harder to make further
enhancements.
The other approach is the anomaly-based NIDS, which is based on detection
of deviations of a normal attack-free model of the protected network and flagging
these deviations as intrusions. This approach can detect known and novel attacks as
well. Although this latter approach seems promising and in spite of lot of researches
on it, it is still immature and suffers from serious problems that makes it impractical
for real life situations. The major problem of anomaly-based NIDS is that it suffers
from high false alarm rates. This problem arises from the difficulty of specifying
normal and abnormal thresholds for any network and which itself can be considered
as a second problem. A third problem of that approach is the limited capacity and
precision of identifying an attack when an alarm is raised. This problem is a
consequence of anomaly-based detection nature which reacts with any network event
as either normal or abnormal (anomaly) and nothing else. If we consider the fact that
there are lots of network anomalies that are neither attacks nor intrusions, then
definitely there would be a high rate of false alarms.
With respect to the information inspected (audit data), some NIDS inspect the
complete traffic (Deep Packet Inspection) for known attack signatures (byte patterns)
whilst others consider data from packets headers for attacks that violates network
7
protocols rules. Meanwhile, other NIDSs depend on extensive summaries and
aggregations of traffic flow (flow-level) for intrusion that attack network resources.
Each of these levels of audit data inspection concentrates on particular category of
intrusive attacks and, therefore, is strong on some parts but may be poor on other
parts. Since flow-level NIDS deals with aggregations and summarization of traffic, it
can inspect high speed traffic in a near real-time mode. However, it suffers from low
precision of specifying attacks in the traffic. This problem is a result of the absence
of the actual data in the detection and the use of metadata instead. The whole flow is
to be flagged when an attack is detected. The exact malicious packet(s) cannot be
specified nor the exact attack time nor any other further information that can help to
describe the attack.
Any NIDS needs to go through a knowledge acquisition process to build a
reference model in the system before the actual detection process can take place. To
build this reference model, the system must be supplied with previously known
categorized dataset so that the system would acquire the necessary information to
detect attacks and differentiate between malicious and non-malicious incidents in
network traffic. Unfortunately, creation of optimum NIDS dataset that would acquire
all required characteristics is very hard and costly. In addition, some of these
required characteristics seem to be contradicting with each other. Hence, combining
these characteristics in a dataset is almost impossible. There are a number of
challenges and difficulties in creating ideal datasets. For all that, researchers on
NIDS generally suffer from a lack of standard datasets for developing and training
new systems. The problem is even severe when looking for benchmarking common
public datasets for validating, evaluating and comparing between existing systems
and methods.
Most NIDS in the research community today have achieved a very high
degree of accuracy in detecting intrusions. However, the majority of these systems
face difficulties when they are to be deployed and implemented in real world. With
the growing connection speed by the establishment of end users broadband Internet,
the data volumes in the backbone networks has increased steadily. Monitoring of
links that may reach few gigabits per second capacities by NIDS may require more
8
computational resources than that available on commodity computer hardware. This
problem of resource demand is commonly solved by engineering dedicated hardware
platforms such as on Field-Programmable Gate Array (FPGA) that can process
detection tasks more efficiently. However, specially crafted hardware comes at a
significant higher cost and may not be worthwhile in every situation.
Our research problem is divided into two main parts. The first part aims to deal
with the problem of the lack in standard benchmarking datasets for developing and
training new systems or comparing existing ones. The second part tries to find software-
based solutions for flow-level NIDS to achieve accurate near real-time detection. As
a part from the second problem, the research tries to answer some questions
concerning the capability and efficiency of flow-level attributes to detect network
traffic attacks in spite of the absence of actual monitored data, and to find its
efficiency in terms of accuracy and timeline performance.
1.4 Research Objectives
The objectives of this research can be summarized as follows:
1. To create a labeled flow-level dataset for anomaly-based NIDS using a
general framework that would be available publicly.
A labeled flow-level NIDS dataset is a set of flow records with each flow
record labeled as either benign or malicious. A flow record is the set of
attributes that abstracts and summarizes all the information which concerns a
single session between two communicating entities in a network. The dataset
is to satisfy a set of conditions and guidelines in order to acquire validity for
anomaly detection.
2. To design an effective near real-time anomaly-based NIDS to detect
brute-force attacks based only on software solutions.
9
Effective here means the proposed solution will not degrade the detection
accuracy and will not increase the false alarm rate compared to existing
conventional DPI or relevant flow-based methods. The system is supposed to
detect network intrusion attacks when they are taking place not more than few
minutes after the attack has commence. Detection scope of the system consists
of all attacks that consume significant network bandwidth in any phase of the
attack. The detection system is a set of developed software that does not need
special hardware platform to run on and meant to be placed in the
ingress/egress point of Universiti Teknologi Malaysia (UTM) campus
network. Hence, it has to cope with high speed traffic of possibility to few
hundreds megabit per second.
3. To investigate the efficiency of flow-level traffic inspection to detect
packet content-based attacks.
The capability, accuracy and timeline performance of flow-level features to
detect attack contained in payload packets will be investigated and evaluated.
1.5 Research Scope
This research focuses on developing a system that can monitor the traffic of
server farm in UTM campus network in real-time and setting alarm when observing
intrusive activities. Hence, the research deals with traffic bandwidth not exceeding
one gigabit per second, as this is a little higher than the core bandwidth of UTM
campus backbone.
All proposed solutions are software-based and are to be run within general
purpose computer machines that are commercially available in the market. No
special hardware platform such as FPGA would be considered.
Although the study includes misuse detection approach, it will limit
inspection level to flow features and will not consider any of payload inspection or
10
pattern signature as information source for intrusion detection. Inspection of flow-
level data is limited to network and transport layers. Hence, attacks and techniques in
lower layers (physical layers) and upper layers (application layers) will not be
considered.
All proposed solutions and systems are passive with respect to the detected
intrusions. Their role is solely to flag an alarm when any intrusive activity is detected
with no further action to be performed.
The study is about Network IDS. Hence, all techniques reviewed, studied or
analyzed will depend only on network traffic data (packets and flow data). Host IDS
is out of scope of this study. Intrusions would be considered an attack when it uses
network resources to scan, propagate, or penetrate network information resources.
Intrusions affect information sources through other means (e.g. physical access) will
not be considered as network attack.
1.6 Significance of the Research
The most significant contribution of this study is the development of a
general framework to create NIDS datasets. That framework would be available
publicly for research community. This would help other researchers to develop their
own systems and methods and ease the comparison and validations for these new
systems and methods.
Another outcome of the study is a cost effective solution to the problem of
real-time traffic monitoring in any enterprise networks with up to one gigabit per
second bandwidth. It provides a user-friendly NIDS that make the security task easier
and provide an early alarm security system for the monitored network through
developing an efficient NIDS as a step to replace other costly commercial software
products that are currently being used.
11
1.7 Thesis Organization
The rest of this thesis is organized as follows. In Chapter 2, an overview on
the evolution of NIDS and its state-of-the-art that highlights gaps and defines future
trends is given. This is followed by an overview on network traffic generating,
capturing and processing tools and the significant theories and methods used in
NIDS in general. A deep flow-level NIDS literature review is done followed by
another review on NIDS datasets.
Chapter 3 shows the overall research methodology, explains and discusses
general dataset creation framework, the framework of flow-level detection of brute-
force attacks, explains the framework of flow-level detection of packet-level attacks.
It also discusses with details all experimental setups, dataset creation steps,
validation methods and rational constructions behind the selected methodology.
Chapter 4 presents the implementation details of creating brute-force attacked
dataset from captured traffic traces, statistics and various distributions of the created
dataset and discusses the corresponding results.
Chapter 5 presents the implementation details of designing flow-level NIDS
to detect brute-force attacks and all partial and final results concerning evaluation of
accuracy and performance and validation of the detection method and discusses
results.
Chapter 6 presents the implementation details of investigating the efficiency
of flow-level data to detect packet-level attacks and a discussion to justify the results.
Chapter 7 concludes all the study and states the degree of objective
fulfillment, highlights research contributions and makes suggestion for future work.
REFERENCES
Aldous, D. (1991). The Continuum Random Tree I. TAe o/ProMMzYy, 19(1),
1-28.
Allen, J., Christie, A., Fithen, W., McHugh, J., Pickel, J. and Stoner, E. (2000). <SY%?e
o / A Pr^c^/ce o / T^^rMs/o De^ec^/o^ TecA^o/og/es. Technical Report
CMU/SEI-99-TR-028 ESC-99-028. Software Engineering Institute,
CarnegieMellon University, Pittsburgh, Pennsylvania.
Aslam, T. (1995). ^ Tbxo^owy o / ^ecMr/(y F^M/^s m ^Ae (T /% Oper^^/^g
^ys^ew.Doctoral Dissertation.Department of Computer Sciences,Purdue
University, Lafayette, Indiana.
Attanasio, C. R., Markstein, P. W. and Phillips, R. J. (1976). Penetrating an
Operating System: AStudy of VM/370 Integrity. TFM ^ys^ews JoMr^^/, 15,
102-116.
Axelsson, S. (2000a). The Base-rate Fallacy and the Difficulty of Intrusion
Detection. Tm%y%c%o%y oM T^/orw^^/oM ^^J ^ys^ew ^ecMr/(y, 3, 186
205.
Axelsson, S. (2000b). T ^ rM so De^ec^/o^ ^ys^ews/ ^ <SMrvey ^^ J
Thxo^owy.Technical Report 99-15, Department of Computer Engineering,
Chalmers University of Technology, Goteborg, Sweden.
Bace, R. and Mell, P. (2001). AZST <Spec/%/ PMN/c%?/oM oM T^rMs/o^ De^ec^/o^
^ys^ews.Technical ReportSP 800-31, National Institute of Standards and
Technology, Gaithersburg, Maryland.
Barrus, J. and Rowe, N. C. (1998) A Distributed Autonomous-agent Network
Intrusion Detection and Response System. ProceeJmgs o / ?Ae Coww%^J %^J
Co^^ro/ ^ese%rcA ^^J TecA^o/ogy ^y^os/Mw,Naval Postgraduate School
Monterey,USA.
150
Bhuyan, M. H., Bhattacharyya, D. and Kalita, J. K. (2014). Network Anomaly
Detection: Methods, Systems and Tools. TEEE CowwM^/c^^/o^s -Surveys ^^J
TM^or/ /s, 16(1), 303-336.
Bodt, B. A. (2002). Computer Intrusion Detection and Network Monitoring.
EecA^owe^r/cs, 44(3), 294-295.
Breiman, L. (2001). Random Forests. M%cAme Ee^r^/^g, 45(1), 5-32.
CAIDA. (2015). Ce^^er /o r ^_pp//eJ T^^er^e^ ^^^/ys/s, [Online]. Available:
http://www.caida.org/home/.
Catania, C. A. and Garino, C. G. (2012). Automatic Network Intrusion Detection:
Current Techniques and Open Issues. CowpM^ers %^J E/ec?r/c%/ E^gmeermg,
38(5), 1062-1072.
Cheng, CM ., Kung, H. R. and Tan, K.S. (2002). Use of Spectral Analysis in Defense
Against DoS Attacks. ProceeJmgs o / TEEE G/oA%/ Ee/ecowwM^/c^^/o^s
Co^/ere^ce, 3,2143-2148.
Cisco. (2013). C/sco NeE/ow [Online]. Available:
http://www.cisco.com/en/US/products/ps6601/products ios protocol group
home.html.
Cisco. (2015). <SWOPr [Online]. Available: https://www.snort.org/.
Cisco. (2015). NeE/ow Export D^^^gr^w Eorw^^ [Online]. Available:
http://www.cisco.com/c/en/us/td/docs/net mgmt/netflow collection engine/3
-6/user/guide/format.html.
Claise, B. (2008). <Spec//?c%?o o / Ae TP //ow T^/orw%?o Export ^TPETX) Pro^oco/
/o r A ExcA^^ge o / TP Trq///c E/ow T^/orw^^/o^. RFC 7011, IETF.Available:
https://tools.ietf.org/html/rfc7011.
Computer Knowledge. (2015). CowpM^er X^ow/eJge- F/rMs TM^or/^/ [Online].
Available: http://www.cknow.com/ cms/vtutor/cknow-virus-tutorial.html.
Copeland III, J. A. (2007). Flow-based Detection of Network Intrusions. U.S. Patent
No. 7,185,368. Washington, DC: U.S. Patent and Trademark Office.
Dagon, D., Gu, G., Lee, C. P. and Lee, W. (2007). A Taxonomy of Botnet Structures.
ProceeJmgs o / A ^MMM%/ CowpM^er <S*ecMr/(y ^_pp//c%Ao%y Co^/ere^ce, 2J,
325-339.
Debar, H., Dacier, M. and Wespi, A. (1999). Towards a Taxonomy of Intrusion-
Detection Systems. CowpM^er NeWorAs, 31, 805-822.
151
DellSonicWALL (2013). Dell Network Security Threat Report 2013. US-TD593-
20140123. U.S.A:
Dell.Available:http://marketing.sonicwall.com/documents/dell-network-
security-threat-report-2013 -whitepaper-30197.pdf
Denning, D. E. (1987). An IntrusionDetection Model. TEEE Tm%y%c?/o%y oM
<Sb/?w%re FMg/Meer/Mg, 13(2), 222-232.
Diadem Firewall European Project. (2015). DrnJew F/rew%//[Online]. Available:
http://www.diadem-firewall org/index.php.
Dokas, P., Ertoz, L., Kumar, V., Lazarevic, A., Srivastava, J. and Tan, P.N. (2002).
Data Mining for Network Intrusion Detection. ProceeJ/Mgs o / NSF
^ForAsAop oM Nex? GeMer^?/oM D^?^ M/M/Mg, 21-30.
Dreger, H., Feldmann, A., Paxson, V. and Sommer, R. (2004). Operational
Experiences with High Volume Network Intrusion Detection. ProceeJ/Mgs o /
?Ae CoM/ereMce oM Co^M ?er ^MJ CowwMM/c%?/oMs ^ecMr/(y. ACM, 11, 2-11.
Erbacher, R. F. (2001). Visual Behavior Characterization for Intrusion Detection in
Large Scale Systems. ProceeJ/Mgs o / ?Ae 7M?erM%?/oM%/ CoM/ereMce oM
F/sM%//z%?/oM, Vw^g/Mg, ^MJ Tw^ge Process/Mg, 54-59.
Fide, S. and Jenks, S. (2006). ^ <SMrvey o / <SYr/Mg M3?cA/Mg ppro^cAes /M N^rJw^re.
Technical ReportTR SPDS 06-01, Department of Electrical Engineering and
Computer Science, University of California, Irvine, USA.
Fioreze, T., Oude Wolbers, M., van de Meent, R. and Pras, A. (2007). Finding
Elephant Flows for Optical Networks. ProceeJ/Mgs o / ?Ae 70?A TFVP/TEFF
7M?erM%?/oM%/ ^y^os/M w oM VM?egr^?eJ Ne?wor^ M%M%geweM?, 627-640.
Fisher, D. H., Pazzani, M. J. and Langley, P. (E ds).(2014). CoMcep? Forw%?/oM.*
XMow/eJge %MJ Exper/eMce /M ^MsMperv/seJ Le^rM/Mg.San Mateo: Morgan
Kaufmann.
Frank, J. (1994). Artificial Intelligence and Intrusion Detection: Current and Future
Directions. ProceeJ/Mgs o / ?Ae 77?A N3?/oM^/ Co^M ?er <S*ecMr/?y CoM/ereMce,
10, 1-12.
Galtsev, A. A. and Sukhov, A. M. (2011). Network Attack Detection at Flow Level.
ProceeJ/Mgs o / ?Ae 77?A VM?erM ?/oM / CoM/ereMce ^MJ 4?A TM?erM ?/oM /
CoM/ereMce oM ^w^r? <Sp%ces ^MJ Nex? GeMer^?/oM ^ /reJ /^ /re /ess
Ne?worA/Mg, 326-334.
152
Gao, M., Zhang, K. and Lu, J. (2006). Efficient Packet Matching for Gigabit
Network Intrusion Detection Using TCAMs. ProceeJz^g^ o / ?Ae 20YA
V^Yer^^Yzo^ f Co^/ere^ce o / ^Jv^^ceJ V^/orw^Yzo NeYworAz^g ^^J
_ppfzc%?zo%s', 249-254.
Gao, Y., Li, Z. and Chen, Y. (2006). A DoSResilient Flow-level Intrusion Detection
Approach for High-speed Networks. ProceeJz^g^ o/2dYAVEEE V^Yer^^Yzo^ f
Co^/ere^ceo^ Dz Yrz MYeJ CowpMYz^g -S^ew.s', 39-39.
Garner, S. R. (1995). Weka: The Waikato Environment for Knowledge Analysis.
ProceeJz^g^ o / YAe New Ze^f^^J Co^MYer Scze^ce ^e^e^rcA SYMJe Y
Co^/ere^ce, 57-64.
Gates, C., McNutt, J. J., Kadane, J. B. and Kellner, M. I. (2006). Scan Detection on
very Large Networks Using Logistic Regression Modeling. ProceeJz^g^ o /
YAe77YA TEEE Sy^o^zMw o^ Co^MYer^ ^^ J CowwM^z'c^^z'o^ , 402-408.
Ghorbani, A. A., Lu, W. and Tavallaee, M. (2009). NeYworA V YrM zo DeYecYzo ^ ^ J
Preve^Yzo^/ Co^cepY^ ^^J TecA^z^Me , New York:Springer.
Gil, T. M. and Poletto, M. (2001). MULTOPS: A DataStructure for Bandwidth
Attack Detection. ProceeJz^g^ o / 70YA L'SENVZ SecMrzYy Sy^o^z'Mw,23-38
GitHub. (2015). FoNeSz - YAe DDoS FoY eY Sz^Mf^Yor [Online]. Available:
https://github.com/markus-go/bonesi.
Gogoi, P., Bhuyan, M. H., Bhattacharyya, D. and Kalita, J. K. (2012). Packet and
Flow Based Network Intrusion Dataset. ProceeJm g o / YAe JYA V^Yer^^Yzo^ f
Co^/ere^ce o^ Co^Ye^or^ry Co^MYz^g, Springer, 322-334.
Andrew M. Cuomo, Lawsky,B. M. (2014). ^eporY o^ Cy^er SecMrzYy z YAe F^^Az^g
Sector, Department of Financial Services,New York
State,USA.Available:http://www.dfs.ny.gov/about/press2014/pr140505_cyber_sec
urity.pdf
Hansman, S. and Hunt, R. (2005). A Taxonomy of Network and Computer Attacks.
CowpMYers* %^J SecMrzYy, 24(1), 31-43.
Hofmeyr, S. A. and Forrest, S. (1999). VwwM^ofogzc^f ^ o J e f o / Dz Yrz MYeJ
DeYecYzo ^^J z'Y ^ppfzc^Y o Yo Co^MYer SecMrzYy.
DoctoralDissertation,Department of Computer Science,University of New
Mexico, New Mexico.
153
Hoque, N., Bhuyan, M. H., Baishya, R. C., Bhattacharyya, D. and Kalita, J. K.
(2014). Network Attacks: Taxonomy, Tools and Systems. JoMr^^f o/NeYworA
%^J CowpMYer ppfz'c^Yz'o^y, 40, 307-324.
Hu, W., Liao, Y. and Vemuri, V. R. (2003). Robust Anomaly Detection Using
Support Vector Machines. ProceeJmgy o / YAe V^Yer^^Yz'o^ f Co^/ere^ce o^
M^cAz'^e Le^r^z'^g, 282-289.
Igure, V. and Williams, R. (2008). Taxonomies of Attacks and Vulnerabilities in
Computer Systems. VEEECowwM^z'c^Yz'o^y SMrveyy %^J TMYorz'%fy, 10(1), 6
19.
Information Security Center of eXcellence. (2015). VSCZ D^Y^yeY [Online].
Available: http://www.iscx.ca/datasets.
irchelp.org. (2015). Tro/%^ Norye ^YY cAy [Online]. Available:
http://www.irchelp.ora/irchelp/security/trojan.html.
Jones, A. K. and Sielken, R. S. (2000). CowpMYer SyyYew V YrMyz'o DeYecYz'o / ^
SMrvey. Technical Report, Computer Science Department, University of
Virginia, Charlottesville.
Joshi, M. V., Agarwal, R. C. and Kumar, V. (2002). Predicting Rare Classes: Can
Boosting Make any Weak Learner Strong? ProceeJmgy o / YAe ezgAYA ^C M
V^Yer^^Yz'o^ f Co^/ere^ce o^ X^owfeJge Dzycovery ^^ J D%Y% Mz'^z'^g, 297
306.
Jung, J., Paxson, V., Berger, A. W. and Balakrishnan, H. (2004). Fast Portscan
Detection Using Sequential Hypothesis Testing. VEEE SywpoyzMw o^
SecMrzYy %^J Prz'v^cy, 211-225.
Kashyap, H. J. and Bhattacharyya, D. (2012). A DDoS Attack Detection Mechanism
Based on Protocol Specific Traffic Features. ProceeJmgy o / YAe Seco^J
V^Yer^^Yz'o^ f Co^/ere^ce o^ Co^MY^Yz'o^^f Scz'e^ce, E^gmeermg ^^J
V /orw^Yz'o TecA^ofogy. ACM, 194-200.
Ke-xin, Y. and Jian-qi, Z. (2011). A Novel DoS Detection Mechanism. ProceeJmgy
o / YAe V^Yer^^Yz'o^ f Co^/ere^ce o^ MecA^Yro^z'c Scz'e^ce, EfecYrzc
E^gz'^eerz'^g ^ ^ J Co^MYer, 296-298.
Kendall, K. (1999). ^ D^Y^^^ye O / Co^MYer ^YY cAy For TAe Ev%fM%Yo O /
V YrMyz'o DeYecYz'o SyyYewy.Master's Thesis, Department of Electrical
Engineering and Computer Science,Massachusetts Institute of Technology,
Cambridge.
154
Kim, M.-S., Kong, H.-J., Hong, S.-C., Chung, S.-H. and Hong, J. W. (2004). A
Flow-based Method for Abnormal Network Traffic Detection. ProceeJmgs
/ TEEE/EPNeworA Oper%?/o%y ^^J M^^^gewe^? ^y^os/M w, 599-612.
Kotsiantis, S., Zaharakis, I. and Pintelas, P. (2007). Supervised Machine Learning: A
Review of Classification Techniques. E ro^/ers m ^r/?c/% / T^e///ge^ce %^J
^pp//c^?/o^s, 160, 3-24.
Kotsokalis, C., Kalogeras, D. and Maglaris, B. (2001). Router-based Detection of
DoS and DDoS Attacks. E/gA?A ^rAsAop o / ?Ae H P Ope^F/ew Mwers/zy
^ssoc/^^/o^, Fer//^, Gerw^^y.
Kruegel, C. and Toth, T. (2000). ^ <SMrvey O^ T^^rMs/o De^ec^/o^ ^ys^ews.
Technical Report TUV-1841-00-11, University of Technology,
Vienna,Austria.
Kumar, S. (1995). C/%ss'//?c%?/oM ^^ J De^ec^/o^ o / CowpM?er T^^rMs/o^s.Doctoral
Dissertation, Department of Computer Sciences,Purdue University, Lafayette,
Indiana.
Lai, H., Cai, S., Huang, H., Xie, J. and Li, H. (2004). A Parallel Intrusion Detection
System For High-speed Networks. ProceeJmgs o / ?Ae ^eco^J T^?er^^?/o^^/
Co^/ere^ce o^ ^_pp//eJ Cryp^ogr^pAy ^^ J Ne^worA ecMrzYy, 439-451.
Lakhina, A., Crovella, M. and Diot, C. (2004a). Characterization of Network-wide
Anomalies in Traffic Flows. ProceeJmgs o / ?Ae 4?A ^C M ^TGCOMM
co^/ere^ce o^ T^er^e? Me^sMrewe^?, 201-206.
Lakhina, A., Crovella, M. and Diot, C. (2004b). Diagnosing Network-wide Traffic
Anomalies. ProceeJmgs o / ?Ae ^C M ^TGCOMM co^/ere^ce o^ Co^M ^er
CowwM^/c^?/o^ Pev/ew, 219-230.
Lakhina, A., Crovella, M. and Diot, C. (2005). Mining anomalies Using Traffic
Feature Distributions. ProceeJmgs o / ?Ae ^C M ^TGCOMM co^/ere^ce o^
Co^M ^er CowwM^/c^?/o^ Pev/ew, 217-228.
Lakhina, A., Papagiannaki, K., Crovella, M., Diot, C., Kolaczyk, E. D. and Taft, N.
(2004c). Structural Analysis of Network Traffic Flows.^CM
<S7gwe?r/cs,32(1), 61-72.
Larochelle, D. and Evans, D. (2001). Statically Detecting Likely Buffer Overflow
Vulnerabilities. ProceeJmgs o / ?Ae TO/AMENS" ecMrzYy ^y^os/M w, 32,
177-190.
155
Lau, F., Rubin, S. H., Smith, M. H. and Trajkovic, L. (2000). Distributed Denial of
Service Attacks. ProceeJ/Mgs o / ?Ae TEEE VM?erM ?/oM / CoM/ereMce oM
^ys?ews, M%M, ^MJ Cy^erMe?/cs,3, 2275-2280.
Lawrence Berkeley National Laboratory and ICSI. (2005). ERNE/TCST EM?erpr/se
Tr^c/Mg Pro/ec? [Online]. Available: http://www.icir.org/enterprise-tracina/.
Lazarevic, A., Kumar, V. and Srivastava, J. (2005). Intrusion Detection: A Survey.
M%M%g/Mg Cy^er TAre%?s, 19-78.
Lee, W., Wang, C. and Dagon, D. (2007). Fo?Me? De?ec?/oM/ CoMM?er/Mg ?Ae Larges?
^ecMr/?y TAre^?.New York: Springer.
Lee, W. and Xiang, D. (2001). Information-theoretic Measures for Anomaly
Detection. ProceeJ/Mgs o / ?Ae TEEE ^y^os/M w oM ^ecMr/?y ^MJ Prrv%cy,
130-143.
Li, B., Springer, J., Bebis, G. and Gunes, M. H. (2013). A Survey of Network Flow
Applications. JoMrM%/ o/*Ne?wor^ %MJ CowpM?er pp//c^?/oMs, 36, 567-581.
Li, Z., Gao, Y. and Chen, Y. (2005). Towards a High-speed Router-based
Anomaly/Intrusion Detection System.ProceeJ/Mgs o / ?Ae ^pec/^/ TM?eres?
GroMp oM D^?^ CowwMM/c^?/oM , 15-16.
Lincoln Laboratory, M. I. o. T. (2000). D ^ P ^ TM?rMs/oM De?ec?/oM
Ev^/M^?/oM[Online]. Available:
http://www.ll.mit.edu/mission/communications/cyber/CSTcorpora/ideval/data
/2000data.html.
Lindqvist, U. and Jonsson, E. (1997). How to Systematically Classify Computer
Security Intrusions. ProceeJ/Mgs o / ?Ae TEEE ^y^os/M w oM ^ecMr/?y %MJ
Pr/'v^cy, 154-163.
Lippmann, R. P., Fried, D. J., Graf, I., Haines, J. W., Kendall, K. R., McClung, D.,
Weber, D., Webster, S. E., Wyschogrod, D. and Cunningham, R. K. (1998).
Evaluating Intrusion Detection Systems: The 1998 DARPA Offline Intrusion
Detection Evaluation. TAe TM?erM%?/oM%/ JoMrM%/ o / CowpM?er ^MJ
ye/ecowwMM/'c^?/'oMs Ne?worA/Mg,34, 579-595.
Long, N. and Thomas, R. (2001). Trends In Denial Of Service Attack Technology.
CE^T CoorJ/M%?/oM CeM?er. [Online]. Available:
http://resources.sei.cmu.edu/asset_files/WhitePaper/2001_019_001_52491.pdf
156
Lough, D. L. (2001). ^ Tkxo^owy o / C o^M /er //^cAs w//A ^pp//c^//o^s /o ^/re/ess
Ne/worAs.Doctoral Dissertation, Faculty of the Virginia Polytechnic Institute
and State University, Blacksburg, Virginia.
157
Makhtar, M., Neagu, D. C. and Ridley, M. J. (2011). Comparing Multi-class
Classifiers: on the Similarity of Confusion Matrices for Predictive
ToxicologyApplications.ProceeJmgy o / YAe VEEE 72YAV^Yer^^Yz'o^ f
Co^/ere^ce o^ V Yeffz'ge Y D%Y% E^gz'^eerz'^g ^^J ^MYow^YeJ Le^r^z'^g, 252
261.
McHugh, J. (2000). Testing Intrusion Detection Systems: A Critique of the 1998 and
1999 DARPA Intrusion Detection System Evaluations as Performed by
Lincoln Laboratory. ^C M Yr% y%cYz'o y o^ V /orw^Yz'o %^J yyyYew SecMrz'Yy,
3, 262-294.
Mell, P., Hu, V., Lippmann, R., Haines, J. and Zissman, M. (2003). An Overview of
Issues in Testing Intrusion Detection Systems. Technical Report NIST
Interagency Reports 7007, Department of Commerce, National Institute of
Standards and Technology, USA.
Mirkovic, J., Prier, G. and Reiher, P. (2002). Attacking DDoS at the Source.
ProceeJwgy o / YAe 70YA VEEE V^Yer^^Yz'o^ f Co^/ere^ce o^ NeYworA
ProYocofy, 312-321.
Mirkovic, J. and Reiher, P. (2004). A taxonomy of DDoS Attack and DDoS Defense
Mechanisms. ^C M SVGCOMM Co^MYer CowwM^z'c^Yz'o ^evz'ew, 34, 39-53.
Moore, D., Shannon, C., Brown, D. J., Voelker, G. M. and Savage, S. (2006).
Inferring Internet Denial ofService Activity. ^C M Tm^y^cYz'o^y o^ CowpMYer
SyyYewy, 24, 115-139.
Morin, B. and Me, L. (2007). Intrusion Detection and Virology: An Analysis of
Differences, Similarities and Complementariness. JoMr^^f m cowpMYer
vz'rofogy 3, 39-49.
Munz, G. and Carle, G. (2007). Real-time Analysis of Flow Data for Network Attack
Detection. ProceeJmgy o / YAe 70YA VETP/TEEE V^Yer^^Yz'o^ f Sy^oyzMw o^
V^Yegr^YeJ NeYworA M^^^gewe^Y, 100-108.
Munz, G., Weber, N. and Carle, G. (2007). Signature Detection in Sampled Packets.
^ForAsAop o^ Mo^z'Yormg ^YY cA DeYecYz'o ^^ J Mz'Yzg%Yz'o , Citeseer,
Toulouse, France.
Muraleedharan, N., Parmar, A. and Kumar, M. (2010). A Flow based Anomaly
Detection System Using Chi-square Technique. ProceeJmgy o / YAe2"J
V^Yer^^Yz'o^ f VEEECo^/ere^ce o^ ^Jv%^ce Co^MYz'^g, 285-289.
158
Nguyen, H. A., Tam Van Nguyen, T., Kim, D. I. and Choi, D. (2008). Network
Traffic Anomalies Detection and Identification with Flow Monitoring.
ProceeJmgs / /Ae J /ATEE/TETP TM/em%//oM%/ Co^/ere^ce o^ ^/re/ess ^^J
Op//c%/ CowwM^/c^//o^s Ne/worAs, 1-5.
NLS-KDD. (2015). TAe NSE-XDD D^/^ Se/ [Online]. Available:
http://iscx.cs.unb.ca/NSL-KDD/.
nmap.org. (2015). Nw%p SecMr//y Sc^^^er [Online]. Available: http://nmap.org/.
Park, J., Tyan, H.-R. and Kuo, C.-C. (2006). Internet Traffic Classification for
Scalable QoS Provision. ProceeJmgs / /Ae T^/er^^//o^^/ TEEE Co^/ere^ce
o^ MM///weJ/^ ^^ J Expo, 1221-1224.
Park, K. and Lee, H. (2001). On the Effectiveness of Route-based Packet Filtering
for Distributed DoS Attack Prevention in Power-law Internets. ^C M
STGCOMM CowpM/er CowwMM/c%/o Pev/ew,31(4), 15-26.
Paxson, V. (1999). Bro: A System for Detecting Network Intruders in Real-time.
CowpM/er Ne/worAs, 31, 2435-2463.
Portnoy, L., Eskin, E. and Stolfo, S. (2001). Intrusion Detection with Unlabeled Data
Using Clustering. ProceeJwgs / C M CSS ^orAsAop o^ D^/^ M/^/^g
^_pp//eJ /o SecMr//y, Citeseer.
Quinlan, J. R. (1993). C4. J/ progr^ws /o r w%cAme /e^r^/^g, San Francisco: Morgan
Kaufmann.
Quittek, J., Zseby, T., Claise, B. and Zander, S. (2004). Requirements for IP Flow
Information Export (IPFIX). RFC 3917, IETF. Available: https://www.rfc-
editor.org/rfc/rfc3917.txt
Reich, Y. and Fenves, S. J. (1991). The Formation and Use of Abstract Concepts in
Design. Co^cep/ /orw%//oM.' A^ow/eJge %^J exper/e^ce m M^sMperv/seJ
/e^r^/^g. San Mateo: Morgan Kaufmann.
Rhodes, B. C., Mahaffey, J. A. and Cannady, J. D. (2000). Multiple Self-organizing
Maps for Intrusion Detection. ProceeJmgs / /Ae 2JrJN2//o^%/ T^/orw%/o
Sys/ews SecMr//y Co^/ere^ce, 16-19.
Robertson, S., Siegel, E. V., Miller, M. and Stolfo, S. J. (2003). Surveillance
Detection In High Bandwidth Environments. ProceeJmgs o / /Ae D ^P P ^
T^/orw^//o^ SMrv/v^^////y Co^/ere^ce %^J Expos///o^, 130-138.
159
Roesch, M. (1999). Snort Lightweight Intrusion Detection for Networks.
ProceeJ/Mgs o / ?Ae TJ?A ^SENVZ coM/ereMce oM Sys?ew ^Jw/M/s?r^?/oM, 229
238.
Satten, C. (2008). Loss/ess G/g%M ^ewo?e P^c^e? C^p?Mre ^/?A L/MMx [Online].
Available: http://staf^'.washinaton.edu/corey/gulp/.
Schaffrath, G. and Stiller, B. (2008). Conceptual Integration of Flow-based and
Packet-based Network Intrusion Detection. ^es///eM? NeWorAs* %MJ Serv/ces.
Springer.ProceeJ/Mgs o / ?Ae2MJ TM?erM ?/oM / CoM/ereMce oM ^M?oMowoMs
VM/r s?rMc?Mre, M^M^geweM? ^MJ SecMr/?y, 190-194.
Shannon, C. E. (2001). A Mathematical Theory Of Communication. ^C M
STGMOFTLE M oMe Co^M?/Mg ^MJ CowwMM/c^?/oMs ^ev/ew, 5, 3-55.
Shiravi, A., Shiravi, H., Tavallaee, M. and Ghorbani, A. A. (2012). Toward
Developing a Systematic Approach to Generate Benchmark Datasets for
Intrusion Detection. CowpM?ers %MJ SecMr/?y, 31, 357-374.
Sinclair, C., Pierce, L. and Matzner, S. (1999). An Application of Machine Learning
to Network Intrusion Detection. ProceeJ/Mgs o / ?AeTJ?ATEEE ^MMM%/
CoM/ereMce oM CowpM?er SecMr/?y pp//c^?/oMs, 371-377.
Snort Incorporation. (2013). SNO^T ^sers M%MM%/ 2.P.J [Online]. Available:
http://manual.snort.org/.
Softflowd. (2013). So/?/7owJ [Online]. Available:
http://www.mindrot.org/projects/softflowd/.
Sommer, R. and Paxson, V. (2010). Outside the Closed World: On Using Machine
Learning for Network Intrusion Detection. ProceeJ/Mgs o / ?AeTEEE
Sywpos/Mw oM SecMr/?y ^MJ Pr/v^cy, 305-316.
sourceforge.net. (2014). NFD^M P [Online]. Available:
http://nfdump.sourceforge.net/.
sourceforge.net. (2015). N/SeM - Ne?/7ow SeMsor [Online]. Available:
http://nfsen.sourceforge.net/.
Sperotto, A., Sadre, R., van Vliet, F. and Pras, A. (2009). A Labeled Dataset for
Flow-based Intrusion Detection. ProceeJ/Mgs o / ?Ae 9?A TEEE TM?erM ?/oM /
^ForAsAop oM TP Oper%?/oMs ^MJ M^M^geweM?,39-50.
Sperotto, A., Schaffrath, G., Sadre, R., Morariu, C., Pras, A. and Stiller, B. (2010).
An Overview of IP Flow-based Intrusion Detection. TEEE CowwMM/c^?/oMs
-Surveys %MJ TM?or/^/s, 12(3), 343-356.
160
Staniford, S., Hoagland, J. A. and McAlerney, J. M. (2002). Practical Automated
Detection of Stealthy Portscans. JoMm%/ / CowpM/er SecMr//y, 10, 105-136.
Stoecklin, M. P., Le Boudec, J.-Y. and Kind, A. (2008). A Two-layered Anomaly
Detection Technique Based on Multi-modal Flow Behavior Models.
ProceeJrngs / /Ae P/A T^/er^^//o^^/Co^/ere^ce o^ P^ss/ve %^J ^c//ve
Ne/worA Me^sMrewe^/, 212-221.
Stolfo, S. J., Fan, W., Lee, W., Prodromidis, A. and Chan, P. K. (2000). Cost-based
Modeling for Fraud and Intrusion Detection: Results from the JAM Project.
ProceeJrngs / /AeD^PP^ T^/orw^//o^ SMrv/v^^////y Co^/ere^ce ^^J
Expos///o^, 130-144.
Stone, R. CenterTrack: (2000). An IP Overlay Network for Tracking DoS Floods.
ProceeJrngs o/^SENTZ SecMr//y Sy^oszMw, 21, 114.
Strayer, W. T., Lapsely, D., Walsh, R. and Livadas, C. (2008). Botnet Detection
Based on Network Behaviorin Series: Advances in Information Security,
Springer, 1-24.
Tavallaee, M., Bagheri, E., Lu, W. and Ghorbani, A.-A. (2009). A Detailed Analysis
of the KDD CUP 99 Dataset. ProceeJmgs / /Ae Seco^J TEEE Sywpos/Mw
o^ CowpM/%//o^%/ T^/e/Z/ge^ce /o r SecMr//y ^^ J De/e^ce pp//c^//o^s, 53-58.
Tcpdump/Libpcap. (2015). ECPD^MP & ETFPC^P [Online]. Available:
http://www.tcpdump.org/.
The Shmoo Group, the DefCon. (2015). C^p/Mre /Ae C^p/Mre /Ae E/%g D^/^
[Online]. Available: http://cctfshmoo.com/.
University of California, (1999). XDD CMp 7PPP D%/% [Online]. Available:
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
Wagner, A. and Plattner, B. (2005). Entropy Based Worm and Anomaly Detection in
fast IP Networks. ProceeJmgs o / /Ae74/ATEEE TM/em%//oM%/ ^orAsAops o^
E^^^//^g TecA^o/og/es/ T/%s/rMc/Mre/or Co//^^or^//ve E^/erpr/se, 172-177.
Wang, H., Zhang, D. and Shin, K. G. (2002a). Detecting SYN Flooding Attacks.
ProceeJrngs o / /Ae27s/TEEE ^MMM%/ Jo/^/ Co^/ere^ce o^ C o^M /er %^J
CowwM^/c^//o^s Soc/e//es, 1530-1539.
Wang, H., Zhang, D. and Shin, K. G. (2002b). SYN-dog: Sniffing SYNFlooding
Sources. ProceeJ/^gso/ /Ae 22"JTEEE TM/em%//oM%/ Co^/ere^ce o^
D/s/r/^M/eJ Co^M //^g Sys/ew, 421-428.
161
Wang, Y. and Global, I. (2009). SY%Yz'yYz'c%f TecA^Mey /o r NeYworA SecMrzYy. MoJer^
SY Yz'yYz'c ffy- ^yeJ V YrMyz'o DeYecYo ^^ J ProYecYz'o , Hershey: IGI Global.
Wireshark.org. (2015). ^z'reyA%rA [Online]. Available: https://www.wireshark.org/.
Witten, I. H., Frank, E., Trigg, L. E., Hall, M. A., Holmes, G. and Cunningham, S. J.
(1999). Weka: Practical Machine Learning Tools and Techniques with Java
Implementations.ProceeJmgyo/ YAeV Yer ^Yz'o f ^orAyAop o^ Ewergz'^g
X^owfeJge E ^gm eerm g^J Co^^ecYz'o^^z'yY-^^yeJ V /orw^Yz'o SyyYews, 99,
192-196.
Yau, D. K., Lui, J., Liang, F. and Yam, Y. (2005). Defending Against Distributed
Denial of Service Attacks with Max-Min Fair Server-centric Router
Throttles. VEEE/^CM Tr^^y^cYz'o^y o^ NeYworAz^g, 13, 29-42.
Zhang, Z., Li, J., Manikopoulos, C., Jorgenson, J. and Ucles, J. (2001). HIDE: a
Hierarchical Network Intrusion Detection System using Statistical
Preprocessing and Neural Network Classification. ProceeJmgyo/ YAe VEEE
^orAyAop o^ V /orw^Yz'o ^yyMm^ce %^J SecMrzYy, 85-90.
Zhao, Q., Xu, J. and Kumar, A. (2006). Detection of Super Sources and Destinations
in High-speed Networks: Algorithms, Analysis and Evaluation. VEEE JoMr^^f
o^ SefecYeJ re%y m CowwM z'c%Yz'o y, 24, 1840-1852.