Novel Revelation Frameworks for Prediction of Botnet
Attacks in Internet of Things (IoT)
A Thesis submitted To Gujarat Technological University
for the Award of
Doctor of Philosophy
in
Computer/IT Engineering
by
Priyang Prakashchandra Bhatt
Enrollment No. 159997107020
under the supervision of
Dr. Bhaskar Thakker
GUJARAT TECHNOLOGICAL UNIVERSITY (GTU)
AHMEDABAD
AUG 2021
© Priyang Prakashchandra Bhatt
Ph.D. THESIS Non-Exclusive License to
GUJARAT TECHNOLOGICAL UNIVERSITY
In consideration of being a Ph.D. Research Scholar at GTU and in the interest of the
facilitation of research at GTU and elsewhere, I, Priyang Prakashchandra Bhatt, having
Enrollment No. 159997107020, hereby grant a non-exclusive, royalty-free and perpetual
license to GTU on the following terms:
a) GTU is permitted to archive, reproduce and distribute my thesis, in whole or in part,
and/or my abstract, in whole or in part (referred to collectively as the “Work”) anywhere
in the world, for non-commercial purposes, in all forms of media;
b) GTU is permitted to authorize, sub-lease, sub-contract or procure any of the acts
mentioned in paragraph (a);
c) GTU is authorized to submit the Work at any National /International Library, under the
authority of their “Thesis Non-Exclusive License”;
d) The Universal Copyright Notice (©) shall appear on all copies made under the authority
of this license;
e) I undertake that my thesis is my original work, that it does not infringe any rights of
others, including privacy rights, and that I have the right to make the grant conferred by
this non-exclusive license.
g) If third party copyrighted material was included in my thesis for which, under the terms
of the Copyright Act, written permission from the copyright owners is required, I have
obtained such permission from the copyright owners to do the acts mentioned in
paragraph (a) above for the full term of copyright protection.
h) I retain copyright ownership and moral rights in my thesis and may deal with the
copyright in my thesis, in any way consistent with rights granted by me to my University
in this non-exclusive license.
i) I further promise to inform any person to whom I may hereafter assign or license my
copyright in my thesis of the rights granted by me to my University in this non-exclusive
license.
j) I am aware of and agree to accept the conditions and regulations of Ph.D. including all
policy matters related to authorship and plagiarism.
Abstract
In the Internet of Things (IoT) environment, any object with sensor nodes and other electronic
devices can take part in communication over wireless networks. Hence, this environment is highly
vulnerable to the Botnet attack. A Botnet attack degrades system performance in a manner
that is difficult for IoT network users to identify. It is extremely challenging to detect and
remove within a limited time. Detecting Botnet attacks is challenging for several reasons:
their structurally repetitive nature, their non-uniform and varied activities, and their
invisibility, achieved by deleting history records. Even though existing mechanisms act
proactively against the Botnet attack, they are less efficient in capturing Botnet attackers'
frequent abnormal activities. As the number of devices in the IoT environment increases, the
existing mechanisms miss more Botnets due to their functional complexity. This type of
attack is therefore very complex and challenging to identify.
To detect Botnet attacks, the first approach proposes a heterogeneous ensemble stacking
PROSIMA classifier. This approach uses cluster sampling in place of the conventional random
sampling method for higher prediction accuracy. We tested the proposed classifier on an
experimental test setup with 20 real IoT nodes. The proposed approach enables mass removal
of Botnet attacks with higher accuracy in reduced time, which helps the IoT environment
maintain the reliability of the entire network.
To detect Botnets with high accuracy in less time than the first approach, the second approach
proposes a Bootstrap Aggregating Surflex-PSIM Classifier. It gathers data from several sensor
nodes and preprocesses it using a Linear Random Euler Complex-valued Filter (LRECF).
The linearized data is then passed to the training phase, which uses a Random Poisson
Forest (RPF) to accurately predict, in less time, the Botnets that create Distributed Denial of
Service (DDoS) and Spam attacks. Similar Botnets are clustered using Surflex-PSIM, which
isolates the Botnet-attacked clusters based on a characteristic pocket value learned
automatically during training. Thus, with the aid of our proposed classifier, Botnets are
detected and separated with high accuracy in reduced time, ensuring system reliability with
enhanced system performance.
Over the Internet, billions of IoT devices are interconnected and communicate through
messaging bots. Attackers sometimes take control of the messaging bots to carry out
malicious activities. Thus, bots become a severe cybersecurity hazard for IoT devices. For
this reason, it is crucial to detect the existence of malicious bots and other anomalies in the
network. To tackle these bots and anomalies, our third approach proposes a novel forecastive
anomaly-based Botnet revelation framework for competing concerns in the Internet of Things
(IoT). The technique works as a two-phase process: the first phase is instance creation, and
the second is cataloging. As an alternative to machine learning algorithms, our work uses
ensemble-based stream mining to generate several instances with less memory and time.
Once the instances are created, Graph Structure-Based Detection of Anomaly (GSBDA) is
initiated, based on features derived by the stream mining algorithm, to detect hazardous
anomalies. The second phase also uses the K-Nearest Neighbor (KNN) algorithm, a type of
instance-based learning algorithm, to identify the Botnet accurately by observing the
network flows.
Dedicated
To
My Family
ACKNOWLEDGEMENT
I deem it a great pleasure to record my gratitude to my research supervisor Dr. Bhaskar
Thakker, Ex-Professor, Symbiosis Institute of Technology (SIT), Pune. He has been a constant
source of motivation for me, and his inspiring guidance has played a crucial role in shaping
this thesis. I am privileged to have worked under his supervision.
Completing this work would not have been possible without the Doctoral Progress Committee
(DPC) members: Dr. Nikhil Kothari, Head & Professor, DDU, Nadiad, and Dr. Apurva Shah,
Associate professor, MSU, Baroda. I am thankful for their rigorous examinations and precious
suggestions during my research.
I wish to express my sincere gratitude to the late Shri Dr. C. L. Patel, Chairman, Charutar Vidya
Mandal (CVM), who permitted me to pursue my studies alongside my job.
I am grateful to Dr. Himanshu Soni (Principal), Dr. Maulika S. Patel (Head, CP Dept.), and
my colleagues (teaching, non-teaching) at G H Patel College of Engineering & Technology for
their help, continuous motivation, and support during my research work.
A big thank you to my mother for her unconditional love and for extending her support since my
childhood, and to my father, who has always remained my best friend and my role model. I am
fortunate to have you in my life. I am grateful to my wife for encouraging me to take up this
endeavor. It would not have been possible for me to accomplish this without their support and
constant motivation. I cannot express in words the endless affection given by my kids, which
was necessary to fuel my journey. I also thank my sister, brother-in-law, niece, and nephews
for their constant love, motivation, and support.
I am thankful to one and all who were involved directly or indirectly in this long journey of
mine. In the end, I thank Almighty God for giving me direction and enthusiasm to move
through the tough times.
Priyang P. Bhatt
TABLE OF CONTENTS
S.N. Content
i Title Page
ii Copyright
iii Declaration
iv Certificate
v Course-work Completion Certificate
vi Originality Report Certificate
vii Non-Exclusive License Certificate
viii Thesis Approval Certificate
ix Abstract
x Acknowledgment
xi Table of Contents
xii List of Abbreviations
xiii List of Figures
xiv List of Tables
1 Chapter 1 Introduction
1.1 IoT Applications
1.1.1 Smart Home
1.1.2 Smart Cities
1.1.3 Smart Environment
1.1.4 Agriculture
1.1.5 Industry
1.1.6 Health and Lifestyle
1.2 Security Threats: IoT Applications
1.2.1 Security Issue: Sensing Layer
1.2.2 Security Issue: Network Layer
1.2.3 Security Issue: Middleware Layer
1.2.4 Security Issue: Application Layer
1.3 Botnet Attack
1.3.1 Botnet Detection Techniques
2 Chapter 2 Literature Review
3 Chapter 3 Mass Removal of Botnet Attacks Using Heterogeneous Ensemble Stacking PROSIMA Classifier in IoT: First approach
3.1 Heterogeneous Ensemble stacking PROSIMA classifier
3.2 Data Collection
3.3 Pre-processing
3.4 Feature Selection
3.5 Proposed Model
3.5.1 Popular ways to combine different classifiers
3.5.2 Popular ways to combine different classifiers
3.5.3 Overall Architecture of Heterogeneous Ensemble Stacking Meta-classifier
3.5.3.1 XGBoost (Extreme Gradient Boosting) Algorithm
3.5.3.2 Adaboost (Adaptive Boosting) Algorithm
3.5.3.3 Random cluster sampling forest Algorithm
3.6 Mass clustering based on PROSIMA protein similarity
3.7 Experimental setup
3.8 Results and Discussion
3.8.1 Calculation of packet arrival time
3.8.2 Packet Delivery Ratio (PDR)
3.8.3 Packet Loss
3.8.4 Throughput
3.8.5 Clustering the Botnet of DDoS and SPAM types
3.8.6 Comparing proposed classifier with existing classifiers
4 Chapter 4 Isolating Botnet Attack Using Bootstrap Aggregation Surflex-PSIM Classifier in IoT: Second approach
4.1 Seclusion of Botnet attacks using PSIM based on random Poisson forest model
4.2 Data gathering phase
4.3 Removing complex-valued variables
4.4 Bootstrap Aggregating Surflex-PSIM Classifier
4.4.1 Random training model based on Poisson distribution
4.5 Pseudocode for random Poisson forest
4.5.1 Mass clustering based on P-SIM clustering
4.5.2 Pseudocode for P-SIM clustering
4.6 Result and Discussion
4.6.1 Implementation
4.6.2 Packet Delivery Ratio (PDR)
4.6.3 Packet Loss
4.6.4 Throughput
4.6.5 Clustering of Botnet of DDoS Attack
4.6.6 Clustering of Botnet of SPAM attack
4.6.7 Comparison of proposed system with existing techniques
5 Chapter 5 A Novel Forecastive Anomaly Based Botnet Revelation Framework for Competing Concerns in Internet of Things (IoT): Third approach
5.1 Forecastive Anomaly-based Botnet Revelation framework
5.2 Ensemble-based stream mining
5.3 Ensemble Classification approach
5.4 Cataloging
5.5 Result & Discussion
5.5.1 Performance Analysis
5.5.2 Performance Comparison
6 Chapter 6 Conclusion, Future Scope, and References
6.1 Conclusion
6.2 Future Scope
6.3 References
List of Publications
Appendix
List of Abbreviations
AdaBoost Adaptive Boosting
AGD Algorithmically Generated Domains
ARM Advanced RISC Machine
BLSTM-RNN Bidirectional Long Short Term Memory based Recurrent Neural Network
C&C Command and Control
CDN Content Delivery Network
CNN Convolutional Neural Network
CPU Central Processing Unit
CSV Comma Separated Values
D2D Device-to-Device
DBC Distributed Block Chain
DDoS Distributed Denial of Service
DHCP Dynamic Host Configuration Protocol
DNS Domain Name System
ELF Executable and Linkable Format
FN False Negative
FP False Positive
FPR False Positive Rate
GA Genetic Algorithm
GRE Generic Routing Encapsulation
GSBDA Graph Structure-Based Detection of Anomaly
HTTP Hyper Text Transfer Protocol
ICMP Internet Control Message Protocol
IGMP Internet Group Management Protocol
IoT Internet of Things
IoTDS Internet of Things Detection System
IRC Internet Relay Chat
ISP Internet Service Provider
IT Information Technology
KNN K-Nearest Neighbor
LAN Local Area Network
LDA Linear Discriminant Analysis
LOF Local Outlier Factor
LRECF Linear Random Euler Complex-Valued Filter
LSTM Long Short Term Memory
MIPS Million Instructions Per Second
ML Machine Learning
MSE Mean Squared Error
NIDS Network Intrusion Detection Systems
NTP Network Time Protocol
OS Operating System
P2P Peer-to-Peer
PC Personal Computer
PDR Packet Delivery Ratio
PHP Personal Home Page
PPC Performance Computing
PROSIMA Protein Similarity
PSIM Protein Similarity
RBF Radial Basis Function
ROC Receiver Operating Characteristics
SMO Social Media Optimization
SMS Short Message Services
SNMP Simple Network Management Protocol
SOM Self Organizing Map
SSH Secure Shell
SSL Secure Sockets Layer
SVM Support Vector Machine
TCP Transmission Control Protocol
TCP/IP Transmission Control Protocol/Internet Protocol
TN True Negative
TP True Positive
TPR True Positive Rate
UDP User Datagram Protocol
VANET Vehicular Ad-Hoc Network
WSN Wireless Sensor Network
XGBoost Extreme Gradient Boosting
List of Figures
Figures Description
FIGURE 1.1 IoT Layers
FIGURE 1.2 Different types of attacks on IoT System
FIGURE 1.3 Botnet Structure
FIGURE 1.4 Typical Botnet Attack
FIGURE 3.1 Heterogeneous Ensemble stacking PROSIMA Classifier
FIGURE 3.2 Overall Structure of the IoT network with Botnet Attack
FIGURE 3.3 Training process with random clustering forest Algorithm
FIGURE 3.4 Process Flow of Heterogeneous Ensemble stacking meta-classifier
FIGURE 3.5 Process flow of the proposed system
FIGURE 3.6 Flow Diagram for Clustering of Botnet using PROSIMA
FIGURE 3.7 The IoT Experimental setup for detecting Botnet Attacks
FIGURE 3.8 The IoT based network
FIGURE 3.9 PDR During normal and attack time
FIGURE 3.10 The Network packet loss over normal and attack duration
FIGURE 3.11 The Network throughput over normal and attack duration
FIGURE 3.12 Clustering of Botnet attack which leads to DDoS and SPAM attacks
FIGURE 3.13 Comparison graph for Precision
FIGURE 3.14 Comparison graph for Recall
FIGURE 3.15 Comparison graph for F-Measure
FIGURE 3.16 Comparison graph for Accuracy
FIGURE 4.1 Overall Architecture of the IoT network with Botnet Attack
FIGURE 4.2 Complex valued linear filtering
FIGURE 4.3 Process flow of the proposed system
FIGURE 4.4 Random Classifier
FIGURE 4.5 Training process with random Poisson forest
FIGURE 4.6 Flow diagram for clustering of Botnet
FIGURE 4.7 The IoT based Network
FIGURE 4.8 Packet Delivery Ratio during normal and attack period
FIGURE 4.9 Packet loss during normal and attack period
FIGURE 4.10 Throughput of the network under normal and attack period
FIGURE 4.11 Clustering of Botnet attack which leads to DDoS attack
FIGURE 4.12 Clustering of Botnet attack which leads to Spam attack
FIGURE 4.13 Comparison graph for Precision
FIGURE 4.14 Comparison graph for Recall
FIGURE 4.15 Comparison graph for F-measure
FIGURE 4.16 Comparison graph for Accuracy
FIGURE 5.1 Proposed Framework
FIGURE 5.2 Ensemble-based stream mining-concept drift in the unbounded data stream
FIGURE 5.3 Ensemble-based classifier design
FIGURE 5.4 a) Typical fair GSBDA example, b) Best substructure and c) Anomalous Substructure
FIGURE 5.5 (a) Node Initialization (b) Node Creation
FIGURE 5.6 Nodes that are detected as Botnet
FIGURE 5.7 Performance metric of the proposed framework in terms of arrival time, packet delivery ratio, and throughput
FIGURE 5.8 Comparison of proposed with Supervised and unsupervised learning methods.
List of Tables
Table No. Description
TABLE 1.1 Comparisons of IT network and IoT Network
TABLE 3.1 Log details of each node in the network
TABLE 3.2 Lists of nodes clustered under DDoS and SPAM Botnet Attack
TABLE 3.3 (a) List of classifiers with the proposed system
TABLE 3.3 (b) List of classifiers with the proposed system
TABLE 4.1 Log details of each smart object in the network
TABLE 4.2 List of nodes clustered under DDoS Botnet attack
TABLE 4.3 List of nodes clustered under Botnet spam attack
TABLE 4.4 List of classifiers with proposed system
TABLE 5.1 CTU-13 Scenarios
TABLE 5.2 CTU-13 Dataset Distributions
TABLE 5.3 Performance metric of the proposed framework
TABLE 5.4 Proposed performance in terms of throughput, packet loss, packet delivery ratio, and arrival time
TABLE 5.5 Total No. of nodes to search for Botnet and other anomalies
TABLE 5.6 Comparison with the prior methodology for Scenario 1
TABLE 5.7 Comparison with the prior methodology for Scenario 2
TABLE 5.8 Comparison with the prior methodology for Scenario 6
TABLE 5.9 Comparison with the prior methodology for Scenario 8
TABLE 5.10 Comparison with the prior methodology for Scenario 9
CHAPTER 1
Introduction
New technologies emerge day by day, accompanied by continuous development in the cyber
world. IoT has brought significant changes to many aspects of our daily lives, such as health
care and traffic monitoring services. Additionally, it enables machine-to-machine information
exchange by connecting multiple devices over the Internet. At the same time, the number of
internal vulnerabilities that cybercriminals often leverage is increasing, even as the number of
active IoT users grows day by day [1].
IoT is a system of interrelated computing devices, mechanical and digital machines, and objects
that are provided with unique identifiers (UIDs) and can transmit data over a network without
requiring human-to-human or human-to-computer interaction. IoT connects various technologies
such as Cloud, Artificial Intelligence, Sensors, and Actuators. The term Internet of Things is
derived from two words: 'Internet' and 'Things'. The Internet is the connection medium through
which global connectivity is realized. Things can be anything, such as chairs, fans, and
televisions. Traditionally, these things were not designed to communicate over the Internet [2].
IoT applications are increasing day by day, and the digital economy will grow with these
applications and concepts. Security and privacy issues arise because a large number of
applications are created without any concern for security. The security and privacy issues in
the IoT network differ from those in traditional IT networks.
It is apparent from the comparison in Table 1.1 that the IoT network is vulnerable to many
security threats due to its constrained resources. The IoT network faces many challenges
compared to traditional IT networks [3]. For these reasons, many security attacks have
targeted the IoT network. Mirai is one such attack, launched against 2.5 million
Internet-connected IoT devices in October 2016, which it used to carry out a Distributed Denial
of Service (DDoS) attack. According to Wikipedia, other such attacks against IoT networks are
Hajime and Reaper [81,116].
TABLE 1.1 Comparisons of IT network and IoT network
Traditional IT network | IoT network
Rich in terms of resources (software and hardware). | Not rich in terms of resources compared to the IT network.
No resource constraints in terms of power and memory. | Resource constraints in terms of power and memory.
Complex algorithms can be executed. | Requires small algorithms.
Uses homogeneous technology in terms of protocols, etc. | Uses heterogeneous protocols and technologies.
Because of resource constraints, IoT networks easily allow hackers to enter the system's gateway
and access network data. Beyond things, IoT sensors are also implanted in the human body to
measure the condition of different organs. It would be dangerous if such sensors were
compromised [4].
Viewed as a stack, the IoT network can be divided into four layers. The first layer, the sensing
layer, contains physical entities such as sensors and actuators that collect data. The collected
data is transmitted through the second layer of the IoT stack, the network layer. The next layer,
called the middleware layer, acts as an interface between the network and application layers.
The final layer, the application layer, contains different types of applications, such as smart
homes and smart transport.
This chapter discusses various possible security threats in IoT applications for these four layers.
The issues associated with the gateways that connect these layers are also discussed in this
chapter.
1.1 IoT Applications
IoT has many applications; some are in the home, city, industry, environment, and agriculture
domains.
1.1.1 Smart Home
Smart homes have many small IoT components, such as smart lighting, smart appliances, intrusion
detection, and smoke/gas detectors. The primary purpose of smart lighting is to save energy by
switching on/off automatically according to current conditions. A wired or wireless-enabled
light can be controlled remotely using mobile or web applications. Other such applications are
smart appliances in modern homes, such as TVs, refrigerators, and washing machines. Operating
these appliances conventionally requires separate remote controllers. IoT makes it easy to
operate them through a single mobile or web interface, sends current status information and
notifications for such devices to the user in real time, and fetches software updates
automatically from the Internet. Another application is intrusion detection in smart homes,
which alerts the user by SMS or e-mail with photos or video clips attached. Similarly, smart
smoke/gas detectors alert the user by SMS or e-mail about conditions such as fire or harmful
gas [5].
1.1.2 Smart Cities
The most widely used IoT applications in smart cities are smart parking, smart lighting,
intelligent roads, structural health monitoring, and surveillance. While searching randomly for
an empty parking slot, drivers contribute to additional congestion. IoT smart parking helps
drivers find parking slots easily and quickly using mobile or web applications. Sensors placed
in the parking zone send messages to the Internet via a controller. Smart lighting helps save
power: it adjusts light intensity automatically and is connected with other lights to share
information about current light conditions. Intelligent roads are equipped with different
sensors that send alerts to the driver about current driving conditions. Structural health
monitoring systems predict the health of buildings and bridges and help prevent accidents and
sudden breakdowns. The surveillance system detects and monitors different events in the city
for safety and security purposes [6].
1.1.3 Smart Environment
A smart environment has many subsections, such as weather monitoring, air pollution
monitoring, noise pollution monitoring, forest fire detection, and river flood detection.
Intelligent weather monitoring systems fetch data from connected sensors (temperature,
humidity, pressure, etc.) and send it to the cloud for monitoring and analysis, so that better
decisions can be made and conveyed to subscribers. IoT-based air pollution systems monitor
harmful gases generated by factories and industries to support decisions on air pollution
control. Due to the large population in cities, the noise pollution rate is increasing day by day
in proportion to the air pollution rate. Controlling noise pollution, which increases stress and
sleep disturbance, is as essential as controlling air pollution. The IoT-based noise pollution
control system monitors the noise level in different parts of the city and uploads it to the
cloud, where it is used to identify noise pollution sources in the city. Forest fires cause
significant damage to the environment as well as to human life. IoT-based forest fire detection
predicts fires and forwards the trigger information to the concerned handling system. River
floods cause damage similar to that of forest fires. Early monitoring can minimize the damage
by placing different sensors that monitor the water level and send data to the server for
further processing and decision-making [7].
1.1.4 Agriculture
Smart agriculture helps in crop monitoring and in saving water during irrigation. Intelligent
irrigation systems monitor soil moisture using sensors and regulate water flow according to a
given threshold. A watering schedule can also be created from the moisture readings collected
for different plants and trees. In addition, a greenhouse monitoring system can monitor the
climatic conditions for plant growth [8].
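The threshold rule described above can be sketched as follows. This is a minimal illustration and not the implementation of any cited system: the function names, the percentage scale for soil moisture, and the 30% default threshold are all assumptions made for the example.

```python
def irrigation_command(moisture_pct, threshold_pct=30.0):
    """Return True (open the valve) when soil moisture is below the threshold.

    Both values are hypothetical percentages; a real sensor may report
    raw ADC counts that must be calibrated first.
    """
    return moisture_pct < threshold_pct


def watering_schedule(readings, thresholds, default_threshold=30.0):
    """Map each plant to a valve decision based on its latest moisture reading.

    readings:   {plant_name: latest moisture percentage}
    thresholds: {plant_name: per-plant threshold}; plants missing from
                this mapping fall back to default_threshold.
    """
    return {
        plant: irrigation_command(moisture, thresholds.get(plant, default_threshold))
        for plant, moisture in readings.items()
    }
```

For instance, `watering_schedule({"tomato": 25.0, "cactus": 12.0}, {"tomato": 40.0, "cactus": 10.0})` opens the valve for the tomato bed only, because the cactus reading is still above its own, lower threshold.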
1.1.5 Industry
IoT can help in machine diagnosis and prognosis, that is, monitoring the current operating
condition and comparing it with the normal state to measure the machine's performance. A
machine contains a large number of components. Sensors placed inside the machine can monitor
its working condition periodically and generate a trigger for its maintenance. IoT can also help
in monitoring indoor air quality, since poor indoor air can cause health issues. Placing
different air monitoring sensors inside the industrial site can monitor and help reduce air
pollution [9].
1.1.6 Health and Lifestyle
Nowadays, many wearable health monitoring devices are available for fitness monitoring. These
wearable devices form a WSN (Wireless Sensor Network) called a body area network [10].
Figure 1.1 IoT Layers [1]
1.2 Security Threats: IoT Applications
Each layer of the IoT stack (sensing layer, network layer, middleware layer, and application
layer) is exposed to many security threats.
1.2.1 Security Issue: Sensing Layer
The sensing layer represents sensors and actuators, the physical part of the IoT network. It
faces several possible security threats. In a node capturing attack, attackers can replace
normal nodes with malicious nodes by capturing the data flow between IoT devices. In a
malicious code injection attack, the attacker exploits the automatic update at the gateway to
inject malicious code into it and disrupt the gateway node's functionality. In a false data
injection attack, an attacker who has captured the gateway or injected malicious code into it
injects faulty data into the node, which produces unpredictable results and disturbs the
decision-making system. Other attacks possible on the sensing layer are sleep deprivation
attacks and booting attacks. In the former, the attacker drains the battery power and creates a
denial of service. In the latter, attackers try to disturb the booting process so that they can
exploit vulnerabilities while the IoT devices restart [11-12].
Figure 1.2 Different types of attacks on IoT System [1]
1.2.2 Security Issue: Network Layer
A typical attack on the network layer of the IoT stack is a phishing attack, in which an attacker
tries to compromise the username and password for the webpage of some IoT application so that
the attacker can perform other malicious activities. In an access attack, the attacker stays in
the network for a long time to steal information from the IoT network; this kind of attack is
challenging to handle. The most common and dangerous attack is the DDoS/DoS attack, in which
the attacker generates a large number of unwanted requests that deny authenticated devices
access to the network. In a routing attack, packets are routed through malicious nodes, which
harms the entire system [13].
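To make the request-flooding behaviour of a DoS/DDoS attack concrete, the sketch below counts requests per source inside a sliding time window and flags a source that exceeds a limit. It is only an illustrative baseline under stated assumptions, not one of the detection frameworks proposed in this thesis: the class name `RateMonitor`, the window length, and the request limit are hypothetical choices.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flag sources that send more than max_requests within window_s seconds.

    Hypothetical helper for illustration; the default values are assumptions.
    """

    def __init__(self, window_s=10.0, max_requests=100):
        self.window_s = window_s
        self.max_requests = max_requests
        self._seen = defaultdict(deque)  # source -> timestamps inside the window

    def observe(self, source, timestamp):
        """Record one request and return True if the source exceeds the limit."""
        recent = self._seen[source]
        recent.append(timestamp)
        # Discard timestamps that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_s:
            recent.popleft()
        return len(recent) > self.max_requests
```

A per-source counter like this only catches floods coming from a single address. In a distributed attack each bot may stay below the limit, which is one reason flow-level and learning-based detection is studied instead.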
1.2.3 Security Issue: Middleware Layer
The middleware layer is the interface between the application and the network layer. It is where
the broker, data, and machine learning algorithms reside. One attack possible on the middleware
is a man-in-the-middle attack, in which the attacker tries to compromise the broker to take
complete control of the IoT network. In an SQL injection attack, the attacker injects malicious
queries into the SQL database to obtain user data. Flooding attacks on the cloud are similar to
DDoS attacks: many queries flood the cloud to increase the cloud server's load [14].
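The SQL injection attack mentioned above can be illustrated with Python's standard `sqlite3` module. The sketch contrasts a query built by string concatenation with a parameterized query, which is a widely used mitigation. The table layout, the function names, and the sample rows are assumptions made for the example and do not come from the thesis.

```python
import sqlite3


def demo_db():
    """Create a small in-memory database with hypothetical user readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, reading TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "72 bpm"), ("bob", "65 bpm")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: attacker-controlled text is spliced into the SQL string,
    # so quotes inside `name` can change the meaning of the query.
    query = "SELECT name, reading FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver passes `name` as data, never as SQL text.
    return conn.execute(
        "SELECT name, reading FROM users WHERE name = ?", (name,)
    ).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version returns nothing because no user has that literal name.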
1.2.4 Security Issue: Application Layer
The application layer serves the users directly. Different issues may arise here depending on
the kind of application. One security issue at the application layer is data theft. IoT users
generate enormous amounts of data during communication, and they also register their private
data with the application, which gives attackers an opportunity to steal confidential
information; encryption and authentication can prevent this type of attack. In an access control
attack, the attacker targets the mechanism that allows users to access data; if access control
is compromised, the entire IoT application is compromised. In a malicious code injection attack,
the attacker tries to inject malicious code into the current script, as in a cross-site scripting
attack, and hijack the user's entire IoT account. In a sniffing attack, the attacker sniffs
packets and tries to read user data if no proper protocol is in place to prevent this. In a
reprogram attack, the attacker tries to reprogram the IoT node/device remotely to gain
control [15].
1.3 Botnet Attack
As discussed above, the IoT network faces many attacks; one of the latest is the Botnet attack.
With the ongoing rapid development of the Internet of Things (IoT), there has been growing
interest in understanding emerging cyber threats in IoT [16]. IoT nodes are limited in resources
and use dedicated and diversified communication protocols. Some of these differences weaken the
ability of IoT nodes to protect themselves [17]. Since any object equipped with a sensor node
and other microelectronic devices can take part in communication over a wireless network in an
IoT environment, this environment is highly vulnerable to the Botnet attack.
A Botnet is an extensive collection of compromised nodes controlled remotely by the bot-master.
In a Botnet attack, a group of smart objects comes together and carries out operations leading
to the destruction of the IoT-based system. The term bot originates from the word robot; a bot
typically works as a PC program or script written by the bot-master [18]. Figure 1.3 shows the
working of the Botnet. A Botnet can generate a substantial volume of attacks of many types, such
as DDoS and phishing. To create a Botnet, the bot-master tries to infect as many devices as
possible, which significantly increases the impact of the attack. The bot-master handles this
entire network using a C&C server. Compromised nodes follow the server's commands and attack
the target. A Botnet infects not just one device but up to millions of network nodes. The main
idea of the Botnet attack is to compromise nodes or IoT devices for the bot-master's own
purposes. Botnets use the following steps to infect the target nodes.
1. Bots interact through legitimate channels of communication.
2. They can use different communication techniques like IRC, Telnet, etc.
3. An infected node is called a Bot; it communicates with the C & C server, which can now control
the infected node.
4. The C & C server can then communicate with the infected host and give it instructions to carry
out its task.
The process of the Botnet attack is shown in Figure 1.3.
As shown in the figure below, the bot-master initializes communication using different
communication channels such as IRC and Telnet. The bot-master registers the compromised node and
can inject malicious code into it; the compromised node is then called a Bot. By repeating this
step, the bot-master creates an extensive network of Bots (compromised nodes) using the C & C
server. The bot-master can send commands to a compromised host (Bot) to perform a task and can keep
this connection open for a long time.
FIGURE 1.3 Botnet Structure
[Source: https://blog.emsisoft.com/en/27233/what-is-a-botnet/]
Researchers use different methods to detect and prevent Botnet attacks. The following section
discusses such methods.
1.3.1 Botnet Detection Techniques
New technologies are developed day by day, and the cyber world develops continually with them. IoT
has brought significant changes to daily life in numerous areas, such as health care and traffic
monitoring services. Moreover, it enables machine-to-machine communication by connecting multiple
devices over the Internet. At the same time, there is a rise in intrinsic vulnerabilities that are
often leveraged by cybercriminals, even as the number of active IoT users increases day by day.
With the continued rapid advancement of the Internet of Things (IoT), there has been increasing
interest in understanding emerging cyber threats in the IoT domain. IoT devices are highly
vulnerable and attractive to attackers because of their heterogeneous components, naive security
configurations, and weak encryption checks [19].
FIGURE 1.4 Typical Botnet attack
The word Bot originates from robot; a bot runs automatically as a computer program or script
written by the bot-master. Botnets continue to be a significant source of large-scale attacks on the
Internet, with recent increases in attack traffic [20].
Botnet detection and removal are essential to resolve the indicated issues. They are typically
carried out by intrusion detection systems and honeynets, which suffer from limitations in Botnet
detection [21]. A recent trend is network-based Botnet detection, which uses Machine Learning
Algorithms (MLAs) to identify malicious traffic [22]. These machine learning-based strategies
assume that Botnets create discernible patterns in network traffic [23]. This class of detection
approaches promises automated detection that can generalize knowledge about malicious network
traffic from the available observations, thereby avoiding the pitfalls of signature-based detection
approaches, which can identify only known traffic anomalies [24].
Different detection techniques use different strategies based on diverse traffic analysis
principles and target different Botnet characteristics [25]. The primary assumption of machine
learning-based methods is that a Botnet creates discernible patterns in network traffic [26].
Detection based on MLA traffic analysis promises flexible detection that does not require the
traffic to exhibit any anomalous characteristics [27]. This class of detection methods does not
require prior knowledge of Botnet traffic patterns but instead infers it solely from the available
observations. Detection methods based on machine learning algorithms such as Random Forest, Naive
Bayes, SMO, and MLP are used to classify the data, but they fail to identify the type of Botnet
attack [28].
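As a minimal sketch of the flow-classification idea described above, the following Python code
trains a Gaussian Naive Bayes classifier on labeled flow feature vectors. The two features (packets
per second and mean packet size), the sample values, and all function names are illustrative
assumptions; they are not taken from the cited works.

```python
import math
from collections import defaultdict

def train_gaussian_nb(samples, labels):
    """Estimate the prior and the per-feature mean/variance of each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(samples), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            score += (-0.5 * math.log(2 * math.pi * var)
                      - (value - mean) ** 2 / (2 * var))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical flow features: [packets per second, mean packet size in bytes]
flows = [[5, 400], [7, 380], [6, 420], [900, 60], [1100, 64], [950, 58]]
tags = ["benign", "benign", "benign", "botnet", "botnet", "botnet"]
nb_model = train_gaussian_nb(flows, tags)
```

Calling predict(nb_model, [1000, 62]) assigns the flow to the class whose feature distribution
explains it best. A binary classifier of this kind separates benign from malicious flows but, as
noted above, does not by itself identify the type of Botnet attack.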
Because of the resource constraints of IoT nodes, heterogeneous protocols are used, and it is not
easy for the nodes to protect themselves [29]. In an IoT system, data from the nodes is sent to the
cloud, which processes the data and then sends it to users [30]. A Botnet allows the attacker to
access a device connected to the IoT and gain access to its connection. This kind of attack raises
security concerns because a third party gains control of the IoT device for malicious activities.
Such systems are therefore becoming desirable targets for attackers [31].
Recently, the most potent attacks were performed by Botnets consisting mainly of insecure IoT
devices. The Mirai Botnet is considered the most massive Botnet in history, containing many
compromised IoT devices [32]. Command and control (C & C) servers have evolved to provide Botnet
management platforms. C & C servers are specialized computers controlled by attackers to send
commands, spread malicious code and files, and steal information from the victim network [33]. The
C & C servers that host the Botnet herder's victims are designed to quickly deploy a wide array of
network and application attacks, provide implementation scripts to Botnet victims, and quickly
scale the attacks. The servers are capable of Peer-to-Peer (P2P) communication and collaboration. A
Botnet can be controlled by a single Botnet herder or by multiple herders [34].
The fundamental assumption of machine learning-based strategies is that a Botnet creates
discernible patterns in network traffic and that these patterns can be identified efficiently using
machine learning algorithms [35]. This class of detection approaches promises automated detection
that can generalize knowledge about malicious network traffic from the available observations,
thereby avoiding the pitfalls of signature-based detection approaches, which can identify only
known traffic anomalies [36]. For Botnet attack detection, machine learning algorithms such as
Random Forest, Naive Bayes, SMO, and MLP are used for classification [37].
A Support Vector Machine (SVM) searches for the separating boundary with the largest margin,
keeping samples of the same class together and as far as possible from the other classes [38]. The
fuzzy means clustering algorithm is also used for classifying data. Compared with SVM, Random
Forest, and Naive Bayes with normal word vectors, an LDA-based classifier performs better. The
downstream of machine learning examination is an expansion for the learning approach yet considered
[39]. A growing number of IoT devices connected to the Internet creates security problems and makes
the environment vulnerable to the Botnet attack [40]. The anomaly-based approach is well suited for
detecting compromised IoT devices because these connected appliances are typically task-oriented.
Accordingly, they execute fewer and potentially less complex network protocols and exhibit traffic
with less variance than computers. However, the prediction accuracy is very low [41].
One of the significant security concerns in IoT is the Botnet, a pervasive and hazardous threat.
Several thousand to millions of compromised computers (bots) in a network are used by malicious
attackers to perform various illicit and harmful activities [42]. All such bots are linked to a
central communication system from which they receive the attacker's commands to execute malicious
actions on a targeted system [43]. This central communication system offers a large distributed
platform for various malicious activities, including distributed denial-of-service (DDoS),
spamming, phishing, and spying. It creates severe threats and risks to industries, government
organizations, academic institutions, etc. [44, 45].
The adversary can employ the affected network to carry out malicious activities, including phishing
attacks, data theft, and Distributed Denial of Service (DDoS). Two main detection approaches are
deployed to deal with Botnet vulnerabilities, namely, host-based and network-based [46,47,49]. The
host-based method exhibits low reliability in Botnet detection owing to the constrained computation
and power of the host. A hierarchical classification of network-based Botnet detection approaches
in the IoT domain is proposed in [49]. Honeypots are also employed to detect the Botnet by
analyzing, understanding, and characterizing the behavior of bots, followed by tracking. Moreover,
to detect the presence of bots, honeypots require signature extraction, data inspection, etc.
As per [50], usual networks constitute an optional detection source. In contrast, Network Intrusion
Detection Systems (NIDS) monitor traffic data continuously and without human intervention, using
pattern matching to detect signs of undesirable activities. Such patterns may rely on signatures
identified by honeypots, DNS traffic with a potential C & C server, data mining of traffic
anomalies, and hybrid approaches. The anomaly-based method is discussed in [101] for detecting
compromised IoT devices because these connected appliances are typically task-oriented.
Accordingly, they execute fewer and less intricate network protocols and exhibit traffic with less
variance than PCs. However, the prediction accuracy is very low.
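The low-variance argument above can be illustrated with a minimal sketch: a baseline is learned from
benign traffic of one device, and a later observation is flagged when it lies too far from that
baseline. The feature (packets per minute), the sample values, the threshold k = 3, and the function
names are assumptions chosen for illustration and do not reproduce the method of [101].

```python
import statistics

def fit_baseline(benign_values):
    """Learn the mean and standard deviation of one traffic feature from benign data."""
    return statistics.mean(benign_values), statistics.pstdev(benign_values)

def is_anomalous(value, baseline, k=3.0):
    """Flag a value that lies more than k standard deviations from the benign mean."""
    mean, std = baseline
    if std == 0:
        # A perfectly constant baseline: any deviation at all is suspicious.
        return value != mean
    return abs(value - mean) / std > k

# Hypothetical packets-per-minute counts observed from a task-oriented IoT device
benign_rates = [58, 60, 61, 59, 62, 60, 61, 59]
baseline = fit_baseline(benign_rates)
```

Because the benign traffic of a task-oriented device varies little, the learned standard deviation
is small and a sudden burst, such as 400 packets per minute, stands out clearly. For a general-purpose
PC with highly variable traffic, the same rule would need a much wider band.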
IoT is a distinct kind of network with a large number of applications, in which traffic volume and
privacy are constant concerns. The degradation of a single system can cause the entire structure to
fail. Hackers likewise intrude on the network using Botnets and degrade the system.
With high-dimensional data, it is difficult to detect the Botnet accurately, which leads to
inaccurate detection. In addition, delay in the detection of the Botnet degrades the system.
It therefore becomes essential to detect the Botnet accurately and to build a framework to prevent
it. In the IoT, working with high-dimensional data causes a delay in Botnet detection, and this
delay slows down the performance of the entire network.
Chapter 2: Surveys the available literature on IoT Botnet attacks and DDoS.
Chapter 3: Presents an analysis of Mass Removal of Botnet Attacks Using Heterogeneous Ensemble
Stacking PROSIMA Classifier in IoT.
Chapter 4: Provides a detailed description of Isolating Botnet Attacks Using Bootstrap Aggregating
Surflex-PSIM Classifier in IoT.
Chapter 5: Explains a Novel Forecastive Anomaly Based Botnet Revelation Framework for Competing
Concerns in the Internet of Things.
Chapter 6: Provides the conclusion and future scope of this thesis.
CHAPTER 2
Literature Review
In this chapter, existing methods for Botnet attack detection are discussed so as to cover all
essential works proposed in this area. Fifty such contributions were identified from reputed
journals and conferences, particularly from the last three years (2017-2019).
Meidan, Yair et al. [51] proposed autoencoders for anomaly detection from network traffic. Botnet
attacks launched from compromised IoT devices were detected with high accuracy and a low false
alarm rate. An autoencoder was built for each IoT device in the network and trained on that
device's benign network traffic. For some devices, it might be more difficult to capture normal
behavior, and therefore future observations may be subject to more classification errors.
McDermott, D. Christopher et al. [52] proposed a deep learning model based on a Bidirectional Long
Short-Term Memory Recurrent Neural Network (BLSTM-RNN) to detect Botnets for IoT devices. Text
recognition was applied to the attack vectors, which were converted into a tokenized integer
format. In this way, a low False Positive Rate (FPR) was achieved in Botnet attack detection. By
helping consumers become aware when their device is infected, the authors hope to raise awareness
of the inherent vulnerabilities and to aid consumers in making better choices in the future with
regard to the procurement and operation of such devices.
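To make the idea of a tokenized integer format concrete, the sketch below maps the whitespace-
separated fields of a textual packet record to integer indices and pads each record to a fixed
length, which is the usual input shape for a recurrent network. The record layout, the padding
length, and the function names are assumptions for illustration; they are not the preprocessing
used in [52].

```python
def build_vocabulary(records):
    """Map every distinct token to a positive integer; 0 is reserved for padding."""
    vocab = {}
    for record in records:
        for token in record.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab

def encode(record, vocab, length):
    """Convert a text record into a fixed-length sequence of integer token ids."""
    unknown = len(vocab) + 1  # shared id for tokens not seen when building the vocabulary
    ids = [vocab.get(tok, unknown) for tok in record.lower().split()]
    return (ids + [0] * length)[:length]  # pad with zeros, then truncate

# Hypothetical captured records: protocol, source address, destination port, info
captures = ["TCP 10.0.0.5 23 SYN", "UDP 10.0.0.5 53 QUERY"]
vocab = build_vocabulary(captures)
```

For example, encode("TCP 10.0.0.9 23 SYN", vocab, 6) yields [1, 8, 3, 4, 0, 0]: known tokens keep
their ids, the unseen address falls back to the shared unknown id 8, and the tail is zero-padded.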
Yair Meidan et al. [53] applied machine learning algorithms to network traffic data for the
accurate identification of IoT devices connected to a network. To train and assess the classifier,
they collected and labeled network traffic data from nine different IoT devices, computers, and
smartphones. Using supervised learning, they introduced a multi-stage meta classifier. In the first
stage, the classifier distinguishes between traffic generated by IoT and non-IoT devices. In the
second stage, each IoT device is linked to a specific IoT device class. In future research, the
authors plan to explore applications and adapt their technology to additional scenarios, including
different network protocols and various data capturing points, to better understand how the
approach scales and generalizes.
Homayoun, Sajad, et al. [54] proposed BoTShark, a deep learning-based Botnet traffic analyzer that
uses a Convolutional Neural Network (CNN) with a softmax layer at the end to identify malicious
traffic. In this way, attacks from compromised IoT devices were detected. The study also showed
that autoencoders perform better than CNNs since they generate fewer false positives. Applying
other deep learning techniques such as Long Short-Term Memory (LSTM) is left as future work.
N. Duff, et al. [55] presented a framework for classifying unsolicited IoT devices in enterprises
using machine learning (ML). IP header information is extracted from darknet data for analysis, and
multiple supervised ML algorithms are then considered for classifying these Layer 3 headers. The
results show that Random Forest and gradient boosting achieve high recall and precision, while
Naive Bayes performs worst.
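Since the works surveyed in this chapter are compared by precision, recall, false positive rate,
and F-score, the following sketch shows how these measures follow from the confusion counts of a
binary detector. The label names and the example predictions are made up for illustration.

```python
def confusion_counts(actual, predicted, positive="botnet"):
    """Count true/false positives and negatives with respect to one positive class."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def detection_metrics(actual, predicted, positive="botnet"):
    """Return precision, recall, false positive rate, and F1 score."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1}

# Hypothetical ground truth and detector output for ten flows
actual = ["botnet"] * 4 + ["benign"] * 6
predicted = ["botnet", "botnet", "botnet", "benign",
             "botnet", "benign", "benign", "benign", "benign", "benign"]
metrics = detection_metrics(actual, predicted)
```

In this example three of the four Botnet flows are detected and one benign flow is misclassified,
so precision and recall are both 0.75 and the false positive rate is 1/6.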
Lakshya Mathur et al. [56] focused on the unresolved issue of providing robust malware detection
for secure home routers. This work contrasts the efficacy of three approaches to behavioral malware
detection on home endpoint routers by analyzing kernel-level system calls on these routers: (i) a
one-class support vector machine, (ii) principal component analysis, and (iii) a naive anomaly
detector based on unseen n-grams. However, one drawback of the naive anomaly detector is that it is
not easy to choose an appropriate detection threshold.
Francisco Villegas Alejandre et al. [57] defined a feature selection process for efficient
detection of Botnet attacks in the network. The main aim of this paper was to help researchers
select efficient features from the dataset to improve the accuracy of Botnet attack detection.
Machine learning classifiers were trained on the collected data to evaluate the tests. Analysis of
network flow information is used as the method of identification because it does not rely on packet
content, thus providing immunity to the latest encryption and obfuscation used by attackers to hide
their bots.
Anchit Bijalwan et al. [58] used the ISCX dataset for training and testing purposes. They extracted
features from the training and testing datasets, divided them into two categories, regular traffic
and Botnet traffic, and labeled them accordingly. Using a modern data mining method, they then
applied a package of classifier algorithms. Experimental results show that the quality of finding
bot evidence using a collection of classifiers is better than that of a single classifier.
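The benefit of combining classifiers can be sketched with simple majority voting. In the code
below, three hand-written rules stand in for trained classifiers; the rules, thresholds, and field
names are hypothetical and are not the classifiers used in [58].

```python
from collections import Counter

def majority_vote(classifiers, sample):
    """Return the label predicted by most classifiers for the given sample."""
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three illustrative stand-ins for trained classifiers (assumed thresholds)
def rate_rule(flow):
    return "botnet" if flow["pps"] > 500 else "benign"

def size_rule(flow):
    return "botnet" if flow["mean_size"] < 80 else "benign"

def port_rule(flow):
    return "botnet" if flow["dst_port"] in (23, 2323) else "benign"

ensemble = [rate_rule, size_rule, port_rule]

# A slow Telnet scan: the rate rule alone misses it, the ensemble does not.
slow_scan = {"pps": 300, "mean_size": 64, "dst_port": 23}
# A busy but legitimate HTTPS transfer: the rate rule alone raises a false alarm.
bulk_https = {"pps": 900, "mean_size": 500, "dst_port": 443}
```

With an odd number of binary classifiers a tie cannot occur, and a single classifier's mistake is
outvoted as long as the other two are correct on that sample.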
Nareli Cruz Cortes et al. [59] presented a novel method for selecting features for the detection of
Botnets at their C & C server. The major problem is that researchers have suggested features based
on their experience, but there is no mechanism for testing such features, and some of these
features could yield a lower detection rate than others. The reported results show a significant
reduction in the number of features and a higher detection rate than the related work.
Sean Miller et al. [60] gave a brief overview of the various machine learning methods and their use
in Botnet detection. The main aim of this paper is to clearly define the role of different ML
methods in the detection of Botnets. They also discuss different subsets of flow features and the
resulting effect on detection accuracy depending on the machine learning approach used. However,
multi-perspective machine learning is not covered.
Hammer Schmidt et al. [61] presented an automatic online method for detecting change points in
network traffic based on an analysis of IP flow records. This approach is used to break the
observed behavior into smaller consecutive segments that differ from one another. The segmented
traffic is used to learn a specific communication profile that accurately characterizes the
behavior between two observed change points. However, there is a need to introduce a time-decay
function that forgets parts of the prefix tree if changes to these parts no longer occur over time,
and thus prioritizes recent behaviors over past ones in an online-learning manner, as well as to
work towards determining the change points automatically.
Kirubavathi Venkatesh and Anitha Nadarajan [62] detected the SpyEye and Zeus Botnets with the aid
of an adaptive learning rate multilayer feed-forward neural network. In this work, various
classifiers such as Decision Tree, Random Forest, and radial basis function are discussed and
compared with the adaptive learning rate neural network.
Kamal deep Singh et al. [63] build on the success of open-source frameworks such as Hadoop, Hive,
and Mahout to provide a scalable implementation of a quasi-real-time intrusion detection method.
They built a Random Forest-based decision tree model to solve the problem of Botnet detection in a
peer-to-peer network. Though the technique served well to detect Botnets, it failed to detect
Botnets with low-frequency communication when a certain threshold was exceeded.
Yair Meidan et al. [64] proposed a novel network-based anomaly detection method that extracts
behavior snapshots of the network and uses deep autoencoders to detect anomalous network traffic
originating from compromised IoT devices. It relies on one deep autoencoder per device, trained on
statistical features extracted from benign traffic data. When the autoencoder is applied to new
(possibly infected) data of an IoT device, detected anomalies may indicate that the device has been
compromised.
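The principle of training on benign traffic only and flagging inputs that reconstruct poorly can
be sketched without a deep network. In the code below, a one-dimensional linear projection onto the
top principal direction of the benign data stands in for the deep autoencoder of [64]; this is a
deliberate simplification, and the feature values, the maximum-error threshold rule, and the
function names are assumptions for illustration.

```python
import math

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def top_direction(centered, iterations=50):
    """Power iteration: find the direction of largest variance in the centered data."""
    w = [1.0] * len(centered[0])
    for _ in range(iterations):
        new_w = [0.0] * len(w)
        for row in centered:
            proj = sum(r * v for r, v in zip(row, w))
            for i, r in enumerate(row):
                new_w[i] += proj * r
        norm = math.sqrt(sum(v * v for v in new_w))
        w = [v / norm for v in new_w]
    return w

def reconstruction_error(x, mean, w):
    centered = [a - m for a, m in zip(x, mean)]
    code = sum(c * v for c, v in zip(centered, w))  # "encode" to a single number
    recon = [code * v for v in w]                   # "decode" back to feature space
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(centered, recon)))

def fit(benign_rows):
    """Learn the projection and set the threshold to the largest benign error."""
    mean = mean_vector(benign_rows)
    centered = [[a - m for a, m in zip(row, mean)] for row in benign_rows]
    w = top_direction(centered)
    threshold = max(reconstruction_error(row, mean, w) for row in benign_rows)
    return mean, w, threshold

def is_anomalous(x, model):
    mean, w, threshold = model
    return reconstruction_error(x, mean, w) > threshold

# Hypothetical benign snapshots of one device: [packets per second, bytes per second]
benign = [[10, 100], [12, 121], [11, 109], [13, 131], [9, 90]]
model = fit(benign)
```

Benign snapshots follow the learned relationship between the two features and reconstruct almost
perfectly, whereas a snapshot such as [200, 150], with many small packets, breaks that relationship
and produces a large error. A deep autoencoder applies the same decision rule with a nonlinear
encoder and decoder.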
McDermott et al. [65] presented a novel application of deep learning to develop a detection model
based on a Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN). Word
embedding is used for text recognition and to convert attack packets into a tokenized integer
format. The developed BLSTM-RNN detection model is compared with an LSTM-RNN for identifying the
four attack vectors used by the Mirai Botnet and is evaluated for accuracy and loss.
Yair Meidan et al. [66] applied machine learning algorithms to network traffic data to accurately
identify IoT devices connected to the network. They collected and labeled network traffic data from
nine different IoT devices, computers, and smartphones to train and evaluate the classifier. Using
supervised learning, they trained a multi-stage meta classifier. In the first stage, the classifier
distinguishes between traffic generated by IoT and non-IoT devices. In the second stage, each IoT
device is associated with a specific IoT device class.
Sajad Homayoun et al. [67] proposed a deep learning-based Botnet traffic analyzer called Botnet
traffic shark (BoTShark). BoTShark uses only network transactions and does not rely on deep packet
inspection, thus avoiding inherent limitations such as not being able to deal with encrypted
payloads. This allows the proposed system to detect interactions between core features and to
introduce new features in a cascading manner in each layer of the autoencoder or convolutional
neural network. Additionally, they used a softmax classifier to identify malicious traffic
effectively.
Farouk Shaikh et al. [68] presented a model for classifying unsolicited IoT devices in
organizations through machine learning (ML). IP header information was extracted from darknet data
for analysis, and several supervised ML algorithms were considered for classifying these Layer 3
headers. Finally, the algorithms were compared on their performance in detecting malicious IoT
devices on the Internet. The results showed that Random Forest and gradient boosting achieved
higher recall and precision scores, while Naive Bayes performed worst.
Kamal deep Singh et al. [69] used a large-scale, scalable architecture based on Random Forest
modeling and the open-source software Hadoop, Hive, and Mahout to identify P2P Botnet channels in
networks. Big data technology is used here to process enormous amounts of flow data in an
acceptable time. However, the false-positive rate is not satisfactory.
Yair Meidan et al. [70] deployed N-BaIoT, a network-based anomaly detection technique. It makes use
of an autoencoder to discover anomalous network traffic originating from compromised IoT devices.
Yet, the work failed to satisfy the security policies because the low predictability of the traffic
causes difficulties in attack detection.
Moitrayee Chatterjee et al. [71] use evidence theory-based techniques for malicious Bot detection
with the help of a probabilistic reasoning tool called Dempster-Shafer Theory (DST). The vital
characteristic of DST is that the detection system does not require any prior information about
malicious signatures and profiles. However, it exhibits low accuracy and a chance of a higher error
rate.
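The core operation of Dempster-Shafer Theory, Dempster's rule of combination, fuses the belief
masses reported by two independent evidence sources and renormalizes by the amount of conflict
between them. The sketch below implements this rule; the two evidence sources and their mass values
are invented for illustration and are not taken from [71].

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # evidence that supports disjoint hypotheses
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

BOT = frozenset({"bot"})
BENIGN = frozenset({"benign"})
EITHER = BOT | BENIGN  # mass placed here expresses ignorance

# Hypothetical evidence from two independent observers of the same host
traffic_evidence = {BOT: 0.6, EITHER: 0.4}
dns_evidence = {BOT: 0.7, BENIGN: 0.1, EITHER: 0.2}
fused = combine(traffic_evidence, dns_evidence)
```

Here the conflicting mass is 0.06, so the fused belief in the bot hypothesis is 0.82 / 0.94, which
is higher than either source reports alone. No signature or profile of known malware enters the
computation, which matches the characteristic of DST noted above.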
Bhansinger et al. [72] proposed a scalable model that could be used to locate Botnets in P2P
networks. The proposed system treats network traffic as a data stream, separating the traffic into
two parallel streams. Identification focuses on network failures, communication traffic, and
traffic rate. Traffic is analyzed in a short-term window, and infected hosts are reported
immediately. The system detects peer-to-peer (P2P) Botnets in the network and detects bots using a
failure-based algorithm.
Mohammed S. Gadelrab et al. [73] deployed BotCap, a Botnet detection technique that uses machine
learning concepts. They used ML algorithms with a set of statistical features extracted per trace
to detect individual Botnets. The drawback of this detection system is that it cannot detect and
tackle new generations of Botnets.
Christian Hammer Schmidt et al. [74] proposed a method for collecting a small amount of relevant
data for real-time learning of complex models, a class of finite state machines. Such machines are
used as fingerprints of interaction profiles, which identify or classify hosts and services; they
require less training and achieve higher identification rates faster than conventional models.
Methods that identify Botnets in batch settings have caused memory problems over time. Samuel
Marchal et al. [75] processed network data as a stream to overcome these issues. When evaluating
their approach, they achieved high host identification rates. However, their solution does not
identify malicious Botnet flows and requires an increased number of flows to perform the detection
task.
Sidra Ijaz et al. [76] proposed a genetic algorithm-based solution to detect malware attacks. They
examined the overall performance of the detection system on the attacks contained in the KDD
dataset and in scenario 2 of the CTU-13 dataset. The solution exhibits satisfactory performance,
yet it lacks online capability, since the genetic algorithm requires several rounds of optimization
and batch data.
Weikeng Chen et al. [77] introduced a profiling framework for Botnet detection and evaluated three
unsupervised flow-based machine learning algorithms, Self-Organizing Map (SOM), Local Outlier
Factor (LOF), and k-NN outlier detection, to build a normal behavior profile to be used for Botnet
detection. The authors plan to develop more robust and adaptive functions to calculate the decision
boundary based on the overall distribution of normal behaviors.
S. Garcia et al. [78] compared the results of three different Botnet detection methods using a
new, real, labeled, and large Botnet dataset. This dataset includes Botnet, normal, and background
traffic. The results of the authors' two methods (BClus and CAMNEP) and of BotHunter were compared
using a methodology developed for Botnet detection methods and a novel error metric. Working with
a large and real dataset exposed the difficulty of applying detection methods to unknown background
data at any stage of Botnet behavior, even when there are no significant numbers of separate
Botnets.
Jing Wang et al. [79] make use of a two-phase approach for Botnet detection. The method first
detects and collects network anomalies and then identifies the bots. The detection phase quantifies
and monitors flow-level data as histograms, which are then used to construct graphs of highly
interactive nodes.
Christos Tagarkakis et al. [80] introduced an IoT Botnet attack detection method that relies on a
sparse representation model and uses a reconstruction error thresholding rule to detect malicious
IoT network traffic from a compromised IoT device. Botnet attack detection is based on small
amounts of benign IoT network traffic, and therefore no prior knowledge of malicious IoT traffic
data is needed. There is a need to analyze the proposed approach further using more IoT Botnet
attack datasets and to establish a broader comparison with existing IoT Botnet attack detection
methods.
Stephen Herwig et al. [81] studied Hajime, a new Botnet that targets many of the same devices as
Mirai but differs considerably in design and operation. Hajime uses a public peer-to-peer system as
its command and control infrastructure and regularly introduces new exploits, thus increasing its
resilience. Unfortunately, none of the approaches considered successfully stops Hajime's C&C
without compromising the quality of BitTorrent's DHT.
Mevlot Turk Garip et al. [82] demonstrated RIoT, the first attack on Internet of Things (IoT)
devices that uses a vehicular Botnet, and showed that vehicular Botnets are a threat to VANETs and
to other complex systems and networks. IoT devices can therefore be threatened if vehicles are not
protected against Botnets.
Mahesh Banerjee et al. [83] analyze network traffic to detect the presence of a Botnet on the
network using network flow and classification techniques, which has a significant impact on traffic
filtering. Local honeynets are deployed for the implementation. In future work, other types of data
captured by honeypots, such as malicious binaries and attack replays, can be considered when
studying and identifying Botnets.
Hui-Trung et al. [84] proposed a method that combines intensive learning and machine learning to
create a new PSI-rooted subgraph feature for cross-architecture IoT Botnet malware detection. This
feature is strong enough for various general machine learning classifiers to achieve approximately
97% accuracy and an F-score of 98%. However, it combines a multi-class approach and a more
straightforward approach to improve performance.
Joao Marcelo Ceron et al. [85] presented an approach to managing network traffic generated by IoT
malware in an analysis environment. The proposed solution can modify the network layer traffic
based on the actions performed by the malware. With this approach, an analyst can quickly implement
separate setup configurations to determine malware characteristics and develop signatures. However,
there is a need to examine the actions of other IoT Botnets and to provide the signatures
associated with them in a public repository.
Lihua Yin et al. [86] proposed ConnSpoiler, a lightweight system that quickly detects IoT-based
Botnets through their use of Algorithmically Generated Domains (AGDs). ConnSpoiler needs only low
system resources and may therefore work well on resource-restricted IoT devices. Furthermore,
ConnSpoiler takes only the effective domain as input, and therefore no additional effort is
required to label malicious samples for a training phase. The authors tested ConnSpoiler on
real-world DNS traffic from two different major ISP networks, showing that it identifies devices
infected by previously unknown Botnets.
Nicholas Coroniotis et al. [87] introduced a new dataset, Bot-IoT, with real and simulated IoT
network traffic and various attacks. They also present a virtual testbed environment to address
the shortcomings of existing datasets in capturing complete network information, accurate
labeling, and recent and complex attack diversity. This work provides a basis for Botnet detection
on IoT-specific networks. However, a network forensic model still needs to be developed using deep
learning and the Bot-IoT dataset to assess its reliability.
Muhammad Junaid Farooq et al. [88] suggested a framework for analyzing the spread of D2D malware
on wireless networks. Using techniques from dynamic population processes and point process theory,
they capture the infiltration and spread of malware through the network topology. As part of
future work, the proposed model should be used as a basis for developing a game-theoretic
framework, which will allow appropriate strategies to be developed for both the attacker and the
defender.
Reem Alhajri et al. [89] focus on machine learning techniques to detect security threats on the
Internet. The work aims to explore the feasibility of using auto-encoders to detect IoT Botnets.
Botnets can launch DDoS attacks and present a significant security concern in IoT networks, and no
single method has shown the potential to address this security threat. However, the work does not
aggregate the desirable features of the auto-encoder or map the safety requirements for the Botnet
detection system.
Georgios Spatholas et al. [90] proposed the use of lightweight agents installed in multiple
Internet of Things (IoT) installations (e.g., smart homes) to detect denial-of-service attacks by
Botnet devices. Although many questions remain open, such as problems with implementing such
systems and the underlying consensus policies, the experiments suggest that it is possible to
detect large-scale DDoS attacks.
Christopher d. et al. [91] analyze the needs of users of IoT devices and the importance users
place on security and privacy. They used an experimental framework to determine users' ability to
identify threats in the light of their technical knowledge and experience. The limitations of this
study are mainly the self-reporting nature of online surveys and the use of a single sample of
malware. The study reflects a broad cross-section of user backgrounds with other types of malware
and IoT devices.
Ruchi Vishwakarma et al. [92] demonstrated a honeypot-based approach that uses machine learning
methods to detect malware. The data generated by the IoT honeypot is used as a dataset for
effective and dynamic training of the learning models. However, this approach still needs to be
implemented, as a next step, in real-time scenarios where real problems or concerns can be
identified. It is able to use cloud servers to manage highly resource-restricted IoT devices.
Mingyang Yin et al. [93] suggested a non-Markovian spread dynamics model that describes Botnet
spreading as a state of hybrid infection. Based on the suspended-received-recovered method, they
implemented a memory-based diffusion approach for global spread as a tuner to change the diffusion
rate. With the role of memory, this method can support different spreading patterns by introducing
a hybrid propagation approach and scope controllers. Still, it simplifies the life conditions of
the bot nodes, and immune symptoms are not taken into account.
Vitor Hugo Bezerra et al. [94] suggested a host-based approach to detect Botnets on IoT
devices, called IoTDS (Internet of Things Detection System). This system relies on one-class
classifiers, which model only legitimate system behavior in order to detect deviations, avoiding
the manual labeling process. Their experiments indicate that the solution is efficient in its use
of CPU, memory, and energy, and no issues with the operation of the device were observed.
Morteza Safai Pour et al. [95] examine macroscopic, passive empirical data to shed light on this
emerging threat phenomenon. By looking at one-way network traffic, the approach attempts to
identify and terminate compromised IoT devices on the Internet and to detect, monitor, and report
coordinated IoT Botnets. However, some shortcomings remain, for example the misidentification
of two different IoT Botnets that share the same features, which improved labeling systems might
resolve.
Yan Naung Soe et al. [96] explain that Botnet attacks are among the most recent attacks on the
IoT environment and that IoT devices need to be protected from them. However, it is challenging
to implement an attack detection system on IoT devices because they have minimal resources.
Although an anomaly-detection architecture can detect unknown attacks, it is not easy to obtain a
system that is effective for all devices because of the different architectures of IoT devices.
Nicholas Coroniotis et al. [97] reviewed the forensic and deep learning mechanisms used to
examine Botnets and their application in IoT environments. In addition to a classification of
network forensic solutions developed for traditional and IoT environments, they provide a new
definition of IoT. However, challenges remain: the lack of development and improvement of
honeypots and network flow analysis, dealing with the high speed and large amount of data
generated by IoT, and ensuring that any solutions developed are forensically sound and produce
results acceptable to a court.
Qaisar Shaf et al. [98] focus on the design of an Internet of Things (IoT) Botnet prevention
system that supports both Software-Defined Networking (SDN) and Distributed Blockchain
(DBC). For IoT communication, the in-band channel is extended by Generic Routing Encapsulation
(GRE) tunnels between network switches running inside each Mininet instance of a single VM.
Thomas Lange and Houssain Kettani [99] summarized Botnet evolution, patterns, and
mitigation. They provided relevant examples and analysis to provide the reader with quick access
to a broad understanding of the issues at hand. It is, therefore, well suited for situations involving
Botnet where the processing of data sets is a problem and concatenation is not advisable.
R. K. Malaiya et al. [100] proposed and empirically evaluated a new network-based anomaly
detection method that captures behavioral snapshots from the network and detects network traffic
from compromised IoT devices. As a result, its general behavior becomes more challenging to
comprehend, and subsequent conclusions may be subject to other classification errors.
Scope of work and Objectives
From the research surveyed in this domain, it has been identified that current Botnet detection
methods are subject to time and memory constraints. Previously proposed approaches deal with
network data in a stream setting and, when evaluated, achieved high host identification rates.
However, these solutions are not suited to identifying individual malicious Botnet flows, and
they require a high number of flows to perform the detection task.
Based on the related research surveyed, Botnet detection approaches can be host-based or
network-based. On constrained devices, a Botnet or anomaly detector cannot simply be deployed
on the device, because host-based detection algorithms reduce the efficiency and consume the
power of IoT devices. The proposed approach can be used as a host-based or network-based Botnet
or anomaly detector and is suitable for organizations with a single non-distributed network.
Based on the related research surveyed, it is concluded that further improvement is required in
the detection of Botnets in IoT-based networks, which need to detect and remove Botnet attacks
with high accuracy. The literature survey shows that prior systems fail because of their poor
traffic predictability and, in addition, incur memory and time complexity.
Hence it is essential to develop novel frameworks for Botnet detection with improved prediction
accuracy.
The main objective of this thesis is to detect and cluster attacking nodes to help in the mass
removal of Botnet attacks. In the proposed work, three frameworks are evaluated on a real
testbed:
1. Mass Removal of Botnet Attacks using Heterogeneous Ensemble Stacking PROSIMA
Classifier in IoT.
2. Isolating Botnet Attacks Using Bootstrap Aggregation Surflex-PSIM Classifier in IoT.
3. A Novel Forecastive Anomaly Based Botnet Revelation Framework for Competing Concerns
in Internet of Things.
CHAPTER 3
Mass Removal of Botnet Attacks Using Heterogeneous
Ensemble Stacking PROSIMA Classifier in IoT: First
Approach
3.1 Heterogeneous Ensemble stacking PROSIMA classifier
In the IoT environment, a Botnet attack is carried out by compromised nodes, and it is
challenging to detect these nodes. In the proposed approach, data is collected from the different
sensor nodes, and unwanted data is removed during the preprocessing stage. The preprocessed
data is used to train the base classifiers of the heterogeneous ensemble stacking classifier. In
phase two of the proposed classifier, a random forest algorithm is used again as the
meta-classifier. In the testing phase, similar Botnets are clustered by the PROSIMA protein
sequence similarity algorithm. Figure 3.1 shows the proposed heterogeneous ensemble stacking
PROSIMA classifier. Figure 3.2 shows the overall structure of the network scenario under
consideration. In this proposed approach, the classifier is deployed at the gateway, but it can
also be used on a node if the node is capable of carrying a pre-trained model of the classifier.
FIGURE 3.1 Heterogeneous Ensemble stacking PROSIMA classifier
FIGURE 3.2 Overall Architecture of the IoT network with a Botnet attack
3.2 Data Collection
In the experimental setup, each IoT node is connected with a sensor node Sn. Data are generated
at every sensor node Sn= (S0, S1,…, Sn). The collection of resources is identified as R= {IoT1,
IoT2, IoT3,…, IoTn}. Since data is collected from the sensor nodes, it includes raw sensor data
along with network traffic. Data preprocessing is required to remove unwanted data and redundant
information.
3.3 Preprocessing
The arriving data packets are captured with Wireshark in the form of a ‘pcap’ file. With the
Tshark command, the ‘pcap’ file is converted to a CSV file. The features required to analyze the
packets for the presence of a Botnet attack are derived from the CSV files. To detect a Botnet,
only network traffic information is required, so unwanted information such as sensor data is
removed and only network traffic flow is retained in the feature set. A total of 21 features,
such as packet arrival time, source address, destination address, transport layer protocol, and
packet length, are collected over 225,745 instances. Of these, 183,910 instances belong to the
no-attack class and 41,835 instances belong to the DDoS and Spam Botnet class. XGBoost, AdaBoost,
and random forest were used because these tree-based algorithms do not require feature value
scaling.
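The pcap-to-CSV conversion step can be sketched as follows. This is a minimal illustration, not the thesis script: the file name `capture.pcap` and the five field names are assumptions chosen to match the example features listed above (the full feature set has 21 entries), and the helper `tshark_csv_command` is hypothetical.

```python
import shlex

# Illustrative subset of Wireshark display fields (assumption, not the thesis's 21 features).
FIELDS = ["frame.time_epoch", "ip.src", "ip.dst", "ip.proto", "frame.len"]

def tshark_csv_command(pcap_path, fields):
    """Build a tshark command line that prints the given fields as CSV rows."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields",
           "-E", "header=y", "-E", "separator=,", "-E", "quote=d"]
    for field in fields:
        cmd += ["-e", field]  # one -e option per exported column
    return cmd

command = tshark_csv_command("capture.pcap", FIELDS)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` with standard output redirected to a CSV file on a machine where Tshark is installed.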
3.4 Feature Selection
Feature selection finds the most relevant features in the available feature set for a classifier
model. These techniques are used to identify and remove unneeded, irrelevant, and redundant
features that do not contribute to, or even decrease, the model’s accuracy. One powerful
technique is the genetic algorithm. After the preprocessing stage, a genetic algorithm is used to
select the relevant features, which helps to visualize the data and further reduces processing
time in the classification stage. The first step is to create and initialize the individuals in
the population. Because the genetic algorithm is a stochastic optimization technique, the genes
of the individuals are usually initialized at random. The second step is to assign a fitness
value to each individual. To evaluate the fitness, the model is trained with the entire training
dataset. Fitness values are assigned by a rank-based method.
The fitness value is assigned to individuals using the rank-based method as follows:
$$\Phi(i) = k \cdot R(i), \quad i = 1, \dots, N \tag{1}$$
Here k is a constant called the selective pressure, whose value is fixed between 1 and 2. In the
proposed work this value is set to 1, as per the Genetic Algorithm literature. Greater selective
pressure values give the fittest individuals more chances of recombination. The parameter R(i)
is the rank of individual i, and N is the number of individuals in the population.
���� = ��������∗����� (2)
Once the fitness assignment is performed, the selection operator chooses the individuals that
will recombine for the following generation, according to their fitness level. Next, the
crossover operator of the GA determines how bits are swapped among the selected individuals.
After the fitness values are obtained, feature selection is performed on our dataset using
Mod-Dejong. Mod-Dejong gives 4 features, which are used to train the proposed algorithm.
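The rank-based fitness assignment can be illustrated with a short sketch. It assumes that each individual (a candidate feature subset) has already been scored by a validation error, that the worst individual receives rank 1 and the best receives rank N, and that selection uses roulette-wheel probabilities proportional to fitness. The ranking direction, the roulette wheel, and the function names are assumptions made for illustration; the source only states that fitness is k times the rank.

```python
def rank_based_fitness(errors, k=1.0):
    """Return fitness k * R(i), where R(i) is the rank of individual i.

    Assumption: the individual with the largest error gets rank 1 and the
    individual with the smallest error gets rank N.
    """
    n = len(errors)
    worst_first = sorted(range(n), key=lambda i: errors[i], reverse=True)
    fitness = [0.0] * n
    for rank, index in enumerate(worst_first, start=1):
        fitness[index] = k * rank
    return fitness

def selection_probabilities(fitness):
    """Roulette-wheel selection probabilities (an assumed selection scheme)."""
    total = sum(fitness)
    return [f / total for f in fitness]

# Hypothetical validation errors of four candidate feature subsets.
errors = [0.30, 0.10, 0.20, 0.40]
fitness = rank_based_fitness(errors, k=1.0)
probabilities = selection_probabilities(fitness)
print(fitness, probabilities)
```

With these hypothetical errors, the subset with error 0.10 receives the highest fitness and therefore the largest chance of being selected for crossover.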
3.5 Proposed Model
3.5.1 Popular ways to combine different classifiers
Several classifiers can identify the presence of a Botnet attack using different methodologies.
Bagging, boosting, and voting are popular ways of combining different classifiers, each trained
on a random subset of the data; this is called ensemble learning [6]. One example of bagging is
the random forest. Boosting is similar to bagging, but in boosting the errors of the previous
learner are taken into consideration. One example of boosting is AdaBoost. Bagging can be
preferable to boosting, because boosting can lead to overfitting, where the model works well on
the training data set but fails to detect the attack on unseen data. There are two main
techniques for combining the models: voting and stacking. In voting, the class is predicted as
the majority vote of the different classifiers. The stacking classifier is discussed in the next
section.
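As a small illustration of the voting scheme described above, the following sketch returns the label predicted by the majority of classifiers. The label strings and the function name are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most classifiers for one sample."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three classifiers for a single traffic flow.
votes = ["attack", "normal", "attack"]
decision = majority_vote(votes)
print(decision)
```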
3.5.2 Stacking classifier
The main advantage of stacking is that the outputs of the base-level classifiers are used to
train the meta-classifier. The goal of this second level is to learn from the learning behavior
of the base level. For example, if a base classifier consistently misclassifies instances from a
particular region of the feature space, the meta-classifier may be able to identify this
weakness. It reduces learning errors by exploiting the learned behaviors of the other
classifiers.
Stacking is the process of combining different classifiers CL1, CL2, ..., CLn on a single
dataset. It is a two-step process. In the first step, a set of base classifiers BC1, BC2,…, BCn
is used. In the second step, a meta-classifier is used, which performs predictions on a newly
constructed dataset.
3.5.3 Overall Architecture of Heterogeneous Ensemble Stacking Meta-classifier
In the proposed heterogeneous ensemble stacking meta-classifier, XGBoost, AdaBoost, and
random forest are used as the heterogeneous base classifiers. A random forest classifier is used
again as the meta-level classifier. During the testing phase, similar Botnets are clustered
using the PROSIMA algorithm.
3.5.3.1 XGBoost (Extreme Gradient Boosting) Algorithm
XGBoost is an algorithm that has recently dominated applied machine learning and Kaggle
competitions on structured or tabular data. XGBoost is an implementation of gradient-boosted
decision trees designed for speed and performance. The strength of this algorithm lies in its
scalability, which drives fast learning through parallel and distributed computing and offers
efficient memory usage.
3.5.3.2 Adaboost (Adaptive Boosting) Algorithm
AdaBoost is a popular algorithm for boosting the performance of decision trees on binary
classification problems. It is also referred to as discrete AdaBoost because it is used for
classification rather than regression. It is best used with weak learners.
Algorithm: Stacking Classifier
Input: training data D = {(x_i, y_i)}, i = 1, ..., m
Output: ensemble classifier E
Step 1: learn base-level classifiers
  for t = 1 to T do
    learn h_t based on D
  end for
Step 2: construct new data set of predictions
  for i = 1 to m do
    D_h = {(x_i', y_i)}, where x_i' = {h_1(x_i), ..., h_T(x_i)}
  end for
Step 3: learn a meta-classifier
  learn E based on D_h
Return E
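The three steps of the stacking algorithm can be sketched in plain Python. This is an illustration of the procedure only: the toy `ThresholdStump` base learners and the `LookupMeta` meta-learner are stand-ins invented for this sketch, not the XGBoost, AdaBoost, and random forest models used in the thesis, and the two features (packet length, packets per second) and their values are hypothetical.

```python
from collections import Counter, defaultdict

class ThresholdStump:
    """Toy base classifier: predicts 1 when one feature exceeds a learned threshold."""
    def __init__(self, feature):
        self.feature = feature
        self.threshold = 0.0

    def fit(self, X, y):
        best_errors = None
        for thr in sorted({row[self.feature] for row in X}):
            errors = sum(int(row[self.feature] > thr) != label
                         for row, label in zip(X, y))
            if best_errors is None or errors < best_errors:
                best_errors, self.threshold = errors, thr
        return self

    def predict(self, row):
        return int(row[self.feature] > self.threshold)

class LookupMeta:
    """Toy meta-classifier: majority label for each tuple of base predictions."""
    def fit(self, Z, y):
        votes = defaultdict(Counter)
        for z, label in zip(Z, y):
            votes[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, z):
        return self.table.get(tuple(z), self.default)

def fit_stacking(X, y, base_learners, meta):
    for h in base_learners:                       # Step 1: learn base-level classifiers
        h.fit(X, y)
    Z = [[h.predict(row) for h in base_learners]  # Step 2: new data set of predictions
         for row in X]
    meta.fit(Z, y)                                # Step 3: learn the meta-classifier
    return lambda row: meta.predict([h.predict(row) for h in base_learners])

# Hypothetical flows: [packet length, packets per second]; label 1 = attack.
X = [[60, 5], [64, 7], [70, 6], [1500, 900], [1400, 850], [1450, 950]]
y = [0, 0, 0, 1, 1, 1]
ensemble = fit_stacking(X, y, [ThresholdStump(0), ThresholdStump(1)], LookupMeta())
print(ensemble([65, 6]), ensemble([1480, 920]))
```

As in the listing, the meta-classifier here is trained on base predictions for the same training data. Library implementations such as scikit-learn's `StackingClassifier` instead use cross-validated base predictions for this step, which reduces the risk of overfitting.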
3.5.3.3 Random cluster sampling forest Algorithm
Random forest builds multiple decision trees and merges them to obtain a more accurate and
stable prediction. Due to its performance and accuracy, the random forest is used both as a base
classifier and as the meta-classifier. The conventional random forest takes less time to train
but more time for predictions, because a large number of trees slows down the algorithm. So
cluster sampling is adopted in place of random sampling in the meta-classifier stage to speed up
the prediction process. Figure 3.3 shows the training process of the random clustering forest.
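The difference between cluster sampling and row-wise random sampling can be sketched as follows. The source does not state how clusters are formed, so grouping rows by source address is an assumption made for illustration, as are the function name and the sample rows.

```python
import random
from collections import defaultdict

def cluster_sample(rows, key, n_clusters, seed=0):
    """Draw whole clusters at random and return every row of the chosen clusters."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [row for cluster in chosen for row in groups[cluster]]

# Hypothetical rows: (source address, packet length).
rows = [("10.0.0.1", 60), ("10.0.0.1", 64), ("10.0.0.2", 1500),
        ("10.0.0.3", 70), ("10.0.0.3", 72)]
sample = cluster_sample(rows, key=lambda r: r[0], n_clusters=2)
print(sample)
```

Because whole clusters are drawn, each tree sees complete groups of similar observations rather than isolated rows.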
The prediction for an unseen sample using the random forest is defined as:
$$F' = \frac{1}{T} \sum_{t=1}^{T} F_t(E(s)) \tag{3}$$
F' indicates the prediction for the unseen samples and F_t indicates the time period for
observation. E(s) represents the Poisson distribution of the training data set, which reduces
the training time. The bagging process repeatedly (T times) selects a random sample from the
training dataset.
The primary significance of this (random forest) model is that, instead of searching for the
best feature when splitting a node, it searches for the best feature among a random subset of
features. This generally results in a better model. Figure 3.4 shows the process flow of the
heterogeneous ensemble stacking meta-classifier. Figure 3.5 shows the flow of the proposed
system. Figure 3.6 shows the clustering of Botnet using PROSIMA.
FIGURE 3.3 Training process with random clustering sampling forest Algorithm
Flowchart steps: Begin; for each tree T, choose a training data subset; check the condition at
the node; apply the Poisson distribution; build the next split; calculate the prediction error;
End.
FIGURE 3.4 Process flow of heterogeneous ensemble stacking meta classifier
3.6 Mass clustering based on PROSIMA protein similarity
All similar Botnets having repetitive structures are clustered by the PROSIMA protein
similarity algorithm. The output of the training phase, eq. (3), is clustered in the testing
phase. We use m different terms t_1, t_2, …, t_m for indexing N observations. Each observation
O_i is then represented by a vector:
$$O_i = (O_{i1}, O_{i2}, O_{i3}, \dots, O_{im}) \tag{4}$$
where O_{ij} is the weight of the term t_j in the observation O_i.
The index file of the vector model is represented by the matrix:
$$D = \begin{pmatrix} O_{11} & O_{12} & \cdots & O_{1m} \\ O_{21} & O_{22} & \cdots & O_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ O_{N1} & O_{N2} & \cdots & O_{Nm} \end{pmatrix} \tag{5}$$
where the i-th row corresponds to the i-th observation and the j-th column corresponds to the
j-th term. The similarity of two observations is given by the following formula:
$$\mathrm{Sim}(O_i, O_j) = \frac{\sum_{k=1}^{m} O_{ik} O_{jk}}{\sqrt{\sum_{k=1}^{m} O_{ik}^2}\,\sqrt{\sum_{k=1}^{m} O_{jk}^2}} \tag{6}$$
The clustering then proceeds as follows:
1. Input the generalized suffix tree data structure from the meta-level classifier.
2. Find all maximal substructure clusters within the suffix tree.
3. Build a vector model of all pockets in the collection.
4. Build the pocket similarity matrix.
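The similarity measure used to build the similarity matrix is the cosine similarity between two weight vectors. A minimal sketch follows; the three weight vectors are hypothetical, and returning 0.0 for an all-zero vector is an assumption made to avoid division by zero.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two term-weight vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # assumption: zero vectors have similarity 0

def similarity_matrix(D):
    """Pairwise similarities between the rows (observations) of the index matrix D."""
    return [[cosine_similarity(r, s) for s in D] for r in D]

# Hypothetical index matrix: 3 observations described by 3 term weights each.
D = [[1, 0, 2],
     [2, 0, 4],
     [0, 3, 0]]
S = similarity_matrix(D)
print(S)
```

Observations 0 and 1 point in the same direction and are therefore maximally similar, while observation 2 shares no terms with either of them.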
FIGURE 3.5 Process flow of the proposed system
FIGURE 3.6 Flow diagrams for clustering of Botnet using PROSIMA
Flowchart steps: Begin; list the devices with their activities; compute the similarity matrix;
set each device as a cluster; if the number of clusters = 1, End; otherwise merge two similar
devices, update the similarity matrix, and repeat.
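The merge loop of Figure 3.6 can be sketched as a simple agglomerative procedure over a device similarity matrix. The average-linkage rule and the `min_similarity` stopping threshold are assumptions added for illustration; with the default threshold of 0.0 and non-negative similarities, the loop runs until a single cluster remains, as in the flowchart. The similarity values are hypothetical.

```python
def agglomerate(sim, min_similarity=0.0):
    """Repeatedly merge the two most similar clusters of devices."""
    clusters = [[i] for i in range(len(sim))]  # start: each device is a cluster

    def linkage(a, b):
        # Average pairwise similarity between two clusters (assumed linkage rule).
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                s = linkage(clusters[p], clusters[q])
                if s > best:
                    best, pair = s, (p, q)
        if best < min_similarity:
            break  # no remaining pair is similar enough to merge
        p, q = pair
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

# Hypothetical similarity matrix for four devices.
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.8],
       [0.1, 0.1, 0.8, 1.0]]
groups = agglomerate(sim, min_similarity=0.5)
print(groups)
```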
3.7 Experimental Setup
In the experiments, two kinds of attack are considered: the DDoS attack and the spam attack. A
DDoS attack is a cyber-attack in which the attacker tries to make a machine or network resource
unavailable by temporarily or indefinitely disrupting the services of a host connected to the
Internet.
Email spam consists of unsolicited messages, often sent by arbitrary commercial entities. Spam
can be a real security issue because it can carry Trojan horses, viruses, worms, spyware, and
targeted phishing attacks. In a normal attack, a single attacker tries to disrupt the network.
In a Botnet attack, a number of malicious nodes, called bots, attempt to attack the target
system, as each connected node is compromised.
The proposed method is evaluated with the experimental setup. The traffic is collected from 20
real IoT nodes (implemented with Raspberry Pi 3) connected via the Wi-Fi network to the access
point and by wired connection to the central switch and the router. The network traffic is
sniffed using Tshark and Wireshark, with port mirroring on the switch. The command-and-control
(C&C) server is implemented with a Python script that sends files to and controls the IoT
devices. Three IoT devices are configured as bots to generate DDoS and spam attacks against the
rest of the devices in the network. Twenty-one features are extracted over five time windows of
1.5 ms, 10 ms, 50 ms, 100 ms, and 500 ms, respectively. Using a Python script and Tshark
commands, the packet delivery ratio, packet loss, and throughput are computed from the number of
received and sent packets. Arrival time is computed as described in Section 3.8.1. Figure 3.7
shows the experimental setup for detecting Botnet attacks.
In the proposed work, IoT devices are infected using purpose-built DDoS and Spam attack scripts.
To send the DDoS and Spam attacker scripts to the IoT devices (Raspberry Pi 3), brute-forcing is
carried out on the Telnet port. The required Python scripts are created using Scapy. Under the
influence of the attack, the IoT devices start generating DDoS and Spam attacks against the rest
of the devices in the network. The result of one such experiment is shown in Table 3.2. The
traffic data collected from the experimental setup is further used to evaluate the proposed
classifier’s performance.
FIGURE 3.7 Experimental setup for detecting Botnet attacks
As the number of devices in the IoT ecosystem increases, so does its technical complexity, and
existing systems miss more of the Botnet activity. This type of attack is, therefore, very
complex and challenging to identify. A heterogeneous ensemble stacking PROSIMA classifier is
proposed to identify Botnet attacks; it takes advantage of cluster sampling instead of
traditional random sampling to make predictions more accurate. Thus, this technique makes the
IoT-based network more reliable against Distributed Denial of Service (DDoS) and spam Botnets.
In a Botnet attack, a group of smart objects comes together and executes an action that can
bring down the IoT-based system, so early elimination of the Botnet helps maintain the network’s
security.
3.8 Results and Discussion
The proposed heterogeneous ensemble stacking PROSIMA classifier for isolating Botnet attacks is
implemented in Anaconda’s Spyder environment using Python version 3.6. Python is a powerful
scripting language created by Guido van Rossum in the Netherlands, who began developing it in
1989, and it has gained momentum in the last decade. The main advantage of using Python is that
its open-source libraries are extensive and provide almost all the functions needed for the
task. Python’s machine learning libraries NumPy, Pandas, Matplotlib, and Sklearn are used, and
Python’s Tkinter library is used to create the GUI.
The proposed system for the IoT based network is implemented using the Python programming
language. Figure 3.8 shows the IoT based network implemented using Python.
FIGURE 3.8 The IoT based network
3.8.1 Calculation of Packet arrival time
Since smart objects are involved in the IoT-based network, in the absence of security,
unauthorized users can quickly access the network and IoT node resources and hence distribute
false information that affects the working of the IoT node. In a Botnet attack, smart objects
carry out malicious activities by forming groups among each other, so by keeping track of the
arrival and inter-arrival time of each smart object involved in the IoT based network,
suspicious users can be listed and the monitoring process can be executed on those users.
Let N(t) be a Poisson process with rate \lambda. Let X_1 be the time of the first arrival, then
$$P(X_1 > t) = P(\text{no arrival in } (0, t]) = e^{-\lambda t} \tag{1}$$
So X_1 ~ Exponential(\lambda). Let X_2 be the time interval between the first and second
arrival, and let s > 0 and t > 0. Since the two intervals (0, s] and (s, s + t] are independent,
$$P(X_2 > t \mid X_1 = s) = P(\text{no arrival in } (s, s + t]) = e^{-\lambda t} \tag{2}$$
If N(t) is a Poisson process with rate \lambda, then the inter-arrival times X_1, X_2, … are
independent and X_i ~ Exponential(\lambda), for i = 1, 2, …
If T_n = X_1 + X_2 + … + X_n is the sum of n independent exponential random variables, then
T_n ~ Gamma(n, \lambda). The Probability Density Function of T_n for n = 1, 2, 3, … is
$$f_{T_n}(t) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}, \quad t > 0 \tag{3}$$
If X ~ Exponential(\lambda), then E[X] = 1/\lambda and Var(X) = 1/\lambda^2. Since
T_n = X_1 + X_2 + … + X_n, it is concluded that the arrival time of the Poisson distribution is
calculated by
$$E[T_n] = n E[X_1] = \frac{n}{\lambda} \tag{4}$$
$$\mathrm{Var}(T_n) = n \mathrm{Var}(X_1) = \frac{n}{\lambda^2} \tag{5}$$
In the IoT based network, the arrival time of each smart object is calculated from the expected
arrival time, where E[T_n] indicates the mean arrival time of each user in the IoT based
network.
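The arrival-time calculation can be illustrated numerically. The sketch assumes that packet arrivals at a node follow a Poisson process with rate λ, so that inter-arrival times are exponential with mean 1/λ and the n-th arrival time has mean n/λ and variance n/λ². Estimating λ as the reciprocal of the mean observed gap is a standard estimator used here as an assumption; the timestamps and function names are hypothetical.

```python
def estimate_rate(arrival_times):
    """Estimate the Poisson rate as (number of gaps) / (total time between arrivals)."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return len(gaps) / sum(gaps)

def expected_nth_arrival(n, rate):
    """Mean of the n-th arrival time, n / rate."""
    return n / rate

def variance_nth_arrival(n, rate):
    """Variance of the n-th arrival time, n / rate**2."""
    return n / rate ** 2

# Hypothetical packet timestamps (seconds) observed at one node.
timestamps = [0.0, 0.5, 1.0, 2.0]
rate = estimate_rate(timestamps)
print(rate, expected_nth_arrival(4, rate), variance_nth_arrival(4, rate))
```

A node whose observed arrival pattern departs strongly from the mean expected under its estimated rate could then be listed for closer monitoring.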
3.8.2 Packet Delivery Ratio (PDR)
The Packet Delivery Ratio (PDR) is estimated from the number of bundles (sets of packets; the
size of one packet is 40 bytes) recorded in the trace file. The PDR is defined as the ratio
between the number of bundles received by the target and the number of bundles produced by the
source. Figure 3.9 shows the packet delivery ratio for normal and attack intervals.
FIGURE 3.9 PDR during normal and attack period
3.8.3 Packet Loss
Packet loss occurs when at least one packet in the network fails to reach its target. Packet
loss is calculated as the number of packets lost relative to the packets sent. Figure 3.10 shows
the network's packet loss ratio over normal and attack duration.
FIGURE 3.10 Network packet loss ratio over normal and attack duration
3.8.4 Throughput
Throughput is the amount of information transferred from the source node to the destination
during a given period and is usually measured in bits per second. Figure 3.11 shows the
network's throughput over normal and attack duration.
FIGURE 3.11 Network throughputs over normal and attack duration
3.8.5 Clustering of Botnet of DDoS and Spam types
In a DDoS type Botnet attack, the attacker sends requests for a resource to a specific
destination address over a period of time so that legitimate users cannot access that resource
during that period. The proposed classifier clusters nodes according to the packet sending time,
the destination address, and the requested resource. It also calculates the distance between the
source nodes and the destination nodes to speed up the identification of attacks.
A Spam Botnet sends unwanted messages, which are delivered to the spam box instead of the inbox
of the mail application. Spam is a real security issue because it can carry Trojan horses,
viruses, worms, spyware, and phishing attacks.
In existing methods, hierarchical clustering and K-means clustering were used. The main drawback
of hierarchical clustering is that once two groups are merged, the merge cannot be undone, and
K-means requires the value of k to be chosen before the algorithm is run. The proposed system
uses a composite model based on similarity-based clustering, which achieves a higher clustering
ratio than the existing systems. Mixed-model clustering can handle any number of cluster shapes.
Figure 3.12 shows the clustering of Botnet attacks leading to a DDoS attack and a Spam attack.
TABLE 3.1 Log details of each smart node in the network
Node  IP Address  Arrival time (Sec)  Packet Delivery ratio  Packet Loss (Kbps)  Throughput (Kbps)
n1 151.142.255.1 2.256 88.025 2.2835 56.895
n2 151.142.255.2 1.267 93.211 1.4756 54.742
n3 151.142.255.3 8.278 94.723 1.8629 55.315
n4 151.142.255.4 1.289 94.601 1.8687 56.889
n5 151.142.255.5 1.314 94.783 1.4756 53.895
n6 151.142.255.6 5.311 89.5404 1.8629 56.888
n7 151.142.255.7 4.322 93.031 2.1905 53.895
n8 151.142.255.8 3.333 95.5216 2.1905 51.2
n9 151.142.255.9 2.344 88.0122 1.4756 60.235
n10 151.142.255.10 1.355 90.5028 2.1905 53.895
n11 151.142.255.11 1.366 92.9934 1.8629 51.2
n12 151.142.255.12 9.377 95.484 2.1905 51.2
n13 151.142.255.13 8.388 87.9746 2.2835 51.2
n14 151.142.255.14 7.399 91.4652 1.9597 51.895
n15 151.142.255.15 6.441 92.9558 1.8629 50.96
n16 151.142.255.16 5.421 89.937 1.8629 53.895
n17 151.142.255.17 4.432 90.4276 1.57717 56.889
n18 151.142.255.18 3.443 92.9182 1.9598 56.888
n19 151.142.255.19 2.454 95.4088 1.8629 51.221
n20 151.142.255.20 1.465 89.8994 1.9598 56.38
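The per-node quantities logged in Table 3.1 can be derived from packet counters. The sketch below is a hedged illustration: it expresses the packet delivery ratio and packet loss as percentages of sent packets and throughput in kilobits per second, which may differ from the units used in the table, and the counter values are hypothetical. Only the 40-byte packet size is taken from the text.

```python
def packet_delivery_ratio(received, sent):
    """Percentage of sent packets that reached the target."""
    return 100.0 * received / sent

def packet_loss_ratio(received, sent):
    """Percentage of sent packets that did not reach the target."""
    return 100.0 * (sent - received) / sent

def throughput_kbps(bytes_received, seconds):
    """Received data rate in kilobits per second."""
    return bytes_received * 8 / 1000.0 / seconds

# Hypothetical counters for one node over a 5-second window; 40-byte packets.
sent, received, packet_size, window = 1000, 947, 40, 5.0
pdr = packet_delivery_ratio(received, sent)
loss = packet_loss_ratio(received, sent)
rate = throughput_kbps(received * packet_size, window)
print(pdr, loss, rate)
```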
TABLE 3.2 DDoS and SPAM Botnet attack clustered Nodes
Node  Source IP Address  Packet Sending Time (Sec)  Destination IP Address  Resource
n1 151.142.255.1 0.214 151.142.250.11 file-1
n2 151.142.255.2 0.214 151.142.250.11 file-1
n3 151.142.255.3 0.214 151.142.250.11 file-1
n4 151.142.255.4 0.214 151.142.250.11 file-1
n5 151.142.255.5 0.214 151.142.250.11 file-1
n6 151.142.255.6 0.214 151.142.250.11 file-1
n7 151.142.255.7 0.214 151.142.250.11 file-1
n8 151.142.255.8 0.114 151.142.255.11 mail
n9 151.142.255.8 0.114 151.142.255.11 mail
n10 151.142.255.10 0.114 151.142.255.11 mail
n11 151.142.255.11 0.214 151.142.250.11 file-1
n12 151.142.255.12 0.214 151.142.250.11 file-1
n13 151.142.255.13 0.214 151.142.250.11 file-1
FIGURE 3.12 Clustering of Botnet attack which leads to DDoS and Spam attacks
As shown in Figure 3.12, seven nodes are clustered under DDoS attack (pink colored) and three
nodes are clustered under Spam attack (red colored).
3.8.6 Comparing proposed classifier with existing classifiers
In this section, the proposed classifier is compared with existing classifiers in terms of different
parameters as shown in TABLE 3.3.
TABLE 3.3 (a) List of classifiers with the proposed system
Classifiers Precision Recall F-Measure Accuracy
IoTDS [30] 0.968 0.931 0.949 96.5333
BoTshark [18] 0.968 0.934 0.95 96.667
Proposed 0.971 0.963 0.966 98.63
TABLE 3.3 (b) List of classifiers with the proposed system
Classifiers Precision Recall F-Measure Accuracy
Decision Tree 0.968 0.931 0.949 96.53
Random Forest [29] 0.968 0.934 0.95 96.66
RBF 0.976 0.927 0.95 96.53
Proposed 0.971 0.963 0.966 98.63
Precision
Precision is the fraction of the test instances classified as attack that actually belong to the
attack categories. Figure 3.13 shows a comparison graph of precision for four types of
classifiers.
$$\text{precision} = \frac{TP}{TP + FP}$$
where TP represents the number of true positives and FP the number of false positives.
FIGURE 3.13 Comparison graph for Precision
The proposed classifier achieved a precision value of 0.97. This is higher than the precision
of the decision tree and random forest classifiers (0.968), although RBF reaches 0.976 (Table
3.3 (b)). The improvement arises because the meta-classifier adopts a cluster-based sampling
approach, which first finds similar elements and then performs the split.
Recall
Recall measures the fraction of the attack class that was correctly detected as Botnet.
Figure 3.14 shows a comparison graph for recall.
$$\text{Recall} = \frac{TP}{TP + FN}$$
where TP represents the number of true positives and FN the number of false negatives.
FIGURE 3.14 Comparison graph for Recall
The proposed classifier achieved a recall value of 0.96, which is better than that of existing
classifiers such as decision tree, random forest, and RBF, with recall values of 0.931, 0.934,
and 0.927 (Table 3.3 (b)). The proposed system uses similarity-based clustering, so it separates
the events successfully.
F-Measure
The F-measure assesses test accuracy as the harmonic mean of precision and recall, balancing the
two. Figure 3.15 shows a comparison graph for F-measure.
$$F\text{-measure} = \frac{2 \cdot P \cdot R}{P + R}$$
Where P represents the precision and R denotes the recall value.
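As a minimal sketch of how the three measures defined above follow from confusion-matrix counts, the Python snippet below computes precision, recall and F-measure. The function name `classification_scores` and the example counts are illustrative assumptions and are not values from the experiments.

```python
def classification_scores(tp, fp, fn):
    """Return (precision, recall, f_measure) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 90 attacks detected, 10 false alarms, 30 attacks missed.
p, r, f = classification_scores(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f, 3))
```

The guards against zero denominators are a design choice for the sketch; a classifier that flags nothing simply scores zero.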
FIGURE 3.15 Comparison graph for F-measure
The proposed system uses cluster-based sampling in the training phase. It first clusters similar
events together and only then splits the observations to build the decision tree. As a result, it
achieved a better F-measure than the existing classifiers.
Accuracy
Accuracy is the proportion of predictions that the model got right. Formally, accuracy can be
defined as,
$$\text{Accuracy}(A) = \frac{I_{cB}}{TB}$$
Where $I_{cB}$ indicates the number of correctly identified Botnet attacks, and TB denotes the total
number of Botnet attacks.
The proposed classifier uses strong base classifiers in the first phase and a meta-classifier
with cluster-based sampling in the second phase. Similar Botnets are then clustered by PROSIMA
based on equal pocket values.
As a result, the proposed classifier attained a higher accuracy of 98.63, compared with 96.53,
96.66, and 96.53 for the decision tree, random forest, and RBF classifiers, respectively.
FIGURE 3.16 Comparison graph for Accuracy
Because the proposed classifier combines powerful base classifiers in the first phase with a
meta-classifier using cluster-based sampling in the second phase, and PROSIMA then clusters similar
Botnets based on similar pocket values, it attains an accuracy of 98.63%, compared with 96.53%,
96.66%, and 96.53% for the decision tree, random forest, and RBF classifiers, respectively.
However, the prediction time of this first approach is quite high because many classifiers are
used for classification. To reduce this time and further improve classification accuracy, a second
approach is proposed in Chapter 4.
CHAPTER 4
Isolating Botnet Attack Using Bootstrap Aggregation
Surflex-PSIM Classifier in IoT:
Second Approach
4.1 Seclusion of Botnet attacks using PSIM based on random Poisson forest
model
Botnet attacks are carried out by a group of compromised nodes, which makes them difficult to spot
with conventional methods. The proposed system therefore uses a learning-based classifier to trace
and cluster Botnet attacks. Because the data is gathered from a sensor network, it includes both
linear and nonlinear data, and effective preprocessing techniques are required to remove unwanted
data. The proposed system uses Linear Random Euler Complex-valued Filters (LRECF), which linearize
the dataset by Euler distance valued filtering. The preprocessed, linearized data sets are then used
to train the random Poisson forest algorithm, which applies the general bootstrap aggregation
technique, repeatedly selecting a random sample with replacement from the training set for a given
time. Based on the trained data, similar Botnets are then clustered using Surflex-PSIM, which
isolates the Botnet attacks as clusters based on automatically trained characteristics of attacks.
Even when a large dataset is given as input, the clustering remains accurate. Because Botnets need
not be analyzed and removed individually, Botnet detection becomes both accurate and less
time-consuming.
4.2 Data gathering phase
In this IoT-based approach, data are gathered using sensor nodes S = (S1, S2, …, Sn). The
collected resources are defined as G = {IoT1, IoT2, IoT3, …, IoTn}. Because the data is collected
directly from the sensor nodes, it also includes raw data, so data preprocessing techniques are used
to remove unwanted data. Since sensor nodes deliver real-time information, linear filtering is
adopted to preprocess the data.
FIGURE 4.1 Overall architecture of the IoT network with Botnet attacks
4.3 Removing complex-valued variable
Because the IoT environment is based on context-aware computing and Botnets carry out many
different activities, the data sensed by sensor nodes contain complex-valued variables. To remove
the non-linearized data, the whole dataset is converted to the origin of the time axis. This
conversion introduces a non-linearity, which is rectified by the Linear Random Euler Complex-valued
Filter (LRECF). The filter linearizes the dataset by Euler distance valued filtering and prevents
the Botnet features from being lost.
A complex-valued variable C is defined as
$$C = C_R + iC_I \quad (1)$$
Where $C = S(G)$, $C_R$ and $C_I$ are the real and imaginary parts of C, and $i = \sqrt{-1}$ is the
imaginary unit. The probability density function of a complex-valued random variable is defined by
the joint probability density function of its real and imaginary parts.
$$p(C) = p(C_R, C_I) \quad (2)$$
The expectation of the complex-valued random variable is defined as
$$E(C) = E(C_R) + iE(C_I) \quad (3)$$
A complex-valued random variable is said to be zero mean when the expectations of its real and
imaginary parts are zero
$$E(C_R) = E(C_I) = 0 \quad (4)$$
FIGURE 4.2 Complex valued linear filtering
In the filtering system, a pair of samples $u_s$ and $v_s$ from S, where $S = E(C)$, is given for
training, and the error is denoted by $e_s = v_s - y_s$, where $y_s$ indicates the expected output.
The cost function used for filtering is defined as $E(e_s e_s^*)$. The weight vectors of the
learning system are updated by the complex gradient descent method so as to minimize the mean square
error (MSE). Initially the cost function is 0. When the second variable enters the learning model,
the cost function is calculated from the MSE, that is $E(e_s e_s^*)$, and the weights are updated
using equation (5). Similarly, the cost function is calculated for every instance and the weights
(w) are updated each time. Here l represents the current state of the complex learning system.
$$w(l+1) = w(l) + \eta\,E\big(e_s(l)\,u_s\big) \quad (5)$$
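To make the weight update concrete, the following is a minimal sketch of a single-tap complex least-mean-squares (LMS) filter in Python. It is a textbook stand-in and not the LRECF itself: the conjugate on the input, the step size `eta`, the function name `complex_lms` and the synthetic system `h` are all assumptions made for illustration.

```python
import random


def complex_lms(inputs, targets, eta=0.05):
    """Single-tap complex LMS: y = w * u, e = d - y, w <- w + eta * e * conj(u)."""
    w = 0j
    errors = []
    for u, d in zip(inputs, targets):
        e = d - w * u                      # instantaneous error for this sample
        w = w + eta * e * u.conjugate()    # gradient step that reduces |e|^2
        errors.append(abs(e) ** 2)         # squared error e * conj(e)
    return w, errors


rng = random.Random(0)
h = 0.5 + 0.3j                             # hypothetical system to be identified
u = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
d = [h * x for x in u]                     # noise-free desired outputs
w, errors = complex_lms(u, d)
print(abs(w - h) < 1e-6, errors[0] > errors[-1])
```

In this noise-free setting the weight converges to the hypothetical system value, and the squared error shrinks as more samples are processed.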
The probability density function of a complex-valued random variable is given as
$$P(e) = P(e_R, e_I) \quad (6)$$
The entropy of this complex-valued error data is defined as
$$H(e) = H(e_R, e_I) = -E\{\log p(e_R, e_I)\} \quad (7)$$
From equation (7), the data (r1, …, rn) with the least value of entropy error are chosen and passed
to the training phase.
FIGURE 4.3 Process flow of the proposed system
4.4 Bootstrap Aggregating Surflex-PSIM Classifier
The linear data obtained from the filter are passed to the classifier, which has a training phase
and a test phase. The training phase uses a random Poisson forest model, which builds multiple
decision trees and combines them to obtain an accurate estimate. In a decision tree, each internal
node represents a test on an attribute and each branch represents an outcome of that test. A node
without children is called a leaf node, and each leaf node holds a class label. The main advantage
of this model is that, instead of searching for the best feature over all features when splitting a
node, it searches for the best feature within a random subset of features. This produces a wide
diversity of trees, which generally yields a better model. Traditional random forest algorithms are
fast to train but slow to produce predictions, because a large number of decision trees must be
evaluated. To speed up the entire random forest sampling process, the Poisson distribution function
is utilized. The test phase consists of a PSIM, which automatically separates Botnet attacks into
trained groups based on the surface properties of the pocket value.
FIGURE 4.4 Random Poisson forest classifier
4.4.1 Random training model based on Poisson distribution
Random Poisson forest counts the number of events and the times at which these events occur in a
given time interval, so it achieved a better prediction rate during the training phase. The
training algorithm applies the general technique of bootstrap aggregating. Initially, the
linearized data set R = r1, …, rn, which is obtained from eq (8), with responses S = s1, s2, …, sn,
is subjected to a bagging process which repeatedly selects a random sample with replacement from
the training set for a given time set by the Poisson distribution. The linear regression for the
trained data set is defined as
$$S_n = b_0 + b_1 r_1 + b_2 r_2 + \dots + b_n r_n \quad (8)$$
Where b indicates the regression coefficient, S represents the trained data set and r represents the
input variable. In order to predict the unseen samples in the data set, the Poisson distribution is
applied, which speeds up the prediction process.
$$\log[E(S)] = b_0 + b_1 r_1 + b_2 r_2 + \dots + b_n r_n + \log(t) \quad (9)$$
Where log(t) represents the offset variable, since Poisson regression uses fixed time, and t
represents the observed time period. If S follows a Poisson distribution, then the probability of
observing i events over the time period is defined as
$$p(S = i) = \frac{e^{-\lambda}\lambda^{i}}{i!} \quad (10)$$
Where λ is the expected value (average) of S and e denotes the exponential. The predictions from
the individual regression trees are then averaged:
$$F' = \frac{1}{T} \sum_{t=1}^{T} F_t\big(E(s)\big) \quad (11)$$
Where F' indicates the prediction for all the unseen samples, and T indicates the time period for
observation. E(s) represents the Poisson distribution of the training data set, which reduces the
training time.
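The Poisson probability in equation (10) and the averaging in equation (11) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the Poisson mean of 1 used for resampling, and the trivial "tree" that predicts the mean of its bag are all hypothetical and do not reproduce the thesis implementation.

```python
import math
import random


def poisson_pmf(i, lam):
    """Probability of observing i events when the mean count is lam, as in equation (10)."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_bootstrap(samples, rng, lam=1.0):
    """Resample with replacement: each sample is repeated a Poisson(lam) number of times."""
    resampled = []
    for sample in samples:
        resampled.extend([sample] * poisson_draw(lam, rng))
    return resampled


def bagged_prediction(predictors, x):
    """Average the outputs of the individual predictors, as in equation (11)."""
    return sum(f(x) for f in predictors) / len(predictors)


rng = random.Random(42)
data = [2, 3, 5, 7, 11]
# Each hypothetical "tree" simply predicts the mean of its own bootstrap sample.
trees = []
for _ in range(25):
    bag = poisson_bootstrap(data, rng) or data   # fall back if the bag is empty
    mean = sum(bag) / len(bag)
    trees.append(lambda x, m=mean: m)
print(bagged_prediction(trees, x=None))
```

Drawing a Poisson count per sample is a common way to approximate sampling with replacement without knowing the data set size in advance, which is why it suits streaming sensor data.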
4.5 Pseudo code for random Poisson forest
Input: Training sample S, classifier F, Iteration I
Output: F'
Training: sets the weightage value m
    S_i ← Sample from S according to the Poisson distribution
    y_i ← Number of data samples
    F_i ← Train a classifier on S_i via F
    $e_i = \frac{1}{m} \sum_{r_i \in S_i :\, F_i(r_i) \neq y_i} weight(r_i)$
    $\beta_i = \frac{e_i}{1 - e_i}$
    $weight(r_i) = weight(r_i)\,\beta_i, \quad \forall r_i : F_i(r_i) = y_i$
$F' = \sum_{i :\, F_i(x) = y} \log\left(\frac{1}{\beta_i}\right)$
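The pseudo code resembles a boosting-style weighting scheme: a weighted error e, a factor β = e / (1 − e), a down-weighting of correctly classified samples by β, and a vote of log(1/β) for each classifier. Reading it this way is an assumption, as is taking m to be the total sample weight. Under those assumptions, one round can be sketched in Python as follows; the function name and the labels are hypothetical.

```python
import math


def boosting_round(weights, predictions, labels):
    """Return (error, beta, new_weights) for one round of the weighting scheme."""
    total = sum(weights)                   # assumed meaning of the weightage value m
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y) / total
    if not 0 < error < 1:
        raise ValueError("beta is undefined when the weighted error is 0 or 1")
    beta = error / (1 - error)
    # Correctly classified samples are scaled by beta; misclassified ones keep their weight.
    new_weights = [w * beta if p == y else w
                   for w, p, y in zip(weights, predictions, labels)]
    return error, beta, new_weights


weights = [1.0, 1.0, 1.0, 1.0]
labels = ["bot", "bot", "normal", "normal"]
predictions = ["bot", "normal", "normal", "normal"]   # one mistake out of four
error, beta, weights = boosting_round(weights, predictions, labels)
vote = math.log(1 / beta)                  # voting weight of this classifier
print(error, round(beta, 3), round(vote, 3))
```

After the round, the single misclassified sample carries relatively more weight, so the next classifier concentrates on it.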
FIGURE 4.5 Training process with random Poisson forest (flowchart steps: Begin; for each tree T,
choose a training data subset; check the condition at the node; apply the Poisson distribution;
build the next split; calculate the prediction error; End)
4.5.1 Mass clustering based on P-SIM clustering
After training, similar Botnets are clustered using Surflex-PSIM, which exploits its repetitive
structure and isolates the Botnet attacks as clusters based on the automatically trained
characteristics of the pocket value.
The output of the training phase from eq (11) is used for clustering in the testing phase based on
the similarity of Botnet behavior. The main idea behind this approach is to cluster similar types
of Botnet among the authenticated smart objects involved in the IoT-based network. The following
formulae are used to find the similarity of the various Botnets.
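$$C_{U,V} = \big( M_{u,v} \;\; \forall\, M_{u',v'} \in C_{u,v} \,\cap\, M_{u',v'} \neq M_{u,v} \big) \quad (12)$$
$$M_{u',v'} \neq M_{u,v} \Rightarrow (u \not\subset u') \cup (v \not\subset v'), \quad |M_{u,v}| \geq L \quad (13)$$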
Let U and V be two network parameters belonging to the IoT family F, and let u and v be two
identical subsequences belonging to U and V, respectively. $M_{u,v}$ represents the matched
subsequence of Surflex characteristics of u and v, and L represents the minimum length that this
similarity should have. $C_{u,v}$ is defined as the key set of matched parameter values $M_{u,v}$
for the similarity function.
The matching set $C_{u,v}$ includes all the matched subsequences of maximum length between the
sequences u and v. The symbol ⊄ indicates that one type of Botnet is not included in another
cluster. All possible matched parameter values should satisfy $|M_{u,v}| \geq L$, since each
$M_{u,v}$ in $C_{u,v}$ is an expansion of matched parameters of length L. Therefore, this approach
gathers all the matched network parameter values of length L in linear time. A weightage value is
then given to all matched parameter values to differentiate them from all other authenticated users.
$$W(M) = \sum_{i=1}^{|M|} T\big[M[i], M[j]\big] \quad (14)$$
Where M[i] is the i-th Botnet of the matched parameter value M, M[i], M[j] is the weightage value
of each Botnet in the network, and T represents the substitution matrix. For the pair of parameter
values U and V, the matching score $S_{u,v}$ is defined as
$$S_{u,v} = \underset{M \subset C_{u,v}}{\mathrm{MAX}}\,(u, v) \quad (15)$$
Let Smax be the matching score of the largest network parameter value belonging to the IoT
supported network. The maximum of matching score value is defined by
$$S_{\max} = \max\{S_{u,v};\ u = v;\ V \subset F\} \quad (16)$$
Finally, the similarity measure between the two parameters U and V is obtained by dividing the
match score by the maximum match score. Botnets are then clustered based on this similarity
measure.
4.5.2 Pseudo code for P-SIM clustering
Matched set is obtained by
M ← Matched parameter value
C ← Matching set
For i = 1 to maximum of |u| and |v|
    k = 0, j = 1
    While k < |u| and j < |v|
        If (u[k] = v[j])
            Then add the botnet u[k] to M
        Else
            If (|M| ≥ 1) add M to C
            Empty M
        End else
        Increment k, Increment j
    End while
    If (|M| ≥ 1) add M to C
    Empty M
    k = i, j = 0
    While k < |u| and j < |v|
        If (u[k] = v[j])
            Then add the botnet u[k] to M
        Else
            If (|M| ≥ 1) add M to C
            Empty M
        End else
        Increment k, Increment j
    End While
End for
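A runnable reading of the matching procedure and of the normalised similarity described in Section 4.5.1 is sketched below in Python. Several points are assumptions made for illustration: both sequences are swept along every diagonal alignment (offsets 0 and i), each matched item scores 1 (an identity substitution matrix), and the behaviour tokens are hypothetical.

```python
def matched_runs(u, v, min_len=1):
    """Collect maximal runs of equal items along every diagonal alignment of u and v."""
    runs = []

    def sweep(k, j):
        run = []
        while k < len(u) and j < len(v):
            if u[k] == v[j]:
                run.append(u[k])
            else:
                if len(run) >= min_len:
                    runs.append(run)
                run = []
            k += 1
            j += 1
        if len(run) >= min_len:        # flush the run that reaches the end
            runs.append(run)

    for i in range(max(len(u), len(v))):
        sweep(0, i)                    # v shifted by i positions
        if i > 0:
            sweep(i, 0)                # u shifted by i positions
    return runs


def match_score(u, v, min_len=1):
    """Length of the longest matched run (each matched item is assumed to weigh 1)."""
    return max((len(run) for run in matched_runs(u, v, min_len)), default=0)


def similarity(u, v, family, min_len=1):
    """Match score divided by the largest self-match score in the family."""
    s_max = max(match_score(x, x, min_len) for x in family)
    return match_score(u, v, min_len) / s_max if s_max else 0.0


u = ["syn", "syn", "ack", "fin"]       # hypothetical behaviour sequence of one device
v = ["ack", "syn", "syn", "ack"]       # hypothetical behaviour sequence of another
print(match_score(u, v), similarity(u, v, family=[u, v]))
```

Devices whose similarity exceeds a chosen threshold would then be placed in the same cluster; the threshold itself is not specified in the text and is left open here.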
By clustering the Botnet attacks according to the similarity values obtained from the
training-phase dataset, all kinds of attacks can be captured and eliminated, which enhances the
reliability of the network in an IoT environment. The proposed classifier includes a random Poisson
forest that counts the number of events and the times at which these events occur in a given time
interval, and it achieved a better prediction rate during the training phase. After training,
similar Botnets are clustered using Surflex-PSIM, which isolates the Botnet attacks as clusters
based on the automatically trained pocket-value characteristics derived from the Surflex
characteristics of attacks.
FIGURE 4.6 Flow diagrams for clustering of Botnet
This section proposed isolating Botnet attacks using the bootstrap aggregating Surflex-PSIM
classifier. It clusters Botnet attacks using P-SIM clustering, which isolates the attacks as
clusters based on the automatically trained pocket-value characteristics derived from the Surflex
characteristics of attacks. Large dataset inputs are also clustered accurately. Because individual
Botnet removal analysis is avoided, accurate and less time-consuming Botnet detection can be
achieved. This maintains the reliability and quality of service in IoT applications.
(Figure 4.6 flowchart steps: Begin; list the devices with their activities; compute the similarity
matrix; set each device as a cluster; if the number of clusters = 1, End; otherwise merge the two
most similar devices, update the similarity matrix, and repeat)
4.6 Results and Discussion
The performance of the proposed system is evaluated based on the clustering ratio of different
types of Botnet attacks. In a Botnet attack, a group of attackers comes together with the aim of
destroying the whole network. Two types of Botnet attack are considered here: Distributed Denial of
Service (DDoS) attacks and spam attacks. A DDoS attack is a cyberattack in which the perpetrator
tries to make a machine or network resource unavailable to its intended users by temporarily or
indefinitely disrupting the services of a host connected to the Internet.
Email spam is the electronic form of junk mail. It involves sending unwanted messages, often
unsolicited advertising, to a large number of recipients. Spam is a serious security concern
because it can be used to deliver Trojan horses, viruses, worms, spyware and targeted phishing
attacks.
The main difference is that in a general attack, one or two attackers carry out different
operations to disturb the normal flow of the network, whereas in a Botnet attack, groups of
attackers with the same intention come together and carry out the same operation to destroy the
network's reliability.
4.6.1 Implementation
The proposed system for the IoT-based network is implemented in Python.
FIGURE 4.7 The IoT based network
4.6.2 Packet Delivery Ratio
Packet Delivery Ratio (PDR) is estimated from the numbers of received and generated packets
recorded in the trace file. PDR is defined as the ratio of the packets received by the destination
to the packets generated by the source.
FIGURE 4.8 Packet delivery ratios during normal and attack period
4.6.3 Packet Loss
Packet loss happens when at least one packet of information traversing a computer network fails to
reach its destination. Packet loss is measured as the percentage of packets lost with respect to
packets sent. Figure 4.9 depicts the packet loss ratio of the IoT-based network during normal time
and attack time.
FIGURE 4.9 Packet Loss of the network during normal flow and attack
4.6.4 Throughput
In data transmission, network throughput is the amount of data transferred successfully from the
source node to the destination node in a specified time. It is typically measured in bits per
second (bps), as in megabits per second (Mbps) or gigabits per second (Gbps).
FIGURE 4.10 Throughput of the network under normal and attack period
TABLE 4.1 Log detail of each smart object in the network
Node IP Address Arrival time (Sec) Packet Delivery ratio Packet Loss (Kbps) Throughput (Kbps)
n1 151.142.255.1 2.256 88.025 2.2835 56.895
n2 151.142.255.2 1.267 93.211 1.4756 54.742
n3 151.142.255.3 8.278 94.723 1.8629 55.315
n4 151.142.255.4 1.289 94.601 1.8687 56.889
n5 151.142.255.5 1.314 94.783 1.4756 53.895
n6 151.142.255.6 5.311 89.5404 1.8629 56.888
n7 151.142.255.7 4.322 93.031 2.1905 53.895
n8 151.142.255.8 3.333 95.5216 2.1905 51.2
n9 151.142.255.9 2.344 88.0122 1.4756 60.235
n10 151.142.255.10 1.355 90.5028 2.1905 53.895
n11 151.142.255.11 1.366 92.9934 1.8629 51.2
n12 151.142.255.12 9.377 95.484 2.1905 51.2
n13 151.142.255.13 8.388 87.9746 2.2835 51.2
n14 151.142.255.14 7.399 91.4652 1.9597 51.895
n15 151.142.255.15 6.441 92.9558 1.8629 50.96
n16 151.142.255.16 5.421 89.937 1.8629 53.895
n17 151.142.255.17 4.432 90.4276 1.57717 56.889
n18 151.142.255.18 3.443 92.9182 1.9598 56.888
n19 151.142.255.19 2.454 95.4088 1.8629 51.221
n20 151.142.255.20 1.465 89.8994 1.9598 56.38
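The three network measures defined in Sections 4.6.2 to 4.6.4 can be sketched as small Python helpers. The function names and the packet and bit counts in the example are hypothetical and are not taken from TABLE 4.1; loss is expressed here as a percentage of packets sent, following the definition in the text.

```python
def packet_delivery_ratio(received, sent):
    """Percentage of generated packets that reached the destination."""
    return 100.0 * received / sent


def packet_loss_ratio(received, sent):
    """Percentage of generated packets that did not reach the destination."""
    return 100.0 * (sent - received) / sent


def throughput_kbps(bits_delivered, seconds):
    """Successfully delivered data per unit time, in kilobits per second."""
    return bits_delivered / seconds / 1000.0


# Hypothetical trace: 1000 packets sent, 930 received, 512000 bits delivered in 10 s.
print(packet_delivery_ratio(930, 1000))
print(packet_loss_ratio(930, 1000))
print(throughput_kbps(512000, 10))
```

By construction, the delivery and loss percentages of the same trace always sum to 100.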
4.6.5 Clustering of Botnet of Distributed Denial of Service (DDoS)
In this type of Botnet attack, a group of attackers continuously sends requests for a resource to
the same destination address for a specified time, so that an authenticated user cannot get that
resource during that time. The proposed algorithm clusters those nodes based on the similarity of
packet sending time, destination address and the resource they request continuously. The distance
between the source nodes and the destination node is also calculated in order to group the attacks
efficiently.
In existing systems, hierarchical clustering has been used to cluster the attackers' devices in the
IoT-based network. The main problem with hierarchical clustering is that once the decision to join
two clusters has been taken, it cannot be undone. In this work, a mixture model is used for
clustering, which combines both distance-based and similarity-based metrics, so the clustering
ratio is high compared with existing techniques.
TABLE 4.2 List of nodes clustered under DDoS Botnet attack
Node Source IP Address Packet Sending Time (sec) Destination IP Address Resource
n1 151.142.255.1 0.214 151.142.250.11 file-1
n2 151.142.255.2 0.214 151.142.250.11 file-1
n3 151.142.255.3 0.214 151.142.250.11 file-1
n4 151.142.255.4 0.214 151.142.250.11 file-1
n5 151.142.255.5 0.214 151.142.250.11 file-1
n6 151.142.255.6 0.214 151.142.250.11 file-1
n7 151.142.255.7 0.214 151.142.250.11 file-1
n8 151.142.255.8 0.214 151.142.250.11 file-1
n9 151.142.255.9 0.214 151.142.250.11 file-1
n10 151.142.255.10 0.214 151.142.250.11 file-1
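As a simplified illustration of grouping nodes by shared behaviour, the Python sketch below buckets nodes that send at the same time to the same destination for the same resource. It is a plain key-based grouping used as a stand-in and is not the mixture-model clustering of the proposed system; the function name, the `time_step` tolerance and the extra node `n21` are assumptions, while the other rows are taken from the DDoS and spam node tables.

```python
from collections import defaultdict


def group_by_behaviour(records, time_step=0.001):
    """Group nodes whose send time, destination and resource coincide."""
    groups = defaultdict(list)
    for node, send_time, destination, resource in records:
        # Send times are bucketed so that tiny floating-point differences are ignored.
        key = (round(send_time / time_step), destination, resource)
        groups[key].append(node)
    # A single node acting alone is not treated as a Botnet cluster.
    return [nodes for nodes in groups.values() if len(nodes) > 1]


records = [
    ("n1", 0.214, "151.142.250.11", "file-1"),
    ("n2", 0.214, "151.142.250.11", "file-1"),
    ("n3", 0.214, "151.142.250.11", "file-1"),
    ("n11", 0.214, "151.142.255.1", "mail"),
    ("n12", 0.214, "151.142.255.1", "mail"),
    ("n13", 0.214, "151.142.255.1", "mail"),
    ("n21", 1.750, "151.142.250.30", "file-7"),   # hypothetical benign node
]
print(group_by_behaviour(records))
```

The two groups that come out correspond to the DDoS-style and spam-style behaviours, while the lone hypothetical node is left unclustered.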
4.6.6 Clustering of Botnet of Spam attack
Here the Botnet sends email to the spam box instead of the inbox of the mail application. It
involves sending undesirable messages, often unsolicited advertising, to a large number of
recipients. Spam is a serious security concern because it can be used to deliver Trojan horses,
viruses, worms, spyware and targeted phishing attacks. The proposed system clusters this type of
Botnet based on the behavior of sending files to the spam box instead of the inbox. Existing
techniques, being based on hierarchical clustering, do not cope with clusters of different sizes
and irregular shapes, and they need to break up large clusters. In this work, mixture-based
clustering is incorporated, so it handles clusters of all shapes.
FIGURE 4.11 Clustering of Botnet attack which leads to DDoS attack
TABLE 4.3 Lists of nodes clustered under Botnet Spam attack
Node Source IP Address Packet Sending Time (Sec) Destination IP Address Resource
N11 151.142.255.11 0.214 151.142.255.1 mail
N12 151.142.255.12 0.214 151.142.255.1 mail
N13 151.142.255.13 0.214 151.142.255.1 mail
N14 151.142.255.14 0.214 151.142.255.1 mail
N15 151.142.255.15 0.214 151.142.255.1 mail
N16 151.142.255.16 0.214 151.142.255.1 mail
N17 151.142.255.17 0.214 151.142.255.1 mail
N18 151.142.255.18 0.214 151.142.255.1 mail
N19 151.142.255.19 0.214 151.142.255.1 mail
N20 151.142.255.20 0.214 151.142.255.1 mail
FIGURE 4.12 Clustering of Botnet attack which leads to Spam attack
4.6.7 Comparison of proposed system with existing techniques
In this section, the proposed system is compared with existing classifiers such as decision tree,
random forest, and RBF. To evaluate the proposed system, the following parameters are considered:
Precision, Recall, F-measure, and Accuracy.
TABLE 4.4 List of classifiers with proposed system
Classifiers Precision Recall F-Measure Accuracy
Decision Tree 0.968 0.931 0.949 96.5333
Random Forest 0.968 0.934 0.95 96.667
RBF 0.976 0.927 0.95 96.5333
Proposed 0.961 0.986 0.976 99.04
Precision
Precision is the fraction of test instances classified as attacks that actually belong to the
attack categories.
$$\text{Precision} = \frac{TP}{TP + FP}$$
Where TP represents the true positive value, FP indicates the false positive.
FIGURE 4.13 Comparison graph for Precision
The proposed system achieved a precision of 0.96, compared with 0.968, 0.968, and 0.976 for the
decision tree, random forest, and RBF classifiers. It adopts the Poisson distribution, which counts
the number of events and the times at which these events occur in a given time interval.
Recall
Recall measures the fraction of attack class that was correctly detected.
$$\text{Recall} = \frac{TP}{TP + FN}$$
Where TP indicates the True Positive value and FN indicates the False Negative.
FIGURE 4.14 Comparison graph for Recall
The proposed system achieved a better recall value of 0.98, whereas other classifiers such as
decision tree, random forest, and RBF obtained 0.93, 0.93, and 0.92, respectively. Since the
proposed system uses similarity-based clustering, it separates each event correctly.
F-Measure
The F-measure assesses test accuracy as the harmonic mean of precision and recall, balancing the
two.
$$F\text{-measure} = \frac{2 \cdot P \cdot R}{P + R}$$
Where P represents the precision and R denotes the recall value.
FIGURE 4.15 Comparison graph for F-measure
Since the proposed system adopts the random Poisson distribution in the training phase, it records
all the rare events that happen in the IoT environment. Hence it attained a better F-measure value
of 0.97, whereas other classifiers such as decision tree, random forest, and RBF have values of
0.94, 0.95 and 0.95, respectively.
Accuracy
Accuracy is defined as the ratio of the number of correctly classified Botnet attacks to the total
number of Botnet attacks
$$\text{Accuracy} = \frac{I_{cB}}{TB}$$
Where $I_{cB}$ indicates the number of correctly identified Botnet attacks, and TB denotes the total
number of Botnet attacks.
FIGURE 4.16 Comparison graph for Accuracy
Since the proposed system employs the Poisson distribution, which captures rare events for a given
time, and Surflex-PSIM, which isolates the Botnet attacks as clusters based on the automatically
trained pocket-value characteristics derived from the Surflex characteristics of attacks, it
attained a better accuracy of 99.04%. In contrast, other classifiers such as decision tree, random
forest, and RBF obtained 96.53%, 96.66%, and 96.53%, respectively. Bots are becoming a significant
cybersecurity threat for IoT applications. It is, therefore, essential to detect the presence of
malicious bots and other anomalies in the network; an approach for this is proposed in Chapter 5.
CHAPTER 5
A Novel Forecastive Anomaly Based Botnet Revelation
Framework for Competing Concerns in Internet of
Things: Third Approach
5.1 Forecastive Anomaly-based Botnet Revelation Framework
IoT combines many low-cost heterogeneous devices that can generate large volumes of private
information with little or no security, which leads to security issues. This open weakness in IoT
security allows an attacker to build a network of bots by infecting devices with malicious
applications; such a network is called a Botnet. A Botnet provides a distributed platform for a
number of prohibited activities, including Distributed Denial of Service (DDoS) attacks against
crucial targets, phishing, malware dissemination and click fraud. Previous methodologies used
various Botnet detection techniques, grouped into behavior-based detection systems and user
data-based detection systems, to solve these security problems. Machine learning algorithms are
also in high demand for addressing the issues caused by Botnets, but they fail to predict anomalies
based on their behavior and yield poor accuracy in detecting Botnets.
FIGURE 5.1 Proposed Frameworks
To deal with Botnets and the hazardous anomalies that existing approaches leave exposed, a novel
forecastive anomaly-based Botnet revelation framework is designed in the proposed work, as shown in
Figure 5.1. The approach works as a two-phase progression: the first phase is instance creation,
and the second is cataloging. As an alternative to conventional machine learning algorithms,
ensemble-based stream mining is used in this work to generate several instances with less memory
and time. Once the instances are created, Graph Structure Based Detection of Anomaly (GSBDA) is
initiated, based on features derived by the stream mining algorithm, to detect the existence of
hazardous anomalies. The second phase deploys the KNN (K Nearest Neighbor) algorithm, a type of
instance-based learning algorithm, to identify the Botnet accurately by observing the network
flows. Thus, the proposed framework addresses poor security practices and detects the issues caused
by Botnets.
5.2 Ensemble-based Stream Mining
In the novel ensemble-based stream mining, data relevant to certain anomalies are collected from
organizations over the past several years and are characterized as an unbounded data stream. This
unbounded data stream is then partitioned into a number of large pieces called instances.
FIGURE 5.2 Ensemble-based stream mining-concepts drift in the unbounded data stream
Figure 5.2 shows how the decision boundary of a classifier varies when such a stream experiences
concept drift. In Figure 5.2, the circles in the unbounded data stream represent data points. The
unfilled circles denote True Negatives (TN) (i.e., non-anomalies) and the solid circles represent
True Positives (TP) (i.e., anomalies). The dashed line indicates the old decision boundary and the
dark solid line indicates the new decision boundary for each chunk.
Shaded circles represent a new concept, which has drifted relative to the previous chunk. Thus, to
perform classification, the decision boundary ought to be adjusted to account for the new concept.
Let us consider the possible kinds of misclassification (false detection):
1) The decision boundary of chunk two is shifted slightly in comparison to chunk one. As a result,
some non-anomalous data are incorrectly labeled anomalous, resulting in False Positives (FP).
2) The decision boundary of chunk three is shifted slightly compared to chunk two. As a result,
some anomalous data are incorrectly categorized as non-anomalous, resulting in False Negatives
(FN).
In general, the intersection caused between the old and the new decision boundaries for the same
chunk would increase the FN and the FP counts. Hence to perform classification, an ensemble-
based stream mining concept is proposed, which classifies all the data instances in the stream.
The ensemble classification procedure is illustrated in Figure 5.3. Here the models C1, C2, C3,
and C4 each run GSBDA. The static, supervised GSBDA is first used to train the individual
models. The normative substructures are identified in each chunk, and each model in the
ensemble compares the test data against them. Each model classifies a test substructure
according to how far it deviates from that model's normative substructure. Once all models have
cast their votes, a weighted majority vote is taken to reach the final decision.
The ensemble is maintained so that it contains exactly K models at all times. As each new chunk
arrives, a (K + 1)-th model is trained from the new chunk, and the worst-performing of these
K + 1 models is discarded. The model to discard can be selected in several ways. One approach
computes the prediction error of each of the K + 1 models on the most recent chunk and discards
the poorest performer. This requires ground truth for the recent chunk to be readily available
so that the prediction error can be measured accurately.
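The update rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a model is represented as a plain function that maps a sample to 0 (non-anomaly) or 1 (anomaly), and the names prediction_error and update_ensemble are hypothetical. The actual GSBDA models operate on graph substructures, which this sketch does not attempt to reproduce.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a model is any callable returning 0 (normal) or 1 (anomaly).
Model = Callable[[object], int]
# A labeled chunk is a sequence of (sample, ground-truth label) pairs.
LabeledChunk = Sequence[Tuple[object, int]]


def prediction_error(model: Model, chunk: LabeledChunk) -> float:
    """Fraction of samples in the chunk that the model labels incorrectly."""
    if not chunk:
        return 0.0
    wrong = sum(1 for sample, label in chunk if model(sample) != label)
    return wrong / len(chunk)


def update_ensemble(ensemble: List[Model], new_model: Model,
                    chunk: LabeledChunk, k: int) -> List[Model]:
    """Add the new model, then keep the k models with the lowest error
    on the most recent labeled chunk."""
    candidates = ensemble + [new_model]
    ranked = sorted(candidates, key=lambda m: prediction_error(m, chunk))
    return ranked[:k]
```

Since Python's sort is stable, models with equal error keep their existing order, so among equally poor models the most recently added one is dropped first. That tie-breaking rule is a property of this sketch, not something the text prescribes.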
5.3 Ensemble Classification approach
The classification approach makes use of a classifier and the procedure for ensemble
Figure 5.3 Ensemble-based classifier designs
If ground truth is not available, we instead rely on majority voting: the model that agrees
least with the majority decision is discarded. The result is a set of K models that best fits
the current concept.
Once the instances are created, Graph Structure-Based Detection of Anomaly (GSBDA) is applied,
based on the features derived by the stream mining algorithm, in order to detect the presence
of hazardous anomalies.
Algorithm: Ensemble Classifier
// E – current ensemble
// Ln – most recently labeled data chunk
for each model K ∈ E do
    Test K on Ln and compute its prediction error
end for
Kn ← newly trained GSBDA classifier from data Ln     // this is the newly trained model
Test Kn on Ln and calculate its prediction error      // testing this model for error
E ← best K classifiers from E ∪ {Kn} depending on the prediction error
                                                      // select the models with less prediction error
With this algorithm, a new model is trained from the most recent chunk and is added temporarily
to the ensemble. It is then followed by testing the graph t to check for the possible related
anomalies. Finally, the ensemble is updated by dropping the model that disagrees most with the
weighted majority opinion. If multiple models have equally high disagreement, an arbitrary
poorly performing model among them is discarded.
Weighted majority opinions are calculated using the formula
\[
WA(E, a) = \frac{\sum_{\{i \,\mid\, M_i \in E,\; a \in A_{M_i}\}} \lambda^{\,l-i}}{\sum_{\{i \,\mid\, M_i \in E\}} \lambda^{\,l-i}} \qquad (1)
\]
where
M_i ∈ E is a model in the ensemble E trained with chunk i,
A_{M_i} is the set of anomalies reported by model M_i,
λ is a constant fading factor, and
l is the index of the most recent chunk.
WA(E, a) is then rounded to an integer (0 or 1) to obtain the weighted majority vote.
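As a hedged illustration of the weighted majority opinion, the following Python sketch computes the fading-weighted share of ensemble models that report a given anomaly and rounds it to 0 or 1. The data layout (a dictionary from the chunk index of each model to the set of anomalies it reports), the function names, and the choice to round a value of exactly 0.5 up to 1 are assumptions of this sketch; the text only states that the value is rounded to an integer.

```python
from typing import Dict, Hashable, Set


def weighted_average(reported: Dict[int, Set[Hashable]], anomaly: Hashable,
                     latest_chunk: int, fading: float) -> float:
    """Fading-weighted share of models that report `anomaly`.

    reported maps the chunk index i of each ensemble model to the set of
    anomalies that model reports. A model trained on chunk i gets weight
    fading ** (latest_chunk - i), so older models count less.
    """
    total = sum(fading ** (latest_chunk - i) for i in reported)
    if total == 0:
        return 0.0
    voted = sum(fading ** (latest_chunk - i)
                for i, anomalies in reported.items() if anomaly in anomalies)
    return voted / total


def weighted_majority_vote(reported: Dict[int, Set[Hashable]], anomaly: Hashable,
                           latest_chunk: int, fading: float) -> int:
    """Round the weighted average to 0 or 1 (assumption: 0.5 rounds up)."""
    score = weighted_average(reported, anomaly, latest_chunk, fading)
    return 1 if score >= 0.5 else 0
```

For example, with three models trained on chunks 1, 2, and 3, a fading factor of 0.5, and l = 3, the weights are 0.25, 0.5, and 1. An anomaly reported only by the two older models scores 0.75 / 1.75 and is voted down, while one reported by the two newer models scores 1.5 / 1.75 and is accepted.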
Consider a simple example for GSBDA, which is shown in Figure 5.4 a. The best substructure
obtained after the first iteration in GSBDA is shown in Figure 5.4 b.
FIGURE 5.4 a) Typical fair GSBDA example, b) Best substructure and c) Anomalous
Substructure
Then, on the second iteration, this substructure is compressed to a single vertex, extensions are
evaluated, and the resulting anomalous substructure is shown in Figure 5.4 c. Once more, the
edge and vertex are labeled as the actual anomaly, yet the whole anomalous substructure is output
for possible investigation.
This method uses the findings of past GSBDA iterations to identify irregularities in the current
chunks of data. That is, the normative substructures found in previous GSBDA iterations are
carried forward in each model. This lets a model take into account all the data it has seen
while it remains in the ensemble, not only the current chunk.
Poorly performing, outdated models are replaced by better-performing, younger models that are
more suited to the current concept. Although the cumulative amount of data in the stream is
technically unbounded, this keeps every round of classification tractable.
5.4 Cataloging
Our approach determines the similarity in behavior of hosts from their various properties, such
as the netflows observed during a predefined time window, and attempts to detect bots by
correlating these similar behaviors across distinct time windows.
This work makes use of the KNN algorithm to identify bots based on the netflows. Netflow
generating components generate TCP netflows between hosts. The Netflow Clustering and Alert
Clustering components then group the non-filtered netflows and alerts. Finally, at the end of
each time window, a correlation step relates the generated alert clusters and netflow clusters
in order to identify the bot-infected hosts. Once the instances are created, Graph
Structure-Based Detection of Anomaly (GSBDA) is applied on the basis of the features derived
from the stream mining algorithm to detect the presence of hazardous anomalies. In addition, the
second phase uses the KNN (K Nearest Neighbor) algorithm [100], a form of instance-based
learning. It is used specifically to classify the Botnet by analyzing the network flows.
KNN Algorithm Pseudocode:
Let (Xi, Ci), where i = 1, 2, ..., n, be the data points. Xi denotes the feature values and Ci
denotes the label of Xi for each i.
Assuming the number of classes is C, Ci ∈ {1, 2, 3, ..., C} for all values of i.
Let x be a point whose label is not known and whose class we would like to find using the KNN
algorithm.
1. Calculate d(x, Xi) for i = 1, 2, ..., n, where d denotes the Euclidean distance between the
points.
2. Arrange the n calculated Euclidean distances in non-decreasing order.
3. Let k be a positive integer; take the first k distances from this sorted list.
4. Find the k points corresponding to these k distances.
5. Let ki denote the number of points belonging to the i-th class among the k points, i.e.,
ki ≥ 0.
6. If ki > kj ∀ j ≠ i, then put x in class i.
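The six steps above can be written as a short Python sketch using only the standard library. The function name knn_classify and the data layout (a sequence of feature-vector and label pairs) are illustrative assumptions. The pseudocode does not say what to do when two classes tie; this sketch then returns the tied class whose member is nearest to x, which is an assumption.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def knn_classify(train: Sequence[Tuple[Sequence[float], int]],
                 x: Sequence[float], k: int) -> int:
    """Return the majority class among the k training points nearest to x."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Steps 1-2: compute the Euclidean distance to every point and sort ascending.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], x))
    # Steps 3-4: keep the k points with the smallest distances.
    neighbours = by_distance[:k]
    # Step 5: count how many of the k neighbours fall in each class.
    counts = Counter(label for _, label in neighbours)
    # Step 6: pick the class with the largest count.
    return counts.most_common(1)[0][0]
```

Choosing an odd k for a two-class problem (for example Botnet versus non-Botnet flows) avoids ties altogether.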
As a result, poor security practices are addressed and Botnet-related issues are identified with
this system. The implementation and results for the above sections are described below.
5.5 Results and Discussion
In the proposed work, the CTU-13 dataset is used [78]. It consists of Botnet traffic captured at
the Czech Technical University. It features different scenarios with different types of cyber
attacks carried out with Botnet. Each scenario is recorded individually as a separate file. Each
file has 14 attributes and a label. In total, the dataset consists of 13 different scenarios.
The CTU-13 dataset contains Botnet, normal, and background traffic across the 13 scenarios
shown in Table 5.1, where each scenario is created with different malware. Normal traffic
was created by regular users through internet surfing, mail checking, and browsing social media
sites. Background traffic is generated to show the presence of Botnet traffic. All of these
scenarios were captured in 'pcap' files.
TABLE 5.1 CTU-13 scenarios [78]
This dataset contains a total of 15 columns, namely ‘StartTime’ (start time of the attack),
‘Dur’ (duration of the attack in seconds), ‘Proto’ (protocol, e.g., TCP, UDP, ICMP),
‘SrcAddr’ (source IP address), ‘Sport’ (source port number), ‘Dir’ (direction of the traffic),
‘DstAddr’ (destination IP address), ‘Dport’ (destination port number), ‘State’ (state of the
transaction according to the protocol), ‘sTos’ (source type of service field), ‘dTos’
(destination type of service field), ‘TotPkts’ (total transaction packet count), ‘TotBytes’
(total transaction bytes), ‘SrcBytes’ (total transaction bytes from source to destination), and
‘Label’ (three target values, namely background, Botnet, and normal). The direction column
defines the TCP connection source, and the symbol at its center represents the transaction
state. The symbol ‘-’ means the transaction was normal, ‘|’ means the transaction was RESET, ‘o’
means the transaction timed out, and ‘?’ means that the transaction direction was unknown.
Table 5.2 shows the dataset distribution for background flows, Botnet flows, and normal flows.
TABLE 5.2 CTU–13 Dataset Distributions
Scenario   Background Flows (%)   Botnet Flows (%)   Normal Flows (%)
1          95.40                  0.89               3.69
2          95.59                  0.85               3.54
3          94.60                  0.49               4.89
4          91.91                  0.15               7.93
5          91.37                  0.46               8.15
6          94.12                  0.22               5.64
7          93.71                  0.06               6.22
8          95.47                  0.10               4.42
9          90.22                  5.02               4.75
10         87.54                  6.24               6.21
11         29.33                  67.97              2.69
12         29.33                  67.97              2.69
13         93.76                  1.67               4.55
As shown in Table 5.2, scenario 1 of the CTU-13 dataset has 95.40% background traffic, 0.89%
Botnet traffic, and 3.69% normal traffic. The same can be observed for the other scenarios. A
large class imbalance is apparent in the traffic present in this dataset.
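A distribution of this kind can be computed from the ‘Label’ column with a few lines of Python. The sketch below is an assumption-laden illustration: it presumes that each label is a free-text string in which Botnet flows contain the word "Botnet" and normal flows contain the word "Normal", with everything else counted as background. The function names flow_class and flow_distribution and the sample label strings in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def flow_class(label: str) -> str:
    """Map a raw label string to one of the three target classes.

    Assumption: the class name appears as a substring of the label.
    """
    text = label.lower()
    if "botnet" in text:
        return "botnet"
    if "normal" in text:
        return "normal"
    return "background"


def flow_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of flows that fall in each class."""
    counts = Counter(flow_class(label) for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Applied to a list of eight background labels, one Botnet label, and one normal label, the function returns 80%, 10%, and 10% for the three classes. Such a check makes the imbalance visible before a classifier is trained.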
Furthermore, the visuals of the CTU-13 dataset are used in the same way as suggested by its
author [78]. Initially, in our proposed framework, the work begins by initializing the number of
nodes in the IoT network, as shown in Figure 5.5 (a). Here we initialize 30 nodes, and the
creation of the nodes is shown in Figure 5.5 (b).
Once the nodes are created, ensemble-based stream mining immediately generates a number of
instances based on the behavior of the network packets. This reduces the time and memory
complexity while retaining high prediction accuracy. The anomalies are detected using GSBDA with
the information obtained from stream mining. Moreover, the Botnet is detected by the KNN
algorithm in the cataloging phase, based on the netflows. The detected nodes are shown in
Figure 5.6.
FIGURE 5.5 a) Node Initialization and b) Node Creation
FIGURE 5.6 Nodes that are detected as Botnet
5.5.1 Performance Analysis
Table 5.3 reports the performance of our proposed work in terms of several metrics: TPR (True
Positive Rate), FPR (False Positive Rate), precision, accuracy, error rate, and F-measure. Our
proposed framework achieves 0.97 TPR, 0.19 FPR, 0.80 precision, 0.98 accuracy, 0.5 error rate,
and 0.87 F-measure.
TABLE 5.3 Performance metric of the proposed framework
Performance Metrics   TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Proposed              0.97   0.19   0.80        0.98       0.5          0.87
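For reference, the six metrics named above can be derived from the four confusion-matrix counts. The following Python sketch uses the standard textbook definitions, with the error rate taken as one minus the accuracy. These definitions are an assumption; the thesis does not spell out how each reported value was computed. The function name detection_metrics is hypothetical.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard detection metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    tpr = ratio(tp, tp + fn)              # true positive rate (recall)
    fpr = ratio(fp, fp + tn)              # false positive rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_measure = ratio(2 * precision * tpr, precision + tpr)
    return {
        "TPR": tpr,
        "FPR": fpr,
        "precision": precision,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "F-measure": f_measure,
    }
```

With 8 true positives, 2 false positives, 88 true negatives, and 2 false negatives, the TPR, precision, and F-measure are all 0.8, the accuracy is 0.96, and the error rate is 0.04.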
TABLE 5.4 Throughput, packet loss, packet delivery ratio, and arrival time of the five nodes
Node    Arrival time (sec)   Packet Delivery ratio   Packet Loss (Kbps)   Throughput (Kbps)
Node1   2.256                98.025                  2.2835               56.895
Node2   1.267                94.211                  1.4756               54.742
Node3   8.278                94.723                  1.8629               55.315
Node4   1.289                94.601                  1.8687               56.889
Node5   1.314                94.783                  1.4756               53.895
Table 5.4 describes the packet arrival time, packet delivery ratio, packet loss, and throughput
of the five nodes. For Node 1, the arrival time is 2.256, the packet delivery ratio is 98.025,
the packet loss is 2.2835, and the throughput is 56.895. For Node 2, the arrival time is 1.267,
the packet delivery ratio is 94.211, the packet loss is 1.4756, and the throughput is 54.742.
For Node 3, the arrival time is 8.278, the packet delivery ratio is 94.723, the packet loss is
1.8629, and the throughput is 55.315. For Node 4, the arrival time is 1.289, the packet delivery
ratio is 94.601, the packet loss is 1.8687, and the throughput is 56.889. For Node 5, the
arrival time is 1.314, the packet delivery ratio is 94.783, the packet loss is 1.4756, and the
throughput is 53.895.
FIGURE 5.7 Arrival time, packet delivery, packet loss, and throughput of five nodes
TABLE 5.5 Total no. of nodes to search for Botnet and other anomalies
Dataset   No. of bots   No. of identified bots   No. of other anomalies detected   Size of Bot cluster   No. of nodes in search of Bots & anomalies
CTU13     5             5                        10                                                      30
Table 5.5 describes the total number of nodes that are subjected to attack under Botnet and other
anomalies. Here in our proposed work, the total number of anomalies detected is 10, and the
number of bots is 5. Accordingly, about 30 nodes have been used to search for bots as well as
anomalies.
5.5.2 Performance Comparison
The quantitative results are presented in Tables 5.6–5.10. The proposed algorithm for netflow
analysis has been compared with other methods, namely Bclus, CCD, and Spark-ELM. Performance
metrics such as accuracy, precision, and F-measure are reported for five scenarios of the
dataset that include Botnet operations.
TABLE 5.6 Comparison with prior methodologies for Scenario 1
Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.91   0.14   0.68        0.87       0.13         0.77
CCD         1.0    0.05   0.86        0.96       0.03         0.92
Bclus       0.4    0.4    0.5         0.5        0.4          0.48
Proposed    0.81   0.01   0.79        0.96       0.05         0.86
TABLE 5.7 Comparison with prior methodologies for Scenario 2
Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.95   0.05   0.88        0.95       0.05         0.92
CCD         0.74   0.02   0.96        0.88       0.11         0.92
Bclus       0.3    0.2    0.6         0.5        0.4          0.41
Proposed    1.0    0.04   0.95        0.98       0.04         0.96
Scenario 1, reported in Table 5.6, comprises an IRC-based Botnet that sends spam, whereas
scenario 2, reported in Table 5.7, involves the same bot but differs in the number of netflows.
The proposed approach achieves better results in terms of accuracy and error rate for the same
bot in the second scenario.
TABLE 5.8 Comparison with prior methodologies for Scenario 6
Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.89   0.02   0.92        0.86       0.02         0.96
CCD         0.0    0.0    0.0         0.64       0.35         0.0
Bclus       0.0    0.0    0.4         0.4        0.5          0.04
Proposed    0.94   0.00   0.95        0.94       0.06         0.98
In Scenario 6 (Table 5.8), the Botnet scans SMTP mail servers for several hours and connects
to several remote desktop services. Compared with the existing methods, our proposed work
detects the malware effectively and thereby achieves better performance than the prior works.
TABLE 5.9 Comparison with prior methodologies for Scenario 8
Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.26   0.09   0.47        0.76       0.24         0.33
CCD         0.0    0.0    -           0.64       0.35         0.0
Bclus       0.0    0.04   0.0         0.66       0.33         -
Proposed    0.28   0.1    0.52        0.89       0.23         0.35
In Scenario 8 (Table 5.9), the Botnet communicates with various C&C hosts and receives
encrypted data. When the data is encrypted, it is difficult to decide whether it is malicious,
and all methods give a low TPR. In this case, the malware used only certain, very specific
communication channels to communicate with the C&C server, which were not reflected in the
training data. Nevertheless, the proposed method gives a higher recognition rate than the other
methods.
TABLE 5.10 Comparison with prior methodologies for Scenario 9
Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.94   0.12   0.89        0.93       0.06         0.94
CCD         0.38   0.04   0.93        0.59       0.4          0.54
Bclus       0.1    0.2    0.4         0.4        0.5          0.25
Proposed    0.92   0.03   0.9         0.96       0.06         0.97
In Scenario 9 (Table 5.10), some hosts are infected with the Neris malware, which actively
starts sending spam emails. Compared with the existing methods, our proposed work again
achieves better performance than the prior works.
Figure 5.8 compares the proposed method with supervised and unsupervised learning methods. It
describes the FP, TN, FN, TP, accuracy, false-positive rate, and false-negative rate of the
supervised, unsupervised, and proposed methods. This comparison shows better performance for
the proposed method than for the existing works.
FIGURE 5.8 Comparison of proposed with Supervised and unsupervised learning methods
While comparing the prior methodologies with our proposed work, the proposed framework
exhibits better results in terms of prediction/detection accuracy.
Thus, this work successfully detects the anomalies with high prediction accuracy.
CHAPTER 6
Conclusion, Future Scope and References
6.1 Conclusion
The cyber-world is continually developing, and new technology stacks are being proposed on a
regular basis by researchers. At the same time the Internet of Things (IoT) has promoted
significant changes in our daily life in many aspects such as smart home, smart city, connected
health, intelligent supply chain, smart farming, etc.
A context-aware application is still required in IoT, which would sense the physical environment
from the security point of view and protect the devices accordingly. Providing comprehensive
information security is challenging and an integral part of the IoT-based system.
IoT consists of many heterogeneous and low-cost devices with little or no security embedded
into them, which generate a huge amount of private information and may create many security
problems. This open weakness in IoT security allows an attacker to infect the devices with
malicious applications and build a network of bots, called a Botnet. A Botnet supplies a
distributed platform for illegal activities such as launching Distributed Denial of Service
(DDoS) attacks against crucial targets, phishing, malware dissemination, click fraud, etc.
Nodes of the IoT are limited in resources, and dedicated and diversified communication
protocols are used. Some of these differences weaken the ability of the IoT nodes to protect
themselves. New technologies are being developed day by day, followed by continual development
in the cyber world. At the same time, the IoT has brought great changes to our daily life in
numerous aspects, such as health care and traffic monitoring services. Moreover, it aids
machine-to-machine communication by connecting multiple devices over the internet.
Conversely, there is a rise in intrinsic vulnerabilities that are often leveraged by
cybercriminals. Yet the number of active users in IoT increases day by day. One of the major
security concerns in IoT is the Botnet, a pervasive and hazardous threat. Several thousand to
millions of compromised computers (bots) in a network are used by malicious attackers to perform
various illicit and harmful activities. In order to deal with these security issues, prior
methodologies make use of different Botnet detection techniques, broadly classified into
behavior-based detection systems and user data-based detection systems. Furthermore, machine
learning algorithms are also in high demand to face the problems caused by Botnet.
In this context, several proposed classifiers have been successfully applied here on a real test
bed, achieving a higher prediction rate. Clustering the same kind of Botnet from the trained
data set using multiple algorithms enables the mass removal of Botnet.
• The proposed method is evaluated on an experimental setup with real IoT
nodes.
• The proposed systems achieve greater reliability of the IoT-based network by
removing Distributed Denial of Service (DDoS) and spam Botnet.
• The results obtained for the first proposed system show better
performance when compared to the existing systems.
• Thus, the first proposed approach, mass removal of Botnet attacks using the
heterogeneous ensemble stacking PROSIMA classifier in IoT, has clustered each type
of Botnet attack, such as distributed denial of service and spam Botnet attacks,
while maintaining the reliability and quality of service in IoT applications. This
approach achieved a high detection accuracy of 98.63%.
In search of a simplified algorithm with higher prediction accuracy, another classifier is
proposed. The primary significance of this approach is that it searches for the best feature
among a random subset of features. This procedure offers a distinct approach to Botnet
detection.
• It obtained an optimal precision value of 0.961 and a recall value of 0.986. It
achieved a high F-measure value of 0.976 and a high detection accuracy of
99.04%.
• Thus, the proposed approach, isolating Botnet attacks using the bootstrap
aggregating surflex-PSIM classifier in the IoT, has clustered each type of Botnet
attack, such as distributed denial of service and spam Botnet attacks, while
maintaining the reliability and quality of service in IoT applications.
With the internet, billions of devices in the Internet of Things (IoT) are interconnected and
communicate with each other through messaging bots. The messaging bots are sometimes
controlled by attackers to carry out malicious activities. Thus, bots become a serious
cybersecurity hazard for IoT devices. For this reason, it is crucial to detect the existence of
malicious bots and other anomalies in the network. To tackle these bots and anomalies, the
third approach is proposed.
• In the third proposed approach, a novel forecastive anomaly-based Botnet
revelation framework, ensemble-based stream mining is used to generate
a number of instances.
• Once the instances are created, GSBDA is employed to detect the presence
of hazardous anomalies.
• Finally, in the cataloging phase, with the help of the KNN algorithm,
the Botnet is accurately detected among the anomalies found through
ensemble-based stream mining.
• It is concluded that our proposed frameworks detect the anomalies,
including Botnet, more accurately and with reduced time complexity.
6.2 Future Scope
IoT is still in its growing phase, with various security models and structures recently proposed
to address its security challenges and privacy issues. A Botnet can be identified by observing
the behavior of the bots in the tracked network traffic, that is, by monitoring the traffic flow
of the system. Botnet behavior can be analyzed using classification techniques, which help find
the characteristics that distinguish Botnet traffic from benign traffic. Some methods focus on
detecting the Botnet present in the network using machine learning techniques to discern the
patterns shown by the Botnet in the system.
For additional context, other types of data collected by honeypots, such as malicious binaries
and attack replays, may be considered when researching and detecting Botnet. In the future, this
strategy needs to be extended to the next stage, where open problems or concerns can be observed
by applying it in real-time scenarios. As per news from Security Intelligence, a new Botnet has
spiked among Internet of Things (IoT) devices, the Mozi Botnet.
This Botnet has been active since 2019. The Mozi Botnet accounted for around 90% of IoT traffic
in just one year. Its code overlaps with Mirai and other Botnet variants. There is now a vast
scope of research to detect and remove the Mozi Botnet. The main aim of a bot-master is to reach
a huge audience while remaining hidden from it. Nowadays, very large audiences are on social
media such as Facebook and Twitter. On these platforms, people's trust levels are quite high, so
they trust what others send.
Social media sites provide a number of services, such as banking, friend lists, and gaming; if
attackers can compromise these, it can lead to very sophisticated fraud schemes. In this era of
mobile communication, another severe threat is possible on smartphones, called the mobile
Botnet. It can gain access to mobile phones and hand control to the bot-master, who handles such
devices remotely to generate very large-scale attacks. Another recently created Botnet that
highly affects the IoT is the Torii Botnet. Torii is the most sophisticated Botnet observed by
Avast. It steals the IoT device's information and allows attackers to execute code remotely; it
can also perform other commands, with multiple layers of encryption.
This Botnet communicates with the C&C server, and the coder of this Botnet executes and
delivers the payload to compromised devices. Botnet attacks are not restricted to IoT devices,
social networks, and mobile devices; a Botnet can control cloud services as well. With the
growing usage of the Internet for mobile phones, social media sites, and cloud computing, these
threats will target these fields, where research is still at an initial stage.
Attackers will find loopholes to compromise these fields. Many open-source Botnets are
available. They are as straightforward to install as any other open-source software: with just
basic knowledge of source code, anyone can compile this code. Botnet construction kits are also
available, which make setting up a Botnet even easier than with open-source Botnet code, since
these software kits are GUI-based and very user-friendly. One such example is the ZeuS Botnet
kit. No technical skills are needed to generate Botnet attacks.
These kits are specialized and sophisticated enough to generate 0-day attacks. Their vendors may
also provide technical support and software updates to keep the malware up to date. So there is
a huge scope of research in Botnet detection. Collaborative work is required between
governments, industry, and academia to detect and mitigate such threats.
6.3 References
[1] Feily, M., Shahrestani, A. and Ramadass, S., 2009, June. A survey of Botnet and Botnet
detection. In the 2009 Third International Conference on Emerging Security Information,
Systems and Technologies IEEE pp. 268-273.
[2] Suo, H., Wan, J., Zou, C. and Liu, J., 2012, March. Security in the Internet of Things: a
review. In the 2012 international conference on computer science and electronics
engineering, Vol. 3, IEEE pp. 648-651.
[3] Madakam, S., Ramaswamy, R. and Tripathi, S., 2015. Internet of Things (IoT): A literature
review. Journal of Computer and Communications, 3(05), pp.164.
[4] Perwej, Y., Parwej, F., Hassan, and Akhtar, N., 2019. The Internet-of-Things (IoT) Security:
A Technological perspective and review. International Journal of Scientific Research in
Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN, pp.2456-
3307.
[5] Weber, R.H., 2010. Internet of Things–New security and privacy challenges. Computer law
& security review, 26(1), pp.23-30.
[6] Suo, H., Wan, J., Zou, C. and Liu, J., 2012, March. Security in the internet of things: a
review. In the 2012 international conference on computer science and electronics
engineering Vol. 3, IEEE pp. 648-651.
[7] Whitmore, A., Agarwal, A. and Da Xu, L., 2015. The Internet of Things—A survey of
topics and trends. Information Systems Frontiers, 17(2), pp.261-274.
[8] Ziegeldorf, J.H., Morchon, O.G. and Wehrle, K., 2014. Privacy in the Internet of Things:
threats and challenges. Security and Communication Networks, 7(12), pp.2728-2742.
[9] Liu, J., Xiao, Y. and Chen, C.P., 2012, June. Authentication and access control in the
internet of things. In the 2012 32nd International Conference on Distributed Computing
Systems Workshops (pp. 588-592). IEEE.
[10] E. Bertino and N. Islam,2017, Botnets and Internet of Things security, Computer, 50(2), pp.
76-79.
[11] Angrishi, K., 2017. Turning the internet of things (IoT) into an internet of vulnerabilities
(IoV): IoT Botnet. arXiv preprint arXiv:1702.03681.
[12] Kolias, C., Kambourakis, G., Stavrou, A. and Voas, J., 2017. DDoS in the IoT: Mirai and
other Botnets. Computer, 50(7), pp.80-84.
[13] Zarpelao, B.B., Miani, R.S., Kawakani, C.T. and de Alvarenga, S.C., 2017. A survey of
intrusion detection in the Internet of Things. Journal of Network and Computer
Applications, 84, pp.25-37.
[14] Lindqvist, U. and Neumann, P.G., 2017. The future of the Internet of
Things. Communications of the ACM, 60(2), pp.26-30.
[15] Yang, Y., Wu, L., Yin, G., Li, L. and Zhao, H., 2017. A survey on security and privacy
issues in Internet-of-Things. IEEE Internet of Things Journal, 4(5), pp.1250-1258.
[16] Q. Yaseen, M. Aldwairi, Y. Jararweh, M.Al-Ayyoub, B. Gupta, Collusion attacks
mitigation in internet of things: a fog based model, Multimedia Tools and Applications,
pp.1-20, 2017.
[17] Y. Yilmaz, S. Uludag, Mitigating IoT-based cyber attacks on the smart grid, 16th IEEE
International Conference on Machine Learning and Applications (ICMLA), pp. 517-522,
2017.
[18] A. Azab, M. Alazab and M. Aiash, 2016. "Machine learning based Botnet identification
Traffic," IEEE Trustcom/BigDataSE/ISPA, pp. 1788-1794,
[19] Tuan, T.A., Long, H.V., Kumar, R., Priyadarshini, I. and Son, N.T.K., 2019. Performance
evaluation of Botnet DDoS attack detection using machine learning. Evolutionary
Intelligence, pp.1-12.
[20] Vishwakarma, R. and Jain, A.K., 2019, April. A Honeypot with machine learning based
detection framework for defending IoT based Botnet DDoS attacks. In 2019 3rd
International Conference on Trends in Electronics and Informatics (ICOEI), pp. 1019-1024.
IEEE.
[21] Ziegeldorf, J.H., Morchon, O.G. and Wehrle, K., 2014. Privacy in the Internet of Things:
threats and challenges. Security and Communication Networks, 7(12), pp.2728-2742.
[22] D. H. Summerville, K. M. Zach and Y. Chen, 2015. Ultra-lightweight deep packet anomaly
detection for Internet of Things devices, IEEE 34th International Performance Computing
and Communications Conference (IPCCC), pp. 1-8
[23] Q. Yan, W. Huang, X. Luo, Q. Gong, and F.R. Yu, A multi-level DDoS mitigation
framework for the industrial Internet of things, IEEE Communications Magazine. 56(2)
(2018) 30-36.
[24] M. Yeo, Y. Koo, Y. Yoon, T. Hwang, J. Ryu, J. Song, and C. Park, Flow-based malware
detection using convolutional neural network, In Information Networking (ICOIN), 2018
International Conference on. (2018) 910-913.
[25] S.W. Park, J. Park, K. Bong, D. Shin, J. Lee, S. Choi, and H.J. Yoo, An energy-efficient
and scalable deep learning/inference processor with tetra-parallel MIMD architecture for big
data applications, IEEE transactions on biomedical circuits and systems. 9(6) (2015) 838-
848.
[26] J.A. Jerkins, Motivating a market or regulatory solution to IoT insecurity with the Mirai
Botnet code, In Computing and Communication Workshop and Conference (CCWC), 2017
IEEE 7th Annual. (2017) pp 1-5.
[27] A.O. Prokofiev, Y.S. Smirnova, and V.A. Surov, A method to detect Internet of Things
Botnet. In Young Researchers in Electrical and Electronic Engineering (EIConRus), 2018
IEEE Conference of Russian. (2018) 105-108.
[28] J. Smith-perrone, and J. Sims, Securing cloud, SDN and large data network environments
from emerging DDoS attacks. In Cloud Computing, Data Science & Engineering-
Confluence, 2017 7th International Conference on. (2017) 466-469.
[29] Giachoudis, N., Damiris, G.P., Theodoridis, G. and Spathoulas, G., 2019, Collaborative
agent-based detection of DDoS IoT Botnet. In 2019 15th International Conference on
Distributed Computing in Sensor Systems (DCOSS) (pp. 205-211). IEEE.
[30] Shafi, Q. and Basit, A., 2019, January. DDoS Botnet prevention using blockchain in
software defined Internet of Things. In 2019 16th International Bhurban Conference on
Applied Sciences and Technology (IBCAST) (pp. 624-628). IEEE.
[31] Ahmed, Z., Danish, S.M., Qureshi, H.K. and Lestas, M., 2019, September. Protecting IoTs
from mirai Botnet attacks using blockchains. In 2019 IEEE 24th International Workshop on
Computer Aided Modeling and Design of Communication Links and Networks
(CAMAD) pp. 1-6. IEEE.
92
[32] “Mirai (malware).” Wikipedia, Wikimedia Foundation, 19 Feb. 2019,
https://en.wikipedia.org/wiki/Mirai_(malware).
[33] Q. Yan, W. Huang, X. Luo, F. Richard Yu, A multi-level DDoS mitigation framework for
the industrial Internet of things, IEEE Communications Magazine, Vol.56, No.2, pp.30-36,
2018.
[34] M. Yeo, Y. Koo, Y. Yoon, T. Hwang, J. Ryu, J. Song, C. Park, Flow-based malware
detection using convolution neural network, IEEE Information Networking (ICOIN), pp.
910-913, 2018.
[35] S. Wook Park, J. Park, K. Bong, D. Shin, J. Lee, S. Choi, H.J Yoo, An energy-efficient and
scalable deep learning/inference processor with tetra-parallel MIMD architecture for big
data applications, IEEE transactions on biomedical circuits and systems, Vol. 9, No.6,
pp.838-848, 2015.
[36] J.A Jerkins, Motivating a market or regulatory solution to IoT insecurity with the Mirai
Botnet code,IEEE Computing and Communication Workshop and Conference (CCWC) ,
pp.1-5, 2017.
[37] C. Kolias, G. Kambourakis, A. Stavrou, J. Voas, DDoS in the IoT: Mirai and other Botnet,
IEEE Computer, Vol. 50, No.7, pp.80-84, 2017.
[38] G. Perrone, M. Vecchio, P,R.Pecori, The Day After Mirai: A survey on MQTT security
solutions after the largest cyber-attack carried out through an army of IoT devices, Second
International Conference on Internet of Things, Big Data and Security PP.246-253, 2017.
[39] J. Smith-perrone, J. Sims, Securing cloud, SDN and large data network environments from
emerging DDoS attacks, 7th
IEEE International Conference on Cloud Computing, Data
Science & Engineering-Confluence, pp. 466-469, 2017.
[40] A. Stanciu,T.C Balan, C. Gerigan,S. Zamfir, Securing the IoT gateway based on the
hardware implementation of a multi pattern search algorithm, IEEE Optimization of
Electrical and Electronic Equipment (OPTIM) & Aegean Conference on Electrical
Machines and Power Electronics (ACEMP), pp. 1001-1006, 2017.
[41] A. Stanciu, T.C. Balan, C. Gerigan, and S. Zamfir, Securing the IoT gateway based on the
hardware implementation of a multi pattern search algorithm, In Optimization of Electrical
and Electronic Equipment (OPTIM) & 2017 Intl Aegean Conference on Electrical Machines
and Power Electronics (ACEMP), 2017 International Conference on. (2017) 1001-1006.
93
[42] Q. Yaseen, M. Aldwairi, Y. Jararweh, M. Al-Ayyoub, and B. Gupta, Collusion attacks
mitigation in internet of things: a fog based model. Multimedia Tools and Applications,
(2017) 1-20.
[43] Y. Yilmaz, and S. Uludag, Mitigating IoT-based Cyber attacks on the smart grid, in machine
learning and applications (ICMLA), 2017 16th IEEE International Conference on. (2017)
517-522.
[44] M.E. Ahmed, H. Kim, and M. Park, Mitigating DNS query-based DDoS attacks with
machine learning on software-defined networking. In Military Communications Conference
(MILCOM), MILCOM 2017-2017 IEEE. (2017) 11-16.
[45] M. Stevanovic, and J.M. Pedersen, Machine learning for identifying Botnet network
traffic. Networking and Security Section, Department of Electronic Systems, Aalborg
University, Tech. Rep. (2013).
[46] T. Zhu, S. Dhelim, Z. Zhou, S. Yang, and H. Ning, An architecture for aggregating
information from distributed data nodes for industrial Internet of Things, Computers &
Electrical Engineering. 58 (2017) 337-349.
[47] Jeon, J. and Cho, Y., 2019. Construction and performance analysis of image steganography
based Botnet in Kakao Talk Openchat. Computers, 8(3), p.61.
[48] H.R. Zeidanloo, A.B. Manaf, P. Vahdani, F. Tabatabaei, and M. Zamani, Botnet detection
based on traffic monitoring. Networking and Information Technology (ICNIT), 2010
International Conference on. (2010) 97-10.
[49] M. Chatterjee, A. S. Namin and P. Datta, 2018,Evidence Fusion for malicious Bot detection
in IoT, IEEE International Conference on Big Data (Big Data), 2018, pp. 4545-4548.
[50] Adat, V. and Gupta, B.B., 2018. Security in Internet of Things: issues, challenges,
taxonomy, and architecture. Telecommunication Systems, 67(3), pp.423-441
[51] Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, D. Breitenbacher, A. Shabtai, Y. Elovici,
N-BaIoT: Network-based detection of IoT Botnet attacks using deep auto encoders, IEEE
Pervasive Computing Vol.17 , No.3, 2018.
[52] C. McDermott, F. Majdani, A. V Petrovski, Botnet detection in the internet of things using
deep learning approaches, IEEE International Joint Conference on Neural Networks
(IJCNN), 2018.
94
[53] Y. Meidan, M. Bohadana, A. Shabtai, J. David Guarnizo, M. Ochoa, N. Ole Tippenhauer,
Y. Elovici, ProfilIoT: A machine learning approach for IoT device identification based on
network traffic analysis, In Proceedings of the Symposium on Applied Computing, pp. 506-
509, 2017.
[54] S. Homayoun, M. Ahmadzadeh, S. Hashemi, A. Dehghantanha, R. Khayami, BoTShark: A
deep learning approach for Botnet traffic detection, Cyber Threat Intelligence, pp. 137-153,
2018.
[55] N. An, A. Duff, G. Naik. M, M. Faloutsos, S. Weber, S. Mancoridis, Behavioral anomaly
detection of malware on home routers, IEEE 12th International Conference on Malicious
and Unwanted Software (MALWARE), pp. 47-54, 2017.
[56] L. Mathur, M. Raheja, P. Ahlawat, Botnet detection via mining of network traffic flow,
International Conference on Computational Intelligence and Data Science (ICCIDS 2018),
pp. 1668-1678, 2018.
[57] F. Villegas Alejandre, N. Cruz Cortes, E. Aguirre Anaya, Feature selection to detect Botnet
using machine learning algorithms, IEEE International Conference on Electronics,
Communications and Computers (CONIELECOMP), 2017.
[58] A. Bijalwan, N. Chand, E. Shubhakar Pilli, C. R. Krishna, Botnet analysis using ensemble
classifier, Perspectives in Science , pp. 502—504, 2016.
[59] F. Villegas Alejandre, N. Cruz Cortes, E. Aguirre Anaya, Feature selection to detect Botnet
using machine learning algorithms, IEEE International Conference on Electronics,
Communications and Computers (CONIELECOMP), 2017.pp.1-7.
[60] S. Miller and C. Busby-Earle, "The role of machine learning in Botnet detection," 2016 11th
International Conference for Internet Technology and Secured Transactions (ICITST), 2016,
pp. 359-364.
[61] C.Hammer schmidt, S.Marchal, R. State, S. Verwer, Behavioral clustering of non-stationary
IP flow record data, 12th International Conference on Network and Service Management,
CNSM 2016 and Workshops, 3rd International Workshop on Management of SDN and
NFV, ManSDN/NFV 2016, and International Workshop on Green ICT and Smart
Networking, GISN 2016, pp. 297–301,2016.
95
[62] G. Kirubavathi Venkatesh, R. AnithaNadarajan, HTTP Botnet detection using adaptive
learning rate multilayer feed-forward neural network, IFIP International Workshop on
Information Security Theory and Practice, Springer, Berlin, Heidelberg, pp. 38-48 ,2012.
[63] K. Singh , S. Chandra Guntuku, A. Thakur, C. Hota, Big data analytics framework for
peer-to-peer Botnet detection using random forests, Information Sciences, pp.488-497, 2014.
[64] Y. Meidan et al., 2018, N-BaIoT—Network-Based detection of IoT Botnet Attacks Using
Deep Autoencoders, in IEEE Pervasive Computing, vol. 17, no. 3, pp. 12-22.
[65] McDermott, D. Christopher Farzan Majdani, and Andrei Petrovski, Botnet detection in the
Internet of Things using deep learning approaches (2018).
[66] Meidan, Yair, Michael Bohadana, Asaf Shabtai, Juan David Guarnizo, Martín Ochoa, Nils
Ole Tippenhauer, and Yuval Elovici, ProfilIoT: A machine learning approach for IoT device
identification based on network traffic analysis, In Proceedings of the Symposium on
Applied Computing. (2017) 506-509.
[67] Homayoun, Sajad, Marzieh Ahmadzadeh, Sattar Hashemi, Ali Dehghantanha, and Raouf
Khayami, BoTShark: A deep learning approach for Botnet traffic detection, Cyber Threat
Intelligence. (2018) 137-153.
[68] F. Shaikh, E. Bou-Harb, J. Crichigno, N. Ghani, A machine learning model for classifying
unsolicited IoT devices by observing network telescopes, IEEE International Wireless
Communications and Mobile Computing Conference (IWCMC 2018) ,2018.
[69] K. Singh , S. Chandra Guntuku, A. Thakur, C. Hota, Big data analytics framework for
peer-to-peer Botnet detection using random forests, Information Sciences, pp.488-497, 2014.
[70] Meidan, Yair, Michael Bohadana, Yael Mathov, Yisroel Mirsky, Dominik Breitenbacher,
Asaf Shabtai, and Yuval Elovici, N-BaIoT: Network-based detection of IoT Botnet attacks
using deep autoencoders. arXiv preprint arXiv:1805.03409 (2018).
[71] Chatterjee, M., Namin, A.S. and Datta, P., 2018, December. Evidence Fusion for Malicious
Bot Detection in IoT. In 2018 IEEE International Conference on Big Data (Big Data) (pp.
4545-4548). IEEE.
[72] Sengar, B. and Padmavathi, B., 2017, July. P2P bot detection system based on map
reduces. In 2017 International Conference on Computing Methodologies and
Communication (ICCMC) (pp. 627-634). IEEE.
96
[73] Gadelrab, M.S., ElSheikh, M., Ghoneim, M.A. and Rashwan, M., 2018. BotCap: Machine
learning approach for Botnet detection based on statistical features. International Journal of
Communication Networks and Information Security, 10(3), p.563.
[74] Hammerschmidt, C., Marchal, S., State, R., and Verwer, S. 2017. Behavioral clustering of
non-stationary IP flow record data. 2016 12th International Conference on Network and
Service Management, CNSM 2016 and Workshops, 3rd International Workshop on
Management of SDN and NFV, ManSDN/NFV 2016, and International Workshop on Green
ICT and Smart Networking, GISN 2016, pages 297–301.
[75] C., Marchal, S., State, R., Pellegrino, G., and Verwer, S. 2016. Efficient learning of
communication profiles from IP flow records. Proceedings - Conference on Local Computer
Networks, LCN, pages 559–562.
[76] Ijaz, S., Hashmi, F. A., Asghar, S., and Alam, M. 2017. Vector Based Genetic Algorithm to
optimize predictive analysis in network security. Applied Intelligence.
[77] Chen, W., Luo, X., and Zincir-Heywood, A. N. 2017. Exploring a service-based normal
behaviour profiling system for Botnet detection. In IFIP/IEEE Symposium on Integrated
Network and Service Management (IM), pages 947–952.
[78] Garcia, Martin Grill, Jan Stiborek and Alejandro Zunino, 2014. An empirical comparison of
botnet detection methods Sebastian. Computers and Security Journal, Elsevier. Vol 45, pp
100-123.
[79] Wang, J. and Paschalidis, I. C. 2016. Botnet Detection based on anomaly and community
detection. IEEE Transactions on Control of Network Systems, 5870(c):1–1.
[80] Tzagkarakis, C., Petroulakis, N. and Ioannidis, S., 2019, June. Botnet attack detection at the
IoT edge based on sparse representation. In 2019 Global IoT Summit (GIoTS) (pp. 1-6).
IEEE.
[81] Herwig, S., Harvey, K., Hughey, G., Roberts, R. and Levin, D., 2019, February.
Measurement and analysis of hajime, a Peer-to-peer IoT Botnet. In NDSS.
[82] Garip, M.T., Reiher, P. and Gerla, M., 2019, September. RIoT: A rapid exploit delivery
mechanism against IoT devices using vehicular Botnet. In 2019 IEEE 90th Vehicular
Technology Conference (VTC2019-Fall) (pp. 1-6). IEEE.
97
[83] Banerjee, M. and Samantaray, S.D., 2019. Network traffic analysis based IoT Botnet
detection using Honeynet data applying classification techniques. International Journal of
Computer Science and Information Security (IJCSIS), 17(8).
[84] Nguyen, H.T., Nguyen, D.H., Ngo, Q.D., Tran, V.H. and Le, V.H., 2019, Towards a rooted
subgraph classifier for IoT Botnet detection. In Proceedings of the 2019 7th International
Conference on Computer and Communications Management, ACM. pp. 247-251
[85] Ceron, J.M., Steding-Jessen, K., Hoepers, C., Granville, L.Z. and Margi, C.B., 2019.
Improving IoT Botnet investigation using an adaptive network layer. Sensors, 19(3), p.727.
[86] Yin, L., Luo, X., Zhu, C., Wang, L., Xu, Z. and Lu, H., 2019. ConnSpoiler: Disrupting
C&C communication of IoT-Based Botnet through fast detection of anomalous domain
queries. IEEE Transactions on Industrial Informatics.
[87] Koroniotis, N., Moustafa, N., Sitnikova, E. and Turnbull, B., 2019. Towards the
development of realistic Botnet dataset in the internet of things for network forensic
analytics: Bot-IoT dataset. Future Generation Computer Systems, 100, pp.779-796.
[88] Farooq, M.J. and Zhu, Q., 2019. Modeling, analysis, and mitigation of dynamic Botnet
formation in wireless IoT networks. IEEE Transactions on Information Forensics and
Security, 14(9), pp.2412-2426.
[89] Alhajri, R., Zagrouba, R. and Al-Haidari, F., 2019. Survey for anomaly detection of IoT
Botnet using machine learning uuto-encoders. International Journal of Applied Engineering
Research, 14(10), pp.2417-2421.
[90] Spathoulas, G., Giachoudis, N., Damiris, G.P. and Theodoridis, G., 2019. Collaborative
Blockchain-based detection of distributed denial of service attacks based on Internet of
Things Botnet. Future Internet, 11(11), p.226.
[91] Vishwakarma, R. and Jain, A.K., 2019, April. A Honeypot with machine learning based
detection framework for defending IoT based Botnet DDoS Attacks. In 2019 3rd
International Conference on Trends in Electronics and Informatics (ICOEI) , IEEE, pp.
1019-1024.
[92] Yin, M., Chen, X., Wang, Q., Wang, W. and Wang, Y., 2019. Dynamics on hybrid complex
network: Botnet modeling and analysis of medical IoT. Security and Communication
Networks, 2019.
98
[93] Bezerra, V.H., da Costa, V.G.T., Barbon Junior, S., Miani, R.S. and Zarpelão, B.B., 2019.
IoTDS: A One-class classification approach to detect Botnet in Internet of Things
devices. Sensors, 19(14), pp.3188.
[94] Pour, M.S., Mangino, A., Friday, K., Rathbun, M., Bou-Harb, E., Iqbal, F., Shaban, K. and
Erradi, A., 2019, August. Data-driven curation, learning and analysis for inferring evolving
IoT Botnet in the wild. In Proceedings of the 14th International Conference on Availability,
Reliability and Security ACM pp.6.
[95] Soe, Y.N., Feng, Y., Santosa, P.I., Hartanto, R. and Sakurai, K., 2019. Rule generation for
signature based detection systems of cyber attacks in IoT environments. Bulletin of
Networking, Computing, Systems, and Software, 8(2), pp.93-97.
[96] KoronIotis, N., Moustafa, N. and Sitnikova, E., 2019. Forensics and deep Learning
mechanisms for Botnet in Internet of Things: A Survey of Challenges and Solutions. IEEE
Access, 7, pp.61764-61785.
[97] Shafi, Q. and Basit, A., 2019, January. DDoS Botnet Prevention using Blockchain in
Software Defined Internet of Things. In 2019 16th International Bhurban Conference on
Applied Sciences and Technology (IBCAST) (pp. 624-628). IEEE.
[98] Lange, T. and Kettani, H., 2019, March. On Security Threats of Botnet to cyber systems.
In 6th International Conference on Signal Processing and Integrated Networks (SPIN) (pp.
176-183). IEEE.
[99] R. K. Malaiya, D. Kwon, J. Kim, S. C. Suh, H. Kim and I. Kim, An Empirical evaluation of
deep learning for network anomaly detection, International Conference on Computing,
Networking and Communications (ICNC), 2018, pp. 893-898.
[100] Rahul Saxena, Introduction to K-nearest neighbor classifier , https://dataaspirant.com/k-
nearest-neighbor-classifier-intro/, last accessed on 05 July 2021.
99
List of Publications
1. Priyang Bhatt, Bhaskar Thakker, "A NOVEL FORECASTIVE ANOMALY BASED
BOTNET REVELATION FRAMEWORK FOR COMPETING CONCERNS IN INTERNET
OF THINGS", Journal of Applied Security Research, Taylor & Francis, volume 16, issue 2,
pp. 258-278, 2021 (ESCI & SCOPUS Indexed).
2. Priyang Bhatt, Bhaskar Thakker, "ISOLATING BOTNET ATTACKS USING BOOTSTRAP
AGGREGATING SURFLEX-PSIM CLASSIFIER IN IOT", Journal of Intelligent & Fuzzy
Systems, IOS Press, volume 38, issue 2, pp. 1827-1840, 2020 (ACM Digital Library, SCI &
SCOPUS Indexed).
3. Priyang Bhatt, Bhaskar Thakker, "MASS REMOVAL OF BOTNET ATTACKS USING
HETEROGENEOUS ENSEMBLE STACKING PROSIMA CLASSIFIER IN IOT", International
Journal of Communication Networks and Information Security (IJCNIS), KUST, volume 11,
issue 3, pp. 380-390, 2019 (SCOPUS Indexed).
Appendix
1. List of hardware and software components used for the experimental setup
The proposed method is evaluated on an experimental setup. Traffic is collected from 20 real
IoT nodes (implemented with Raspberry Pi 3 boards) connected over the Wi-Fi network to the
access point, with a wired connection onward to the central switch and the router. Network
traffic is sniffed with Tcpdump, Tshark, and Wireshark, using port mirroring on the switch.
C & C (Command & Control) is implemented with a Python script that sends the file and
controls the IoT devices. Three IoT devices are configured as bots that generate DDoS and
spam attacks against the rest of the devices in the network.
The hardware and software components used in the experimental setup are described below.
1.1 Hardware Components
1.1.1. Raspberry Pi 3
Linux-based IoT devices are used to collect traffic from the IoT nodes. In the experimental
setup, 20 Raspberry Pi 3 boards serve as the Linux-based IoT devices on which network traffic
is collected, the attack is injected, and the Botnet is detected.
Link: https://www.raspberrypi.org/products/raspberry-pi-3-model-b/
1.1.2. CISCO SYSTEMS 8-Port Gigabit Ethernet Desktop Switch (SG110D08NA)
In the experimental setup, the twenty Raspberry Pi 3 boards are connected over the Wi-Fi
network to the access point, with a wired connection onward to the central switch and the
router. All the servers are connected to the central switch, and the switch is connected to
the access point.
Link: https://www.amazon.in/SYSTEMS-Gigabit-Ethernet-Desktop
SG110D08NA/dp/B00V8IZ7JM
1.1.3. Laptops
Laptops are used as servers to record the data. They host the DHCP server, which assigns IP
addresses dynamically in the network, and the C & C (command & control) server, which sends
the file and controls the IoT devices. In total, four laptops serve as different servers in
the experimental setup.
Link: https://www.gadgetsnow.com/laptops/Dell-Inspiron-15-3593-D560159WIN9S-Laptop-Core-i3-10th-Gen8-GB1-TBWindows-10
1.1.4. D-Link DAP-1360 Wireless N Access Point
The access point provides the Wi-Fi network of the experimental setup. The 20 real IoT nodes
(implemented with Raspberry Pi 3 boards) connect to it wirelessly, and it has a wired
connection to the central switch and the router.
Link: https://eu.dlink.com/-/media/consumer_products/dap/dap-1360/manual/dap-1360_c1_manual_v3_00_eu.pdf
1.2 Software components
1.2.1 Anaconda (Spyder) Python 3
The proposed work was implemented using the Spyder IDE from the Anaconda distribution.
Anaconda is a distribution of Python and R for data analytics, machine learning, scientific
computing, etc. It is a software package containing various IDEs and applications such as
Spyder, Jupyter, and Visual Studio Code, and it comes equipped with the package manager
conda. Conda is an open-source, cross-platform package and environment manager that helps
install and run packages and their dependencies. Spyder is an integrated development
environment for Python designed for scientific computing and is included in the Anaconda
distribution. It provides advanced code editing, data visualization, and debugging.
Link : https://docs.anaconda.com/anaconda/install/windows/
1.2.2 Wireshark/Tshark
Wireshark is used for sniffing network traffic. Wireshark (formerly Ethereal) is an
open-source packet analyzer used for computer network management and debugging. It is a
graphical tool built with the Qt widget toolkit for C++, and at its core it uses pcap for
packet capture. Wireshark can be used on Linux, UNIX, and Windows-based systems. Its
operations are similar to those a user can perform with the tcpdump command on a
Linux/Unix-based system. A primary advantage over tcpdump is that packets are displayed
with color coding, and the contents of the packets can be viewed in different encoding
formats. It can also put supported network interfaces into promiscuous mode without the user
changing them manually. A command-line (terminal-based) version is available under the name
Tshark.
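Because Wireshark, Tshark, and tcpdump all build on the pcap capture format, a saved capture can also be inspected without these tools. The following is a minimal illustrative sketch, not part of the experimental setup: it reads the classic pcap global header and the per-packet record headers using only the Python standard library. The function name read_pcap_records is hypothetical, and the pcapng format and nanosecond-resolution pcap files are deliberately not handled.

```python
import struct
from typing import BinaryIO, Iterator, Tuple


def read_pcap_records(stream: BinaryIO) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (timestamp, original_length, captured_bytes) for each packet.

    Only the classic pcap format with microsecond timestamps is supported.
    """
    header = stream.read(24)  # the pcap global header is 24 bytes long
    if len(header) < 24:
        raise ValueError("truncated pcap global header")

    # The magic number reveals the byte order used by the writing host.
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file was written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file was written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    while True:
        record = stream.read(16)  # each packet has a 16-byte record header
        if len(record) < 16:
            return  # end of file
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final packet, stop quietly
        yield ts_sec + ts_usec / 1_000_000, orig_len, data
```

Under these assumptions, opening a capture file in binary mode and passing it to read_pcap_records gives the timestamp, on-wire length, and captured bytes of each packet, which is the same raw information the tools above decode and display.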
1.2.3 Python Scapy
Python Scapy is used to generate and detect security attacks. Scapy is a Python library for
packet manipulation. It enables developers to send, receive, sniff, and dissect packets in a
network, and it supports a wide range of communication protocols. Developers can use it for
tasks such as scanning, trace routing, probing, unit tests, attacks, and network discovery,
replacing many well-known Linux/Unix-based tools such as Nmap, arpspoof, and arping.
Link: https://scapy.readthedocs.io/en/latest/introduction.html
1.2.4 tcpdump
tcpdump is used for sniffing network traffic. It is a Linux/Unix-based command-line tool
that provides network-analysis capabilities and works primarily on the TCP/IP networking
stack. It prints out a description of the packets traveling on a network. tcpdump also
accepts Boolean filter expressions, which allow packets to be filtered by criteria such as
protocol, host, and port.
Link: https://opensource.com/article/18/10/introduction-tcpdump
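To make captures repeatable, a tcpdump invocation with a Boolean filter expression can be assembled programmatically. The following is a minimal sketch and not the script used in this work: the helper name build_tcpdump_command and its parameters are hypothetical, while -i, -c, -w, and the port filter primitive are standard tcpdump options.

```python
from typing import List, Optional


def build_tcpdump_command(
    interface: str,
    count: Optional[int] = None,
    outfile: Optional[str] = None,
    protocol: Optional[str] = None,
    port: Optional[int] = None,
) -> List[str]:
    """Return a tcpdump argument list (hypothetical helper for illustration)."""
    if not interface:
        raise ValueError("an interface name is required")
    cmd = ["tcpdump", "-i", interface]

    if count is not None:
        if count <= 0:
            raise ValueError("count must be positive")
        cmd += ["-c", str(count)]  # stop after this many packets

    if outfile:
        cmd += ["-w", outfile]  # write raw packets to a pcap file

    # Combine the filter terms into one Boolean expression joined with "and".
    terms = []
    if protocol:
        terms.append(protocol)  # e.g. "tcp", "udp", "icmp"
    if port is not None:
        if not 0 < port < 65536:
            raise ValueError("port must be in 1..65535")
        terms.append(f"port {port}")
    if terms:
        cmd.append(" and ".join(terms))
    return cmd
```

The returned list could be passed to subprocess.run on a host where tcpdump is installed and the user is permitted to capture. For example, build_tcpdump_command("wlan0", count=100, protocol="tcp", port=80) corresponds to the command line tcpdump -i wlan0 -c 100 "tcp and port 80".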
2. Tshark and tcpdump commands used for traffic generation and sniffing
The following are some of the Tshark and tcpdump commands that we used for traffic generation
and for sniffing the traffic. These commands were used in the isolated experimental setup.
Description                                       Commands
Capturing packets with tshark                     tshark -i wlan0 -w output_file.pcap
Reading a pcap file                               tshark -r output_file.pcap
Generic capture for an IP address                 tshark -Y "ip.addr == 192.168.1.10" -r output_file.pcap
Reset session after a given packet count          tshark -M 100000
Creating a CSV file with tshark                   tshark -r output_file.pcap -T fields -e frame.number
                                                    -e frame.time -e eth.src -e eth.dst -e ip.src
                                                    -e ip.dst -e ip.proto -E header=y -E separator=,
                                                    -E quote=d -E occurrence=f > dataset.csv
Display only source and destination IP            tshark -o 'column.format:"Source","%s","Destination","%d"'
                                                    -T text
Display HTTP responses                            tshark -o "tcp.desegment_tcp_streams:TRUE" -i eth0
                                                    -Y "http.response" -T fields -e http.response.code
Capturing N packets using tcpdump                 tcpdump -c N -i interface_name
Capture and save packets in a file using tcpdump  tcpdump -w filename.pcap -i interface_name
Read captured packets using tcpdump               tcpdump -r filename.pcap
Capture packets from a specific port              tcpdump -i interface_name port port_number
Links:
1. https://www.cellstream.com/reference-reading/tipsandtricks/272-t-shark-usage-examples
2. https://www.wireshark.org/docs/man-pages/tshark.html
3. https://opensource.com/article/20/1/wireshark-linux-tshark
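The CSV export command listed above writes one row per frame with a header row containing the field names (frame.number, frame.time, eth.src, eth.dst, ip.src, ip.dst, ip.proto). As an illustration of how such a file might be examined, the sketch below counts packets per source IP address with the Python standard library. It is a hedged example under these assumptions and not the analysis pipeline of this work; the function name packets_per_source is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def packets_per_source(lines: Iterable[str]) -> Counter:
    """Count rows per ip.src in a tshark "-T fields -E header=y" CSV export."""
    reader = csv.DictReader(lines)  # handles the double-quoted fields (-E quote=d)
    counts: Counter = Counter()
    for row in reader:
        src = (row.get("ip.src") or "").strip()
        if src:  # non-IP frames (for example ARP) leave ip.src empty
            counts[src] += 1
    return counts
```

With a file produced by the export command, opening it as open("dataset.csv", newline="") and passing the file object to packets_per_source returns a Counter whose most_common() method lists the most active sources first. A source with a disproportionately high count may merit closer inspection, although a packet count alone does not establish that a device is a bot.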