
Novel Revelation Frameworks for Prediction of Botnet

Attacks in Internet of Things (IoT)

A Thesis submitted To Gujarat Technological University

for the Award of

Doctor of Philosophy

in

Computer/IT Engineering

by

Priyang Prakashchandra Bhatt 159997107020

under the supervision of

Dr. Bhaskar Thakker

GUJARAT TECHNOLOGICAL UNIVERSITY (GTU)

AHMEDABAD

AUG 2021


© Priyang Prakashchandra Bhatt


Ph.D. THESIS Non-Exclusive License to

GUJARAT TECHNOLOGICAL UNIVERSITY

In consideration of being a Ph.D. Research Scholar at GTU and in the interest of the

facilitation of research at GTU and elsewhere, I, Priyang Prakashchandra Bhatt having

Enrollment No. 159997107020, hereby grant a non-exclusive, royalty-free and perpetual

license to GTU on the following terms:

a) GTU is permitted to archive, reproduce and distribute my thesis, in whole or in part,

and/or my abstract, in whole or in part (referred to collectively as the “Work”) anywhere

in the world, for non-commercial purposes, in all forms of media;

b) GTU is permitted to authorize, sub-lease, sub-contract or procure any of the acts

mentioned in paragraph (a);

c) GTU is authorized to submit the Work at any National /International Library, under the

authority of their “Thesis Non-Exclusive License”;

d) The Universal Copyright Notice (©) shall appear on all copies made under the authority

of this license;

e) I undertake that my thesis is my original work, does not infringe any rights of

others, including privacy rights, and that I have the right to make the grant conferred by

this non-exclusive license.

g) If third party copyrighted material was included in my thesis for which, under the terms

of the Copyright Act, written permission from the copyright owners is required, I have

obtained such permission from the copyright owners to do the acts mentioned in

paragraph (a) above for the full term of copyright protection.

h) I retain copyright ownership and moral rights in my thesis and may deal with the

copyright in my thesis, in any way consistent with rights granted by me to my University

in this non-exclusive license.

i) I further promise to inform any person to whom I may hereafter assign or license my

copyright in my thesis of the rights granted by me to my University in this non-exclusive

license.

j) I am aware of and agree to accept the conditions and regulations of Ph.D. including all

policy matters related to authorship and plagiarism.


Abstract

In the Internet of Things (IoT) environment, any object equipped with sensor nodes and other electronic devices can communicate over wireless networks. This makes the environment highly vulnerable to Botnet attacks. A Botnet attack degrades system performance in a manner that IoT network users find difficult to identify, and it is hard to detect and remove within a restricted time. Detection is challenging for several reasons: the attack's structurally repetitive nature, its non-uniform and varied activities, and its ability to stay hidden by deleting its history records. Although existing mechanisms act proactively against Botnet attacks, they are less efficient at capturing the frequent abnormal activities of Botnet attackers. As the number of devices in the IoT environment increases, existing mechanisms miss more Botnets because of their functional complexity. This type of attack is therefore complex and challenging to identify.

To detect Botnet attacks, the first approach proposes a heterogeneous ensemble stacking

PROSIMA classifier. This approach takes advantage of cluster sampling in place of the

conventional random sampling method for higher prediction accuracy. We tested the proposed

classifier on an experimental test setup with 20 real IoT nodes. The proposed approach enables

mass removal of Botnet attacks with higher accuracy in reduced time, which helps the

IoT environment maintain the entire network's reliability.

To detect Botnets with high accuracy in less time than the first approach, the second approach proposes a Bootstrap Aggregating Surflex-PSIM Classifier. It gathers data from several sensor nodes and preprocesses it using a Linear Random Euler Complex-valued Filter (LRECF). The linearized data is then passed to a training phase based on Random Poisson Forest (RPF), which accurately predicts, in less time, the Botnets that create Distributed Denial of Service (DDoS) and Spam attacks. Similar Botnets are clustered using Surflex-PSIM, which isolates the Botnet-attacked clusters based on an automatically trained characteristic pocket value. Thus, with the aid of the proposed classifier, Botnets are detected and separated with high accuracy in reduced time, ensuring system reliability and enhanced system performance.


Through the Internet, billions of IoT devices are interconnected and communicate through messaging bots. Attackers sometimes take control of these messaging bots to carry out malicious activities, so bots become a severe cybersecurity hazard for IoT devices. For this reason, it is crucial to detect malicious bots and other anomalies in the network. Our third approach proposes a novel forecastive anomaly-based Botnet revelation framework for competing concerns in the Internet of Things (IoT) to tackle these bots and anomalies. The technique works in two phases: the first is instance creation, and the second is cataloging. As an alternative to conventional machine learning algorithms, our work uses ensemble-based stream mining to generate several instances with less memory and time. Once the instances are created, Graph Structure-Based Detection of Anomaly (GSBDA) is initiated on the features derived by the stream mining algorithm to detect hazardous anomalies. The second phase also uses the KNN (K-Nearest Neighbor) algorithm, an instance-based learning algorithm, to identify the Botnet accurately by observing the network flows.


Dedicated

To

My Family


ACKNOWLEDGEMENT

I deem it a great pleasure to record my gratitude to my research supervisor Dr. Bhaskar

Thakker, Ex-Professor, Symbiosis Institute of Technology (SIT), Pune. He has been a constant

source of motivation for me, and his inspiring guidance has played a crucial role in shaping

this thesis. I am privileged to have worked under his supervision.

Completing this work would not have been possible without the Doctoral Progress Committee

(DPC) members: Dr. Nikhil Kothari, Head & Professor, DDU, Nadiad, and Dr. Apurva Shah,

Associate professor, MSU, Baroda. I am thankful for their rigorous examinations and precious

suggestions during my research.

I wish to express my sincere gratitude to the late Shri Dr. C. L. Patel, Chairman, Charutar Vidya Mandal (CVM), for permitting me to pursue my studies alongside my job.

I am grateful to Dr. Himanshu Soni (Principal), Dr. Maulika S. Patel (Head, CP Dept.), and

my colleagues (teaching, non-teaching) at G H Patel College of Engineering & Technology for

their help, continuous motivation, and support during my research work.

A big thank-you to my mother for her unconditional love and for extending her support since childhood, and to my father, who has always been my best friend and my role model. I am fortunate to have you in my life. I am grateful to my wife for encouraging me to take up this endeavor. It would not have been possible for me to accomplish this without their support and constant motivation. I cannot express in words the endless affection given by my kids, which was necessary to fuel my journey. I also thank my sister, brother-in-law, niece, and nephews for their constant love, motivation, and support.

I am thankful to one and all who were involved directly or indirectly in this long journey of

mine. In the end, I thank Almighty God for giving me direction and enthusiasm to move

through the tough times.

Priyang P. Bhatt


TABLE OF CONTENTS

S.N. Content Pg.No.

i Title Page…………………………………………………………………... i

ii Copyright…………………………………………………………………. ii

iii Declaration…………………………………………………………………. iii

iv Certificate………………………………………………………………….. iv

v Course-work Completion Certificate……………………………………… v

vi Originality Report Certificate……………………………………………… vii

vii Non-Exclusive License Certificate ………………………………………... viii

viii Thesis Approval Certificate……………………………………………….. x

ix Abstract……………………………………………………………………. xi

x Acknowledgment………………………………………………………….. xiv

xi Table of Contents………………………………………………………….. xv

xii List of Abbreviations………………………………………………………. xviii

xiii List of Figures……………………………………………………………… xxi

xiv List of Tables………………………………………………………………. xxii

1 Chapter 1 Introduction 1

1.1 IoT Applications……………………………………………………… 2

1.1.1 Smart Home………………………………………………………. 3

1.1.2 Smart Cities………………………………………………………. 3

1.1.3 Smart Environment……………………………………………….. 3

1.1.4 Agriculture………………………………………………………... 4

1.1.5 Industry…………………………………………………………… 4

1.1.6 Health and Lifestyle………………………………………………. 4

1.2 Security Threats: IoT Applications…………………………………….. 5

1.2.1 Security Issue: Sensing Layer…………………………………….. 5

1.2.2 Security Issue: Network Layer…………………………………… 6

1.2.3 Security Issue: Middleware Layer………………………………... 7

1.2.4 Security Issue: Application Layer………………………………... 7

1.3 Botnet Attack………………………………………………………….. 7

1.3.1 Botnet Detection Techniques………………………………………… 9


2 Chapter 2 Literature Review 14

3 Chapter 3 Mass Removal of Botnet Attacks Using Heterogeneous Ensemble Stacking PROSIMA Classifier in IoT: First approach 25

3.1 Heterogeneous Ensemble stacking PROSIMA classifier …………… 26

3.2 Data Collection…………………………………………………… 26

3.3 Pre-processing………………………………………………………… 27

3.4 Feature Selection……………………………………………………… 27

3.5 Proposed Model……………………………………………………… 28

3.5.1 Popular ways to combine different classifiers……………………. 28

3.5.2 Popular ways to combine different classifiers …………………… 28

3.5.3 Overall Architecture of Heterogeneous Ensemble Stacking Meta-classifier………………………………………………………… 29

3.5.3.1 XGBoost (Extreme Gradient Boosting) Algorithm…………. 29

3.5.3.2 Adaboost (Adaptive Boosting) Algorithm………………….. 29

3.5.3.3 Random cluster sampling forest Algorithm…………………. 30

3.6 Mass clustering based on PROSIMA protein similarity……………… 32

3.7 Experimental setup……………………………………………………. 34

3.8 Results and Discussion………………………………………………… 35

3.8.1 Calculation of packet arrival time…………………………………. 36

3.8.2 Packet Delivery Ratio (PDR) ……………………………………… 38

3.8.3 Packet Loss…………………………………………………………. 38

3.8.4 Throughput………………………………………………………….. 39

3.8.5 Clustering the Botnet of DDoS and SPAM types………………….. 39

3.8.6 Comparing proposed classifier with existing classifiers…………….. 42

4 Chapter 4 Isolating Botnet Attack Using Bootstrap Aggregation Surflex-PSIM Classifier in IoT: Second approach 47

4.1 Seclusion of Botnet attacks using PSIM based on random Poisson forest model……………………………………………………………… 47

4.2 Data gathering phase……………………………………………... 47

4.3 Removing complex-valued variables……………………………… 48

4.4 Bootstrap Aggregating Surflex-PSIM Classifier……………………… 51


4.4.1 Random training model based on Poisson distribution………….. 51

4.5 Pseudocode for random Poisson forest……………………………… 53

4.5.1 Mass clustering based on P-SIM clustering……………………… 55

4.5.2 Pseudocode for P-SIM clustering……………………………… 56

4.6 Result and Discussion………………………………………………… 59

4.6.1. Implementation……………………………………………………. 59

4.6.2 Packet Delivery Ratio (PDR) ………………………………………. 60

4.6.3 Packet Loss………………………………………………………….. 60

4.6.4 Throughput………………………………………………………….. 61

4.6.5 Clustering of Botnet of DDoS Attack……………………………… 62

4.6.6 Clustering of Botnet of SPAM attack……………………………….. 63

4.6.7 Comparison of proposed system with existing techniques………….. 64

5 Chapter 5 A Novel Forecastive Anomaly Based Botnet Revelation Framework for Competing Concerns in Internet of Things (IoT): Third approach 70

5.1 Forecastive Anomaly-based Botnet Revelation framework…………… 70

5.2 Ensemble-based stream mining…………………………………... 71

5.3 Ensemble Classification approach…………………………………… 73

5.4 Cataloging……………………………………………………………… 75

5.5 Result & Discussion…………………………………………………… 76

5.5.1 Performance Analysis……………………………………………. 79

5.5.2 Performance Comparison……………………………………… 82

6 Chapter 6 Conclusion, Future Scope, and References 85

6.1 Conclusion…………………………………………………………… 85

6.2 Future Scope…………………………………………………………… 87

6.3 References…………………………………………………………… 89

List of Publications…………………………………………………….. 99

Appendix………………………………………………………………….. 100


List of Abbreviations

AdaBoost Adaptive Boosting

AGD Algorithmically Generated Domains

ARM Advanced RISC Machine

BLSTM-RNN Bidirectional Long Short-Term Memory based Recurrent Neural Network

C&C Command and Control

CDN Content Delivery Networks

CNN Convolutional Neural Network

CPU Central Processing Unit

CSV Comma Separated Values

D2D Device-to-Device

DBC Distributed Block Chain

DDoS Distributed Denial of Service

DHCP Dynamic Host Configuration Protocol

DNS Domain Name System

ELF Executable and Linkable Format

FN False Negative

FP False Positive

FPR False Positive Rate


GA Genetic Algorithm

GRE Generic Routing Encapsulation

GSBDA Graph Structure-Based Detection of Anomaly

HTTP Hyper Text Transfer Protocol

ICMP Internet Control Message Protocol

IGMP Internet Group Management Protocol

IoT Internet of Things

IoTDS Internet of Things Detection System

IRC Internet Relay Chat


ISP Internet Service Provider

IT Information Technology

KNN K-Nearest Neighbor

LAN Local Area Network

LDA Linear Discriminant Analysis

LOF Local Outlier Factor

LRECF Linear Random Euler Complex-Valued Filter

LSTM Long Short Term Memory

MIPS Million Instruction Per Second

ML Machine Learning

MSE Mean Squared Error

NIDS Network Intrusion Detection Systems

NTP Network Time Protocol

OS Operating System

P2P Peer-to-Peer

PC Personal Computer

PDR Packet Delivery Ratio

PHP Personal Home Page

PPC Performance Computing

PROSIMA Protein Similarity

PSIM Protein Similarity

RBF Radial Basis Function

ROC Receiver Operating Characteristics

SMO Social Media Optimization

SMS Short Message Services

SNMP Simple Network Management Protocol

SOM Self Organizing Map

SSH Secure Shell

SSL Secure Socket Layer

SVM Support Vector Machine

TCP Transmission Control Protocol

TCP/IP Transmission Control Protocol/Internet Protocol

TN True Negative

TP True Positive

TPR True Positive Rate

UDP User Datagram Protocol

VANET Vehicular Ad-Hoc Network

WSN Wireless Sensor Network

XGBoost Extreme Gradient Boosting


List of Figures

Figures Description Page No.

FIGURE 1.1 IoT Layers 5

FIGURE 1.2 Different types of attacks on IoT System 6

FIGURE 1.3 Botnet Structure 9

FIGURE 1.4 Typical Botnet Attack 10

FIGURE 3.1 Heterogeneous Ensemble stacking PROSIMA Classifier 25

FIGURE 3.2 Overall Structure of the IoT network with Botnet Attack 26

FIGURE 3.3 Training process with random clustering forest Algorithm 31

FIGURE 3.4 Process Flow of Heterogeneous Ensemble stacking meta-classifier 32

FIGURE 3.5 Process flow of the proposed system 33

FIGURE 3.6 Flow Diagram for Clustering of Botnet using PROSIMA 33

FIGURE 3.7 The IoT Experimental setup for detecting Botnet Attacks 35

FIGURE 3.8 The IoT based network 36

FIGURE 3.9 PDR During normal and attack time 38

FIGURE 3.10 The Network packet loss over normal and attack duration 38

FIGURE 3.11 The Network throughput over normal and attack duration 39

FIGURE 3.12 Clustering of Botnet attack which leads to DDoS and SPAM attacks 41

FIGURE 3.13 Comparison graph for Precision 43

FIGURE 3.14 Comparison graph for Recall 44

FIGURE 3.15 Comparison graph for F-Measure 45

FIGURE 3.16 Comparison graph for Accuracy 46

FIGURE 4.1 Overall Architecture of the IoT network with Botnet Attack 48

FIGURE 4.2 Complex valued linear filtering 49

FIGURE 4.3 Process flow of the proposed system 50

FIGURE 4.4 Random Classifier 51

FIGURE 4.5 Training process with random Poisson forest 54

FIGURE 4.6 Flow diagram for clustering of Botnet 58


FIGURE 4.7 The IoT based Network 59

FIGURE 4.8 Packet Delivery Ratio during normal and attack period 60

FIGURE 4.9 Packet loss during normal and attack period 61

FIGURE 4.10 Throughput of the network under normal and attack period 61

FIGURE 4.11 Clustering of Botnet attack which leads to DDoS attack 64

FIGURE 4.12 Clustering of Botnet attack which leads to Spam attack 65

FIGURE 4.13 Comparison graph for Precision 66

FIGURE 4.14 Comparison graph for Recall 67

FIGURE 4.15 Comparison graph for F-measure 68

FIGURE 4.16 Comparison graph for Accuracy 69

FIGURE 5.1 Proposed Framework 70

FIGURE 5.2 Ensemble-based stream mining-concept drift in the unbounded data stream 71

FIGURE 5.3 Ensemble-based classifier design 73

FIGURE 5.4 a) Typical fair GSBDA example, b) Best substructure and c) Anomalous Substructure 75

FIGURE 5.5 (a) Node Initialization (b) Node Creation 79

FIGURE 5.6 Nodes that are detected as Botnet 79

FIGURE 5.7 Performance metric of the proposed framework in terms of arrival time, packet delivery ratio, and throughput 81

FIGURE 5.8 Comparison of proposed with Supervised and unsupervised learning methods. 84


List of Tables

Table No. Description Page No.

TABLE 1.1 Comparisons of IT network and IoT Network 2

TABLE 3.1 Log details of each node in the network 40

TABLE 3.2 Lists of nodes clustered under DDoS and SPAM Botnet Attack 41

TABLE 3.3 (a) List of classifiers with the proposed system 42

TABLE 3.3 (b) List of classifiers with the proposed system 42

TABLE 4.1 Log details of each smart object in the network 62

TABLE 4.2 List of nodes clustered under DDoS Botnet attack 63

TABLE 4.3 List of nodes clustered under Botnet spam attack 64

TABLE 4.4 List of classifiers with proposed system 65

TABLE 5.1 CTU-13 Scenarios 77

TABLE 5.2 CTU-13 Dataset Distributions 78

TABLE 5.3 Performance metric of the proposed framework 80

TABLE 5.4 Proposed performance in terms of throughput, packet loss, and packet delivery ratio, and arrival time 80

TABLE 5.5 Total No. of nodes to search for Botnet and other anomalies 81

TABLE 5.6 Comparison with the prior methodology for Scenario 1 82






CHAPTER 1

Introduction

New technologies emerge day by day, accompanied by continuous development in the cyber world. IoT has brought significant changes to many aspects of our daily lives, such as health care and traffic monitoring services. It also enables machine-to-machine communication by connecting multiple devices over the Internet. At the same time, there is an increase in internal vulnerabilities that cybercriminals often exploit. Nevertheless, the number of active IoT users is increasing day by day [1].

IoT is a system of interrelated computing devices, mechanical and digital devices, and objects that are provided with unique identifiers (UIDs) and can transmit data over a network without requiring human-to-human or human-to-computer interaction. IoT connects various technologies such as Cloud, Artificial Intelligence, Sensors, and Actuators. The term Internet of Things is derived from two words: 'Internet' and 'Things'. The Internet is the connection medium through which global connectivity is realized. Things can be anything, such as chairs, fans, and televisions. Traditionally, these things were not designed to communicate with the Internet [2].

The number of IoT applications is increasing day by day, and the digital economy will grow with these applications and concepts. Security and privacy issues arise because a large number of applications are created without regard for security. The security and privacy issues in IoT networks differ from those in traditional IT networks.

It is apparent from the comparison in Table 1.1 that the IoT network is vulnerable to many security threats due to its constrained resources. The IoT network faces many challenges compared to existing traditional IT networks [3]. For these reasons, many security attacks have targeted IoT networks. Mirai is one such attack; in October 2016 it was directed against 2.5 million Internet-connected IoT devices and launched a Distributed Denial of Service (DDoS) attack. According to Wikipedia, other attacks of this kind against IoT networks are Hajime and Reaper [81,116].


TABLE 1.1 Comparisons of IT network and IoT network

Traditional IT network | IoT network
Rich in resources (software and hardware). | Not rich in resources compared to the IT network.
No resource constraints in terms of power and memory. | Resource constraints in terms of power and memory.
Complex algorithms can be executed. | Requires small, lightweight algorithms.
Uses homogeneous technology in terms of protocols, etc. | Uses heterogeneous protocols and technologies.

Because of resource constraints, IoT networks easily allow hackers to enter the system's gateway and access network data. Beyond ordinary things, IoT sensors are also implanted in the human body to measure the condition of different organs. It would be dangerous if such sensors were compromised [4].

Viewed as a stack, the IoT network can be divided into four layers. The first layer contains physical entities, such as sensors and actuators, that sense and collect data. The collected data is transmitted through the second layer of the IoT stack, the network layer. The next layer is the middleware layer, which acts as an interface between the network and application layers. The final layer, the application layer, contains different types of applications, such as smart homes and smart transport.

This chapter discusses various possible security threats in IoT applications for these four layers.

The issues associated with the gateways that connect these layers are also discussed in this

chapter.

1.1 IoT Applications

IoT has many applications; some are in the home, city, industry, environment, and agriculture domains.

1.1.1 Smart Home

Smart homes have many small IoT components, such as smart lighting, smart appliances, intrusion detection, and smoke/gas detectors. The primary purpose of smart lighting is to save energy by switching lights on and off automatically according to current conditions. A wired or wireless-enabled light can be controlled remotely using mobile or web applications. Smart appliances in modern homes, such as TVs, refrigerators, and washing machines, are another application. Operating these appliances normally requires separate remote controllers. IoT makes it possible to operate them from a single mobile or web interface, sends current status information and notifications from the devices to the user in real time, and fetches software updates automatically from the Internet. Another application is intrusion detection in smart homes, which alerts the user by SMS or e-mail with photos or video clips attached. Similarly, smart smoke/gas detectors alert the user by SMS or e-mail to conditions such as fire or harmful gas [5].

1.1.2 Smart Cities

The most widely used IoT applications in smart cities are smart parking, smart lighting, intelligent roads, structural health monitoring, and surveillance. Drivers who search randomly for an empty parking slot contribute to additional congestion. IoT smart parking helps drivers find parking slots easily and quickly through mobile or web applications. Sensors placed in the parking zone send messages to the Internet via a controller. Smart lighting helps save power: it automatically adjusts light intensity and is connected with other lights to share information about current light conditions. Intelligent roads are fitted with different sensors that send alerts to the driver regarding current driving conditions. Structural health monitoring systems predict the health of buildings and bridges and help prevent accidents and sudden breakdowns. The surveillance system detects and monitors different events in the city for safety and security purposes [6].

1.1.3 Smart Environment

A smart environment has many subsections, such as weather monitoring, air pollution monitoring, noise pollution monitoring, forest fire detection, and river flood detection. Intelligent weather monitoring systems fetch data from connected sensors (temperature, humidity, pressure sensors, etc.) and send it to the cloud for monitoring and analysis, so that better decisions can be made and conveyed to subscribers. IoT-based air pollution systems monitor harmful gases generated by factories and industries to support decisions in air pollution control systems. Due to the large population in cities, the noise pollution rate is increasing day by day in proportion to the air pollution rate. Noise pollution control is as essential as air pollution control, since noise increases stress and sleep disturbance. The IoT-based noise pollution control system monitors the noise level in different parts of the city and uploads it to the cloud, where it is used to identify noise pollution sources in the city. Forest fires cause significant damage, including to human life. IoT-based forest fire detection predicts fires and forwards the trigger information to the concerned handling system. River floods cause damage comparable to that of a forest fire. Early monitoring can minimize the damage by placing different sensors that monitor the water level and send data to the server for further processing and decision-making [7].

1.1.4 Agriculture

Smart agriculture helps in crop monitoring and in saving water during irrigation. Intelligent irrigation systems monitor soil moisture using sensors and regulate water flow according to a given threshold. A watering schedule can also be created for different plants and trees based on the collected moisture data. A greenhouse monitoring system can also be used to monitor the climatic conditions for plant growth [8].
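The threshold-based irrigation idea described above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the class name `IrrigationController`, the two threshold values, and the use of a second (upper) threshold to avoid rapid valve switching are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class IrrigationController:
    """Hypothetical threshold-based valve control with a hysteresis band."""
    low_threshold: float   # open the valve below this soil moisture (percent)
    high_threshold: float  # close the valve above this soil moisture (percent)
    valve_open: bool = False

    def update(self, moisture_percent: float) -> bool:
        """Return the valve state after processing one sensor reading."""
        if moisture_percent < self.low_threshold:
            self.valve_open = True
        elif moisture_percent > self.high_threshold:
            self.valve_open = False
        # Between the two thresholds the previous state is kept.
        return self.valve_open


# Assumed thresholds and readings, chosen only to exercise each branch.
controller = IrrigationController(low_threshold=30.0, high_threshold=45.0)
readings = [50.0, 28.0, 35.0, 46.0]
states = [controller.update(r) for r in readings]
print(states)
```

In a real deployment the readings would come from a moisture sensor and the returned state would drive an actuator; both are outside the scope of this sketch.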

1.1.5 Industry

IoT can help in machine diagnosis and prognosis, which means monitoring the current operating condition and comparing it with the normal state to measure the machine's performance. A machine contains a large number of components. Sensors placed inside the machine can periodically monitor the current working condition and generate triggers for corrective action. IoT can also help in monitoring indoor air quality, since poor air quality can cause health issues. Air monitoring sensors placed inside industrial premises can monitor and help reduce air pollution [9].

1.1.6 Health and Lifestyle

Nowadays, many wearable health monitoring devices are available for fitness monitoring. These wearable devices form a WSN (Wireless Sensor Network) called a body area network [10].


Figure 1.1 IoT Layers [1]

1.2 Security Threats: IoT Applications

The different layers of IoT (sensing layer, network layer, middleware layer, and application layer) are exposed to many security threats.

1.2.1 Security Issue: Sensing layer

The sensing layer comprises sensors and actuators, the physical part of the IoT network. Several security threats are possible here. In a node capturing attack, attackers replace normal nodes with malicious nodes and capture the data flow between IoT devices. In a malicious code injection attack, the attacker exploits automatic updates at the gateway to inject malicious code and subvert the gateway node's functionality. In a false data injection attack, an attacker who has captured the node or injected malicious code into the gateway feeds faulty data into the node, which produces unpredictable results and disturbs the decision-making system. Other possible attacks on the sensing layer are sleep deprivation attacks and booting attacks. In the former, the attacker drains the battery power and thereby creates a denial of service. In the latter, attackers try to disturb the booting process so that they can exploit vulnerabilities while the IoT devices restart [11-12].

Figure 1.2 Different types of attacks on IoT System [1]

1.2.2 Security Issue: Network Layer

A typical attack on the network layer of the IoT stack is a phishing attack, in which an attacker tries to compromise the username and password for the web page of an IoT application and then uses them to perform other malicious activities. In an access attack, the attacker stays in the network for an extended time to steal information from the IoT network; this kind of attack is challenging to handle. The most common and dangerous attack is the DDoS/DoS attack, in which the attacker generates a large number of unwanted requests that deny authenticated devices access to the network. In a routing attack, packets are routed through malicious nodes, which harms the entire system [13].
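As a defensive illustration of the flooding behaviour described above, the following minimal Python sketch counts requests per source inside a sliding time window and flags a source that exceeds a limit. It is an assumption-laden toy, not the detection method proposed in this thesis: the class name `RequestRateMonitor`, the window length, and the request limit are hypothetical, and a truly distributed attack spread over many sources would need more than a per-source counter.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Hypothetical per-source sliding-window request counter."""

    def __init__(self, window_seconds: float, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._timestamps = defaultdict(deque)  # source id -> recent timestamps

    def record(self, source: str, timestamp: float) -> bool:
        """Record one request; return True if the source exceeds the limit."""
        window = self._timestamps[source]
        window.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests


# Assumed limit: more than 3 requests within 1 second is treated as suspicious.
monitor = RequestRateMonitor(window_seconds=1.0, max_requests=3)
flags = [monitor.record("node-7", t) for t in (0.0, 0.1, 0.2, 0.3, 2.0)]
print(flags)
```

The fourth request arrives within the same one-second window as the first three and is flagged; the fifth arrives after the window has emptied and is not.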


1.2.3 Security Issue: Middleware Layer

The middleware layer is the interface between the application and network layers. It is where the broker, data, and machine learning algorithms reside. One attack possible on the middleware is the man-in-the-middle attack, in which the attacker tries to compromise the broker to take complete control of the IoT network. In an SQL injection attack, the attacker inserts malicious queries into the SQL database to obtain user data. Flooding attacks on the cloud are similar to DDoS attacks: many queries flood the cloud to increase the cloud server's load [14].
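The SQL injection attack mentioned above can be made concrete with a small sketch using Python's standard `sqlite3` module. The table name `readings`, its columns, and the sample input are hypothetical and chosen only for illustration; the point is the contrast between splicing user input into the query text and binding it as a parameter.

```python
import sqlite3

# In-memory database with a hypothetical table of sensor readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("sensor-1", 21.5), ("sensor-2", 19.0)],
)

malicious_input = "sensor-1' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the OR clause is executed
# and every row is returned.
unsafe_sql = f"SELECT * FROM readings WHERE device_id = '{malicious_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is bound as a parameter and is treated purely as data,
# so no device_id matches the malicious string.
safe_rows = conn.execute(
    "SELECT * FROM readings WHERE device_id = ?", (malicious_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

Parameter binding is one common mitigation; it does not replace access control or input validation at the middleware.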

1.2.4 Security Issue: Application Layer

The application layer serves users directly. Different issues may arise depending on the kind of application. One security issue at the application layer is data theft. IoT users generate vast amounts of data during communication, and they register their private data with applications, which gives attackers an opportunity to steal confidential information; encryption and authentication can prevent this type of attack. In an access control attack, the attacker targets the access control mechanism that allows users to access data; if access control is compromised, the entire IoT application is compromised. In a malicious code injection attack, the attacker tries to inject malicious code into the running script, for example through cross-site scripting, and hijack the user's IoT account. In a sniffing attack, the attacker sniffs packets and tries to read user data if no proper protocol is in place to prevent it. In a reprogram attack, the attacker tries to reprogram the IoT node/device remotely to gain control [15].

1.3 Botnet Attack

As discussed above, the IoT network is subject to many attacks; one of the most recent is the Botnet attack. With the ongoing rapid development of the Internet of Things (IoT), there has been growing interest in understanding emerging cyber threats in IoT [16]. IoT nodes are limited in resources and use dedicated and diversified communication protocols. Some of these differences weaken the ability of IoT nodes to protect themselves [17]. Because any object equipped with a sensor node and other microelectronic devices can communicate over a wireless network in an IoT environment, this environment is highly vulnerable to the Botnet attack.


A Botnet is an extensive collection of compromised nodes controlled remotely by the

bot-master. In a Botnet attack, a group of smart objects acts together and carries out operations

that lead to the destruction of the IoT-based system. The term bot originates from the word

robot; a bot works automatically as a program or script written by the bot-master [18].

Figure 1.3 shows the working of the Botnet. A Botnet can generate a substantial volume of

attacks of many types, such as DDoS and phishing. The bot-master controls the Botnet and, to

create it, tries to infect as many devices as possible, since the number of devices significantly

affects the impact of the attack. The bot-master handles this entire network using a C & C

server. Compromised nodes follow the server's commands and attack the target. A Botnet does

not infect just one device but potentially millions of network nodes. The main idea of the Botnet

attack is to compromise nodes or IoT devices for the attacker's purposes. The steps that Botnets

use to infect target nodes are as follows.

1. Bots interact through legitimate channels of communication.

2. They can use different communication techniques like IRC, Telnet, etc.

3. An infected node, called a Bot, communicates with the C & C server, which can now control

the infected node.

4. The C & C server can then communicate with the infected host and give it instructions to

carry out its task.

The process of the Botnet attack is shown in Figure 1.3.

As shown in the figure below, the bot-master initializes communication using channels such as

IRC and Telnet, registers the compromised node, and can inject malicious code into it (the node

is then called a Bot). By repeating this step, the bot-master creates an extensive network of Bots

(compromised nodes) using the C & C server. The bot-master can send commands to a

compromised host (Bot) to perform a task and can keep this connection for a long time.


FIGURE 1.3 Botnet Structure

[Source: https://blog.emsisoft.com/en/27233/what-is-a-botnet/]

Researchers use different methods to detect and prevent Botnet attacks. The following

section discusses such methods.

1.3.1 Botnet Detection Techniques

New technologies are developed day by day, followed by continual development in the cyber

world. At the same time, IoT has brought significant changes to daily life in numerous areas,

such as health care and traffic monitoring services. Moreover, it aids machine-to-machine

communication by connecting multiple devices over the Internet. Conversely, there is a rise in

intrinsic vulnerabilities that are often leveraged by cybercriminals. Yet, the number of active

IoT users increases day by day.

With the continued rapid advancement of the Internet of Things (IoT), there has been increasing

interest in understanding emerging cyber threats in the IoT domain. IoT devices are highly

vulnerable and attractive to attackers because of their highly heterogeneous components,

naive security configurations, and weak encryption and verification [19].

FIGURE 1.4 Typical Botnet attack

The term bot originates from the word robot; a bot works automatically as a computer program

or script written by the bot-master. Botnets continue to be a significant source of large-scale

attacks on the Internet, with recent increases in attack traffic [20].

Botnet detection and removal are essential to resolve these issues and are carried out by

intrusion detection systems and honeynets, which suffer from limitations in Botnet detection

[21]. A recent trend is network-based Botnet detection, which uses Machine Learning

Algorithms (MLAs) to identify malicious traffic [22]. These machine learning-based strategies

assume that Botnets make discernible patterns inside the network traffic [23]. This class of

detection approaches promises automated recognition that can generalize knowledge about

malicious network traffic from the available observations, thereby avoiding the pitfalls of

signature-based detection approaches, which can identify only known traffic anomalies [24].

Different detection techniques employ different strategies that use assorted traffic analysis

principles, targeting different Botnet characteristics [25]. The primary assumption of machine

learning-based methods is that a Botnet makes discernible patterns inside the network traffic

[26]. Detection based on MLA traffic analysis promises flexible detection that does not require

the traffic to display any anomalous characteristics [27]. This class of detection methods does

not require prior knowledge of Botnet traffic patterns but instead infers that knowledge

exclusively from the available observations. Machine learning algorithms such as Random

Forest, Naive Bayes, SMO, and MLP are used to classify the data, but they fail to identify the

type of Botnet attack [28].

Because of the resource constraints of IoT nodes and the heterogeneous protocols they use, it is

not easy for them to protect themselves [29]. In an IoT system, data from the nodes is sent to

the cloud, which processes the data and then sends it to users [30]. A Botnet allows the attacker

to access a device connected to the IoT and gain access to the connection. This kind of attack

raises security concerns, because a third party gains control of the IoT device for malicious

activities. Such systems are therefore becoming a desirable target for attackers [31].

Recently, the most potent attacks were performed by Botnets that consisted mainly of insecure

IoT devices. The Mirai Botnet is considered the most massive Botnet in history, containing many

compromised IoT devices [32]. Command and control (C & C) servers have evolved to provide

Botnet management platforms. C & C servers are specialized computers controlled by attackers

to send commands, spread malicious code and files, and steal information from the victim

network [33]. The C & C servers that manage the Botnet herder's victims are designed to quickly

deploy a wide array of network and application attacks, provide implementation scripts to

Botnet victims, and quickly scale the attacks. The servers are capable of Peer to Peer (P2P)

communication and collaboration. A Botnet can be controlled by a single Botnet herder or by

multiple herders [34].

The fundamental assumption of strategies based on machine learning is that a Botnet makes

discernible patterns inside the network traffic. These patterns can be identified efficiently

using machine learning algorithms [35]. This class of detection approaches promises automated

recognition that can generalize knowledge about malicious network traffic from the available

observations, thereby avoiding the pitfalls of signature-based detection approaches, which can

identify only known traffic anomalies [36]. For Botnet attack detection, machine learning

algorithms such as Random Forest, Naive Bayes, SMO, and MLP are used for classification

[37].

A Support Vector Machine (SVM) looks for the maximum-margin boundary, which keeps

samples of the same class together and as far as possible from samples of other classes in the

margin sense [38]. The fuzzy means clustering algorithm is likewise used for classifying data.

Compared with SVM, Random Forest, and Naive Bayes on normal word vectors, an LDA-based

classifier performs better. Downstream machine learning analysis is an extension of the

learning approach that has yet to be considered [39]. The growing number of IoT devices

connected to the Internet creates security concerns and makes the environment vulnerable to

the Botnet attack [40]. The anomaly-based approach is well suited for detecting compromised

IoT devices because these connected appliances are typically task-oriented. Accordingly, they

execute fewer and potentially less complex network protocols and exhibit traffic with less

variance than computers. However, the prediction accuracy is very low [41].

One of the significant security concerns in IoT is the Botnet, a pervasive and hazardous threat.

Several thousand to millions of compromised computers (bots) in a network are used by

malicious attackers to perform various illicit and harmful activities [42]. All such bots are

linked to a central communication system to receive the attacker's commands and execute

malicious actions on a targeted system [43]. The central communication system offers a large

distributed platform for various malicious activities, including distributed denial-of-service

(DDoS), spamming, phishing, and spying. It creates severe threats/risks to industries,

government organizations, academic institutions, etc. [44, 45].

The adversary can employ the affected network to carry out malicious activities, including

phishing attacks, data stealing, and Distributed Denial of Service (DDoS). Two direct detection

approaches are deployed to deal with Botnet vulnerabilities, namely host-based and

network-based [46,47,49]. The host-based method exhibits low reliability in Botnet detection

owing to the constrained computation and power of the host. A hierarchical classification of

network-based Botnet detection approaches in the IoT domain is proposed in [49]. Honeypots

are also employed to detect Botnets by analyzing, understanding, and characterizing the

behavior of bots, followed by tracking. Moreover, to reveal the existence of bots, honeypots

require signature extraction, data inspection, etc.


As per [50], ordinary networks constitute an optional detection source. In contrast, Network

Intrusion Detection Systems (NIDS) monitor traffic data continuously and without human

intervention, using pattern matching to detect signs of undesirable activities. Such patterns

may rely on signatures identified by honeypots, DNS traffic with a potential C & C server, data

mining of traffic anomalies, and hybrid approaches. The anomaly-based method is discussed in

[101] for detecting compromised IoT devices, because these connected appliances are typically

task-oriented. Accordingly, they run fewer and less intricate network protocols and exhibit

traffic with less variance than PCs. However, the prediction accuracy is very low.

IoT is a distinct kind of network with a sizable number of applications, in which traffic and

privacy concerns are prevalent. The degradation of a single system can bring down the entire

structure. Similarly, hackers intrude on the network using Botnets and degrade the system.

With high-dimensional data, the ability to identify the Botnet accurately falls off, which leads

to inaccurate detection. In addition, delay in the detection of the Botnet causes degradation of

the system.

It therefore becomes essential to detect the Botnet accurately and to devise a framework to

prevent it. In the IoT, working with high-dimensional data causes a delay in Botnet detection,

and this delay slows down the performance of the entire network.

Chapter 2: Surveys the available literature on IoT Botnet attacks and DDoS in the Internet of

Things.

Chapter 3: Presents an analysis of Mass Removal of Botnet Attacks Using Heterogeneous

Ensemble Stacking PROSIMA Classifier in IoT.

Chapter 4: Provides a detailed description of Isolating Botnet Attacks Using Bootstrap

Aggregating Surflex-PSIM Classifier in IoT.

Chapter 5: Explains a Novel Forecastive Anomaly Based Botnet Revelation Framework

for Competing Concerns in the Internet of Things.

Chapter 6: Provides the conclusion and future scope of this thesis.


CHAPTER 2

Literature Review

This chapter discusses existing methods for Botnet attack detection, aiming to cover the

essential works proposed in this area. Fifty such contributions were identified from reputed

journals and conferences, particularly from the last three years (2017-2019).

Meidan, Yair et al. [51] proposed autoencoders for anomaly detection in network traffic.

Botnet attacks launched from compromised IoT devices were detected with high accuracy and a

low false alarm rate. An autoencoder was built for each IoT device in the network and trained

on that device's benign network traffic. For devices with less predictable traffic, it might be

more difficult to capture normal behavior, and therefore future observations may be subject to

more classification errors.

McDermott, Christopher D. et al. [52] proposed a deep learning model based on a Bidirectional

Long Short Term Memory Recurrent Neural Network (BLSTM-RNN) to detect Botnets on IoT

devices. The BLSTM-RNN was used for text recognition, with the attack vectors converted into

a tokenized integer format. In this way, a low False Positive Rate (FPR) was obtained in Botnet

attack detection. By helping consumers become aware when their device is infected, the authors

hope to raise awareness of the inherent vulnerabilities and help consumers make better choices

in the procurement and operation of such devices.
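
The conversion of captured packet text into a tokenized integer format can be illustrated with a minimal sketch. It is not the implementation used in [52]; the function names `build_vocabulary` and `encode`, the reserved indices, and the sample packet strings are assumptions made for illustration.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved indices (an assumption): padding and unknown tokens


def build_vocabulary(packets):
    """Map each distinct token to an integer, most frequent tokens first."""
    counts = Counter(token for packet in packets for token in packet.split())
    # Sort by descending frequency, then alphabetically, for a stable mapping.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(ordered, start=2)}


def encode(packet, vocabulary, length):
    """Convert one packet description into a fixed-length list of integers."""
    ids = [vocabulary.get(tok, UNK) for tok in packet.split()]
    return (ids + [PAD] * length)[:length]


# Hypothetical packet summaries, in the style of a capture tool's text output.
packets = [
    "TCP 23 SYN",
    "TCP 23 SYN ACK",
    "UDP 53 QUERY",
]
vocab = build_vocabulary(packets)
encoded = [encode(p, vocab, length=4) for p in packets]
```

Each packet becomes a fixed-length integer sequence that a recurrent network could consume after an embedding layer; tokens never seen during vocabulary construction fall back to the `UNK` index.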

Yair Meidan et al. [53] applied machine learning algorithms to network traffic data for the

accurate identification of IoT devices connected to a network. To train and evaluate the

classifier, they collected and labeled network traffic data from nine different IoT devices, as

well as computers and smartphones. Using supervised learning, they trained a multi-stage meta

classifier: in the first stage, the classifier distinguishes between traffic generated by IoT and

non-IoT devices, and in the second stage, each IoT device is associated with a specific IoT

device class. In future research, the authors plan to explore applications and adapt their

technology to additional scenarios, including different network protocols and various data

capturing points, to better understand how the approach scales and generalizes.
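
The two-stage structure described above can be sketched as follows. This is only an illustration: a nearest-centroid rule stands in for the classifiers of [53], and the class `NearestCentroid`, the helper names, the two features, and the sample values are all assumptions.

```python
import math

NON_IOT = {"pc", "smartphone"}  # assumed labels for non-IoT devices


class NearestCentroid:
    """Tiny stand-in classifier: predicts the label of the closest class mean."""

    def fit(self, rows, labels):
        grouped = {}
        for row, label in zip(rows, labels):
            grouped.setdefault(label, []).append(row)
        self.centroids = {
            label: [sum(col) / len(members) for col in zip(*members)]
            for label, members in grouped.items()
        }
        return self

    def predict(self, row):
        return min(self.centroids, key=lambda lab: math.dist(row, self.centroids[lab]))


def train_two_stage(rows, device_types):
    """Stage 1 separates IoT from non-IoT; stage 2 names the IoT device class."""
    is_iot = ["non-iot" if d in NON_IOT else "iot" for d in device_types]
    stage1 = NearestCentroid().fit(rows, is_iot)
    iot_rows = [r for r, d in zip(rows, device_types) if d not in NON_IOT]
    iot_types = [d for d in device_types if d not in NON_IOT]
    stage2 = NearestCentroid().fit(iot_rows, iot_types)
    return stage1, stage2


def classify(row, stage1, stage2):
    """Run stage 2 only for traffic that stage 1 attributes to an IoT device."""
    if stage1.predict(row) == "non-iot":
        return "non-iot"
    return stage2.predict(row)


# Hypothetical features per session: [mean packet size in bytes, distinct destinations]
rows = [[1200, 40], [1100, 35], [900, 30], [120, 2], [130, 3], [300, 1], [310, 2]]
types = ["pc", "pc", "smartphone", "thermostat", "thermostat", "camera", "camera"]
stage1, stage2 = train_two_stage(rows, types)
```

Calling `classify([125, 2], stage1, stage2)` walks a session through both stages; because the second stage is trained only on IoT sessions, it never has to model PC or smartphone traffic.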


Homayoun, Sajad, et al. [54] proposed BoTShark, a deep learning-based Botnet traffic analyzer

that uses a Convolutional Neural Network (CNN) with a softmax layer at the end to identify

malicious traffic. In this way, attacks from compromised IoT devices were detected. Their study

also showed that autoencoders perform better than the CNN, since they generate fewer false

positives. Applying other deep learning techniques, such as Long Short Term Memory (LSTM),

is left for future study.

N. Duff, et al. [55] presented a machine learning (ML) framework for classifying unsolicited

IoT devices in enterprises. Specifically, IP header information is extracted from darknet data

for analysis. Multiple supervised ML algorithms are then considered to classify these Layer 3

headers. The results show that random forest and gradient boosting achieve high recall and

precision, while naive Bayes performs worst.

Lakshya Mathur et al. [56] focused on the unresolved issue of providing robust malware

detection for secure home routers. This work contrasts the efficacy of three approaches to

behavioral malware detection on home endpoint routers by analyzing kernel-level system calls

on these routers: i) a one-class support vector machine, ii) principal component analysis, and

iii) a naive anomaly detector based on unseen n-grams. However, one drawback of the naive

anomaly detector is that it is not easy to choose an appropriate detection threshold.
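
A naive detector based on unseen n-grams can be sketched in a few lines. The sketch below is an assumption about how such a detector might look, not the code of [56]; the system call names, the value of `N`, and the value of `THRESHOLD` are hypothetical.

```python
def ngrams(calls, n):
    """Return the set of length-n windows in a sequence of system call names."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def unseen_fraction(trace, known, n):
    """Fraction of the n-grams in `trace` that never occurred in training."""
    observed = ngrams(trace, n)
    if not observed:
        return 0.0
    return len(observed - known) / len(observed)


# Hypothetical benign traces of kernel-level system calls.
benign_traces = [
    ["open", "read", "close", "open", "read", "close"],
    ["open", "read", "write", "close"],
]
N = 2
known = set().union(*(ngrams(t, N) for t in benign_traces))

THRESHOLD = 0.3  # assumed value; the text notes that choosing it is not easy
suspect = ["socket", "connect", "write", "close", "open", "read"]
score = unseen_fraction(suspect, known, N)
is_anomalous = score > THRESHOLD
```

The drawback mentioned above is visible here: the same trace is flagged with `THRESHOLD = 0.3` but would pass with `THRESHOLD = 0.5`, and nothing in the benign data alone says which value is right.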

Francisco Villegas Alejandre et al. [57] defined a feature selection process for efficient

detection of the Botnet attack in the network. The main aim of the paper was to help

researchers select efficient features from the dataset to improve the accuracy of Botnet attack

detection. Machine learning classifiers were trained on the collected data to evaluate the tests.

Analysis of network flow information is used as the method of identification because it does

not rely on the content of packets, thus providing immunity to the latest encryption and

obfuscation used by attackers to hide their bots.

Anchit Bijalwan et al. [58] used the ISCX dataset for training and testing. After the features

of this dataset were extracted, the samples were divided into two labeled categories, regular

traffic and Botnet traffic. After applying a modern data mining method, the authors used an

ensemble of classifier algorithms. Experimental results show that the quality of bot detection

using an ensemble of classifiers is better than that of a single classifier.
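
The idea of combining several classifiers can be sketched with simple majority voting. This is one common ensemble rule and is not necessarily the one used in [58]; the three rule functions, their thresholds, the port list, and the flow fields are hypothetical.

```python
from collections import Counter


def majority_vote(classifiers, flow):
    """Return the label predicted by most classifiers for one flow."""
    votes = Counter(clf(flow) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Three hypothetical weak rules; each looks at a single flow feature.
def by_rate(flow):
    return "botnet" if flow["packets_per_s"] > 100 else "regular"


def by_size(flow):
    return "botnet" if flow["mean_bytes"] < 80 else "regular"


def by_port(flow):
    return "botnet" if flow["dst_port"] in {23, 2323, 6667} else "regular"


ensemble = [by_rate, by_size, by_port]
flow = {"packets_per_s": 250, "mean_bytes": 400, "dst_port": 23}
label = majority_vote(ensemble, flow)
```

Here one rule is wrong about the flow, but the other two outvote it; with an odd number of binary voters no tie can occur.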

Nareli Cruz Cortes et al. [59] presented a novel method for selecting features for the detection

of Botnets at their C & C server. The major problem is that researchers have suggested features

based on their experience, but there is no mechanism for testing such features, and some of

these features could have a lower detection rate than others. The reported results show a

significant reduction in the number of features and a higher detection rate than related work.

Sean Miller et al. [60] gave a brief overview of the various machine learning methods and

their use in Botnet detection. The main aim of the paper is to clearly define the role of different

ML methods in Botnet detection. They also discuss different subsets of flow rate features and

the resulting effect on detection accuracy depending on the machine learning approach used.

However, a multi-perspective machine learning treatment is absent.

Hammerschmidt et al. [61] presented an automatic online method for detecting change points

in network traffic based on an analysis of IP flow records. This approach is used to break the

observed behavior into smaller consecutive segments that differ from one another. The

segmented traffic is used to learn a communication profile that accurately characterizes the

behavior between two observed change points. Moreover, there is a need to introduce a

time-decay function that forgets parts of the prefix tree once changes to those parts no longer

occur over time, thereby prioritizing recent behavior over past behavior in an online-learning

manner, as well as to work towards automatically determining the change points.
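
One possible form of such a time-decay function is an exponentially decaying counter, sketched below. It is an assumption about how the idea could be realized, not the proposal of [61]; the class `DecayingCounter` and the one-hour half-life are hypothetical.

```python
import math


class DecayingCounter:
    """Event counter whose weight halves every `half_life` seconds."""

    def __init__(self, half_life):
        self.rate = math.log(2) / half_life
        self.value = 0.0
        self.last_seen = None

    def _decay_to(self, now):
        # Shrink the stored weight by the time elapsed since the last event.
        if self.last_seen is not None:
            self.value *= math.exp(-self.rate * (now - self.last_seen))
        self.last_seen = now

    def observe(self, now):
        """Record one event at time `now` (seconds)."""
        self._decay_to(now)
        self.value += 1.0

    def weight(self, now):
        """Current weight without recording a new event."""
        if self.last_seen is None:
            return 0.0
        return self.value * math.exp(-self.rate * (now - self.last_seen))


counter = DecayingCounter(half_life=3600)
counter.observe(now=0)
counter.observe(now=0)
```

If one such counter were attached to each node of a prefix tree, nodes whose weight falls below a chosen cut-off could be pruned, so that recent behavior dominates the learned profile.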

Kirubavathi Venkatesh and Anitha Nadarajan [62] detected the SpyEye and Zeus Botnets with

the aid of an adaptive learning rate multilayer feed-forward neural network. In this work,

various classifiers such as decision tree, random forest, and radial basis function are discussed

and compared with the adaptive learning neural network.

Kamaldeep Singh et al. [63] built on the success of open-source frameworks such as Hadoop,

Hive, and Mahout to provide a scalable implementation of a quasi-real-time intrusion detection

system. They built a random forest-based decision tree model to solve the problem of Botnet

detection in a peer-to-peer network. Though the technique served well to detect Botnets, it

failed to detect Botnets under low-frequency communication once a certain threshold was

exceeded.

Yair Meidan et al. [64] proposed a novel network-based anomaly detection method that

extracts behavior snapshots of the network and uses deep autoencoders to detect anomalous

network traffic emanating from compromised IoT devices. It relies on a deep autoencoder for

every device, trained on statistical features extracted from benign traffic data. When the model

is applied to new (possibly infected) data of an IoT device, detected anomalies might indicate

that the device is compromised.
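
The train-on-benign, flag-deviations workflow can be illustrated without a neural network. In the sketch below a mean squared z-score replaces the autoencoder's reconstruction error, so it is a simplified stand-in and not the method of [64]; the class `DeviceBaseline`, the threshold rule, the two features, and the sample values are assumptions.

```python
import statistics


class DeviceBaseline:
    """Per-device model of benign traffic (a z-score stand-in for an autoencoder)."""

    def fit(self, benign_rows):
        columns = list(zip(*benign_rows))
        self.means = [statistics.fmean(c) for c in columns]
        # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
        self.stdevs = [statistics.stdev(c) or 1.0 for c in columns]
        # Threshold: the largest score seen on benign data (an assumed rule).
        self.threshold = max(self.score(r) for r in benign_rows)
        return self

    def score(self, row):
        """Mean squared z-score, playing the role of reconstruction error."""
        z = [(x - m) / s for x, m, s in zip(row, self.means, self.stdevs)]
        return sum(v * v for v in z) / len(z)

    def is_anomalous(self, row):
        return self.score(row) > self.threshold


# Hypothetical benign snapshots of one device: [packets per window, mean packet bytes]
benign = [[10, 500], [12, 520], [11, 480], [9, 510], [10, 490]]
model = DeviceBaseline().fit(benign)
```

One such model would be fitted per device, so a camera is compared only with its own benign history and not with the traffic of, say, a thermostat.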

McDermott et al. [65] presented a novel application of deep learning to develop a detection

model based on a Bidirectional Long Short Term Memory based Recurrent Neural Network

(BLSTM-RNN). Word embedding is used for text recognition and to convert attack packets into

a tokenized integer format. The developed BLSTM-RNN detection model is compared with an

LSTM-RNN for identifying four attack vectors used by the Mirai Botnet and is evaluated for

accuracy and loss.

Yair Meidan et al. [66] applied machine learning algorithms to network traffic data to

accurately identify IoT devices connected to the network. They collected and labeled network

traffic data from nine different IoT devices, as well as computers and smartphones, to train and

evaluate the classifier. Using supervised learning, they trained a multi-stage meta classifier. In

the first stage, the classifier distinguishes between traffic generated by IoT and non-IoT

devices. In the second stage, each IoT device is associated with a specific IoT device class.

Sajad Homayoun et al. [67] proposed a deep learning-based Botnet traffic analyzer called

Botnet Traffic Shark (BoTShark). BoTShark uses only network transactions and does not rely

on deep packet inspection, thus avoiding the inherent limitations of such methods, such as not

being able to deal with encrypted payloads. The proposed system detects correlations between

the original features and derives new features in a cascading manner in each layer of the

autoencoder or convolutional neural network. Additionally, the authors used a softmax

classifier to detect malicious traffic effectively.

Farouk Shaikh [68] presented a model for classifying unwanted IoT devices in organizations

through machine learning (ML). IP header information was extracted from darknet data for

analysis. Several supervised ML algorithms were considered for classifying these Layer 3

headers. Finally, the algorithms were compared on their performance in detecting malicious

IoT devices on the Internet. The results showed that random forest and gradient boosting had

higher recall and precision scores, while naive Bayes performed worst.

Kamaldeep Singh et al. [69] used a scalable architecture based on random forest modeling and

the open-source tools Hadoop, Hive, and Mahout to identify P2P Botnet traffic in networks. Big

data technology is used here to process enormous amounts of flow data in an acceptable time.

However, the false-positive rate is not low enough.

Yair Meidan et al. [70] deployed N-BaIoT, a network-based anomaly detection technique. It

makes use of an autoencoder to discover anomalous network traffic originating from

compromised IoT devices. Yet, the work failed to satisfy the security policies, because low

traffic predictability causes difficulties in attack detection.

Moitrayee Chatterjee et al. [71] used evidence theory-based techniques for malicious Bot

detection with the help of a probabilistic reasoning tool called Dempster-Shafer Theory (DST).

The vital characteristic of DST is that the detection system does not require any prior

information about malicious signatures and profiles. However, it exhibits low accuracy and

risks a higher error rate.
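
The core operation of Dempster-Shafer Theory, Dempster's rule of combination, can be shown with a small worked sketch. The two sensors and their mass values are hypothetical and are not taken from [71].

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its belief mass.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # the two pieces of evidence contradict
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so that the remaining masses sum to one.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


BOT, BENIGN = frozenset({"bot"}), frozenset({"benign"})
EITHER = BOT | BENIGN  # mass left on the whole frame expresses ignorance

# Hypothetical evidence from two independent sensors about one host.
dns_sensor = {BOT: 0.6, EITHER: 0.4}
flow_sensor = {BOT: 0.7, BENIGN: 0.1, EITHER: 0.2}
fused = combine(dns_sensor, flow_sensor)
```

With these inputs the conflicting mass is 0.06, and the fused belief in "bot" is 0.82 / 0.94, which is higher than either sensor reported alone. No signature or profile of known malware enters the computation, which matches the characteristic of DST noted above.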

Bhansinger et al. [72] proposed a scalable model that can be used to locate Botnets in P2P

networks. The proposed system treats network traffic as a data stream, separating the traffic

into two parallel streams. The detection focuses on network failures, communication traffic,

and traffic rate. Traffic is analyzed in short time windows, and infected hosts are reported

immediately. The system detects peer-to-peer (P2P) Botnets in the network and identifies bots

using a failure-based algorithm.
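
A failure-based check over a short time window might be sketched as follows. The text does not give the algorithm of [72], so the class `FailureMonitor`, the window length, and the failure limit are assumptions chosen only to illustrate the idea.

```python
from collections import defaultdict, deque


class FailureMonitor:
    """Flags hosts with too many failed connections inside a short window."""

    def __init__(self, window_s, max_failures):
        self.window_s = window_s
        self.max_failures = max_failures
        self.failures = defaultdict(deque)  # host -> timestamps of failures

    def record(self, host, timestamp, failed):
        """Process one connection attempt; return True if the host is flagged."""
        recent = self.failures[host]
        if failed:
            recent.append(timestamp)
        # Drop failures that have slid out of the window.
        while recent and recent[0] <= timestamp - self.window_s:
            recent.popleft()
        return len(recent) > self.max_failures


monitor = FailureMonitor(window_s=10, max_failures=3)
# Hypothetical host that fails to reach a peer once per second.
flagged = [monitor.record("10.0.0.5", t, failed=True) for t in range(5)]
```

Because each event is processed as it arrives and only the timestamps inside the window are kept, a host can be reported as soon as the limit is crossed, without storing the full stream.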

Mohammed S. Gadelrab et al. [73] deployed BotCap, a Botnet detection technique that uses

machine learning concepts. They used ML algorithms with a set of statistical features extracted

per trace to detect individual Botnets. The drawback of this detection system is that it cannot

detect and tackle new generations of Botnets.


Christian Hammerschmidt et al. [74] proposed a method for collecting a small amount of

relevant data for real-time learning of complex models, a class of finite state machines. Such

machines are used as communication profiles to fingerprint, identify, or classify hosts and

services; they require less training data and offer higher identification rates, faster than

conventional models. Methods that identify Botnets in batch settings have caused memory

problems over time.

Samuel Marchal et al. [75] processed network data as a stream to overcome these issues. When

evaluating their approach, they achieved high host identification rates. However, their solution

does not identify malicious Botnet flows and requires an increased number of flows to perform

the detection task.

Sidra Ijaz et al. [76] proposed a genetic algorithm-based solution to detect malware attacks.

The overall performance of the detection system was examined on the attacks in the KDD

dataset and scenario 2 of the CTU-13 dataset. The system exhibits satisfactory performance, yet

it lacks online capability, since the genetic algorithm requires several rounds of optimization

and batch data.

Weikeng Chen et al. [77] introduced a standard Botnet profiling framework and evaluated

three unsupervised flow-based machine learning algorithms, Self-Organizing Map (SOM), local

outlier factor (LOF), and k-NN outlier, to build a normal behavior profile to be used for Botnet

detection. Moreover, the authors plan to develop more robust and adaptive functions to

calculate the decision boundary based on the overall distribution of normal behaviors.
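
Of the three algorithms, the k-NN outlier score is the simplest to illustrate: a flow is scored by its distance to the k-th nearest flow in the normal behavior profile. The sketch below is a generic version of that idea and not the code of [77]; the profile values and `k = 3` are hypothetical.

```python
import math


def knn_outlier_score(point, profile, k):
    """Distance from `point` to its k-th nearest neighbour in the profile."""
    distances = sorted(math.dist(point, other) for other in profile)
    return distances[k - 1]


# Hypothetical normal-behavior profile: two normalized features per flow.
profile = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0], [1.0, 1.2]]

normal_score = knn_outlier_score([1.05, 1.0], profile, k=3)
suspect_score = knn_outlier_score([5.0, 5.0], profile, k=3)
```

A decision boundary is then a cut-off on this score; deriving that cut-off from the overall distribution of scores on normal flows is the kind of adaptive function mentioned above as future work.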

S. Garcia et al. [78] compared the results of three different Botnet detection methods on a

new, real, labeled, and large Botnet dataset. This dataset includes Botnet traffic, normal

traffic, and background traffic. The results of the authors' two methods (BClus and CAMNEP)

and of BotHunter were compared using a methodology developed for Botnet detection methods

and a novel error metric. Working with a large and real dataset showed the difficulty of

applying detection methods to unfamiliar background data at every stage of Botnet behavior,

even when there is no significant number of separate Botnets.

Jing Wang et al. [79] made use of a two-phase approach for Botnet detection. This method

first detects and collects network anomalies and then identifies the bots. The detection phase

quantifies and monitors flow-level data as histograms, which are then used to construct graphs

of highly interactive nodes.

Christos Tzagkarakis et al. [80] introduced an IoT Botnet attack detection method that relies

on a sparse representation model and a reconstruction error thresholding rule to detect

malicious IoT network traffic from a compromised IoT device. Botnet attack detection is based

on a small amount of benign IoT network traffic data, and therefore no prior knowledge of

malicious IoT traffic data is required. There is a need to analyze the proposed approach further

using more IoT Botnet attack datasets and to establish a broader comparison with existing IoT

Botnet attack detection methods.

Stephen Herwig et al. [81] analyzed a new Botnet, known as Hajime, which targets many of the

same devices as Mirai but differs considerably in design and operation. Hajime uses a public

peer-to-peer system as its command and control infrastructure and regularly introduces new

exploits, thus increasing its resilience. Unfortunately, none of the approaches considered can

successfully stop Hajime's C&C without compromising the quality of BitTorrent's DHT.

Mevlut Turker Garip et al. [82] presented RIoT, which demonstrated the first attack on

Internet of Things (IoT) devices using vehicular Botnets, and showed that vehicular Botnets are

a threat to VANETs and to other complex systems and networks. IoT devices can therefore be

threatened if vehicles are not protected against Botnets.

Mahesh Banerjee et al. [83] analyzed network traffic to detect the presence of a Botnet on the

network using network flow and classification techniques, with a significant effect on the

filtering of Botnet-related traffic. Local honeynets were deployed for the implementation. In

future work, other types of data captured by honeypots, such as malicious binaries and attack

replays, can be considered when studying and identifying Botnets.

Hui-Trung et al. [84] proposed a method that combines deep learning and machine learning to

create a new PSI-rooted subgraph-based feature for cross-architecture IoT Botnet malware

detection. This feature is strong enough for various general machine learning classifiers to

achieve approximately 97% accuracy and an F-score of 98%. However, a multi-class approach

and a more straightforward approach still need to be combined to improve performance.
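
An F-score above the accuracy, as reported here, is possible because the F-score ignores true negatives. The sketch below computes the usual metrics from confusion counts; the counts are hypothetical, chosen only to reproduce figures of this order, and are not the results of [84].

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F-score and FPR from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
    }


# Hypothetical counts for an imbalanced test set (700 malware, 300 benign samples).
metrics = detection_metrics(tp=690, fp=20, tn=280, fn=10)
```

With these counts the accuracy is 0.97 while the F-score rounds to 0.98; the false positive rate (20 of 300 benign samples) is the quantity that neither figure reveals on its own.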


Joao Marcelo Ceron et al. [85] presented an approach to managing network traffic generated

by IoT malware in an analysis environment. The proposed solution can modify the network

layer traffic based on the actions performed by the malware. With this approach, an analyst can

quickly implement separate setup configurations to determine malware characteristics and

develop signatures. However, there is a need to examine the actions of other IoT Botnets and

provide the signatures associated with them in a public repository.

Lihua Yin et al. [86] proposed ConnSpoiler, a lightweight system that quickly detects the

Algorithm-Generated Domains (AGDs) queried by IoT-based Botnets. ConnSpoiler needs only

low system resources and can work well on resource-restricted IoT devices. Furthermore,

ConnSpoiler takes only the effective domain as input, and therefore no additional effort is

required to label malicious samples for the training phase. The authors tested ConnSpoiler on

real-world DNS traffic from two different major ISP networks, showing that it identifies devices

infected by unknown Botnets.

Nickolaos Koroniotis et al. [87] introduced a new dataset, Bot-IoT, containing real and

simulated IoT network traffic with various attacks. They also present a testbed environment

that addresses the shortcomings of existing datasets in capturing complete network information,

accurate labeling, and recent and complex attack diversity. This work provides a basis for

Botnet detection on IoT-specific networks. However, a network forensic model still needs to be

developed using deep learning and the Bot-IoT dataset to assess its reliability.

Muhammad Junaid Farooq et al. [88] suggested an analytical framework for studying the D2D

spread of malware in wireless networks. Using techniques from dynamic population processes

and point process theory, they capture the infiltration and coordination of malware over the

network topology. As part of future work, the proposed model can be used as a basis for

developing a game-theoretic framework, which will allow appropriate strategies to be

developed for both the attacker and the defender.

Reem Alhajri et al. [89] focused on machine learning techniques to detect security threats on

the Internet. The work aims to explore the feasibility of using auto-encoders to detect IoT

Botnets. Botnets can launch DDoS attacks and present a significant security concern in IoT

networks, as no single method has shown the potential to address this security threat. However,

the work does not aggregate the desirable features of the auto-encoder or map them to the

security requirements of a Botnet detection system.

Georgios Spathoulas et al. [90] proposed the use of lightweight agents installed in multiple

Internet of Things (IoT) installations (e.g., smart homes) to detect denial-of-service attacks by

Botnet devices. Although many issues regarding the implementation of such systems and the

underlying consensus policies remain open, the experiments suggest that it is possible to detect

large-scale DDoS attacks.

Christopher D. et al. [91] analyzed the needs of users of IoT devices and the importance they

place on security and privacy. They used an experimental framework to determine the ability

of users to identify threats in the light of their technical knowledge and experience. The main

limitations of this study are the self-reporting nature of online surveys and the use of a single

sample of malware. The study could be extended to reflect a broader cross-section of user

backgrounds, with other types of malware and IoT devices.

Ruchi Vishwakarma et al. [92] demonstrated a honeypot-based approach that uses machine learning methods to detect malware. The data generated by the IoT honeypot is used as a dataset for effective and dynamic training of learning models. However, this approach still needs to be implemented, so that real problems or concerns can be identified by applying it in real-time scenarios. It can use cloud servers to manage highly resource-constrained IoT devices.

Mingyang Yin et al. [93] suggested a non-Markovian spread dynamics model that describes Botnet propagation as a hybrid infection process. Based on the susceptible-infected-recovered model, they implemented a memory-based diffusion approach for global spread, which acts as a tuner for the diffusion rate. With the role of memory, this method can support different spreading patterns by introducing a hybrid propagation approach and scope controllers. Still, it simplifies the life states of the bot nodes, and immunity is not taken into account.

Vitor Hugo Bezerra et al. [94] suggested a host-based approach to detect Botnets on IoT devices, called IoTDS (Internet of Things Detection System). This system relies on one-class classifiers, which model only legitimate system behavior in order to detect deviations, avoiding the manual labeling process. Their experiments showed that the solution makes efficient use of CPU, memory, and energy, and no issues with the operation of the device were found.

Morteza Safai Pour et al. [95] analyze macroscopic, passive empirical data to shed light on this emerging threat phenomenon. By looking at this one-way network traffic, the approach attempts to identify and terminate compromised IoT devices on the Internet and detects, monitors, and reports coordinated IoT Botnets. However, some shortcomings remain, for example misidentifying two different IoT Botnets that share the same features, which an improved labeling system may address.

Yan Naung Soe et al. [96] explain that Botnet attacks are among the most recent attacks on the IoT environment and that IoT devices need to be protected from them. However, it is challenging to implement an attack detection system on IoT devices because they have minimal resources. Although an anomaly-detection architecture can detect unknown attacks, it is not easy to obtain an effective system for all devices because of the different architectures of IoT devices.

Nicholas Coroniotis et al. [97] reviewed the forensic and deep learning mechanisms used to examine Botnets and their presence in IoT environments. In addition to classifying network forensic solutions developed for traditional and IoT environments, they provide a new definition of IoT. However, challenges remain: the lack of development and improvement of honeypots and network flow analysis, dealing with the high speed and large amount of data generated by IoT, and ensuring that any solution developed is forensically sound and that its results are acceptable to the court.

Qaisar Shaf et al. [98] focus on the design of an Internet of Things (IoT) Botnet prevention

program which supports both Software-Defined Networking (SDN) and Distributed Blockchain

(DBC). For IoT communication, the in-band channel is extended by Generic Routing Encapsulation (GRE) tunnels between network switches running inside each Mininet instance of a single VM.

Thomas Lange and Houssain Kettani [99] summarized Botnet evolution, patterns, and

mitigation. They provided relevant examples and analysis to provide the reader with quick access

to a broad understanding of the issues at hand. It is, therefore, well suited for situations involving

Botnet where the processing of data sets is a problem and concatenation is not advisable.

R. K. Malaiya et al. [100] proposed and empirically evaluated a new network-based anomaly detection method that captures behavioral snapshots from the network and detects network traffic

from compromised IoT devices. Depending on this, its general behavior becomes more

challenging to comprehend, and thus, subsequent conclusions may be subject to other taxonomic

errors.

Scope of work and Objectives

Based on the research surveyed in this domain, current Botnet detection methods are limited by time and memory constraints. Previously proposed approaches deal with network data in a stream setting and, when evaluated, achieved high host identification rates. However, these solutions do not suit an approach that identifies malicious Botnet flows, and they require a high number of flows to perform the detection task.

Based on the related research surveyed, Botnet detection approaches can be host-based or network-based. On constrained devices, a Botnet or anomaly detector cannot simply be deployed, because host-based detection algorithms consume the processing capacity and power of IoT devices. The proposed approach can be used as a host-based or network-based Botnet or anomaly detector suitable for organizations with a single non-distributed network.

Based on the related research surveyed, it is concluded that further improvement is required in the detection of Botnets in IoT-based networks. IoT-based networks need improvement in detecting and removing Botnet attacks with high accuracy. Prior systems fail because of their poor traffic predictability, as observed in the literature survey. In addition, they incur memory and time complexity.

Hence it is essential to develop novel frameworks for Botnet detection with improved prediction accuracy.

The main objective of this thesis is to detect and cluster attacking nodes to help in the mass removal of Botnet attacks. In the proposed work, three frameworks are evaluated on a real testbed.

1. Mass Removal of Botnet Attacks using Heterogeneous Ensemble Stacking PROSIMA

Classifier in IoT.

2. Isolating Botnet Attacks Using Bootstrap Aggregation Suflex-PSIM Classifier in IoT.

3. A Novel Forecastive Anomaly Based Botnet Revelation Framework for Competing Concerns

in Internet of Things


CHAPTER 3

Mass Removal of Botnet Attacks Using Heterogeneous

Ensemble Stacking PROSIMA Classifier in IoT: First

Approach

3.1 Heterogeneous Ensemble stacking PROSIMA classifier

In the IoT environment, a Botnet attack is carried out by compromised nodes, and detecting these nodes is challenging. In the proposed approach, data is collected from the different sensor nodes, and unwanted data is removed during the preprocessing stage. The preprocessed data are used to train the heterogeneous ensemble stacking classifiers. In phase two of the proposed classifier, a random forest algorithm is again used as the meta-classifier. In the testing phase, similar Botnets are clustered by the PROSIMA protein sequence similarity algorithm. Figure 3.1 shows the proposed heterogeneous ensemble stacking PROSIMA classifier. Figure 3.2 shows the overall structure of the network scenario under consideration. In this proposed approach, the classifier is used at the gateway, but it can also be used on a node if the node is capable of running a pre-trained model of the classifier.

FIGURE 3.1 Heterogeneous Ensemble stacking PROSIMA classifier


FIGURE 3.2 Overall Architecture of the IoT network with a Botnet attack

3.2 Data Collection

In the experimental setup, each IoT node is connected with a sensor node Sn. Data are generated at every sensor node Sn = (S0, S1, …, Sn). The collection of resources is identified as R = {IoT1, IoT2, IoT3, …, IoTn}. Since data is collected from the sensor nodes, it includes raw sensor data along with network traffic. Data preprocessing is required to remove unwanted data and redundant information.


3.3 Preprocessing

The arriving data packets are captured with Wireshark in the form of a 'pcap' file. With the Tshark command, the 'pcap' file is converted to a CSV file. The features required to analyze the packets for Botnet attacks are derived from the CSV files. To detect a Botnet, only network traffic information is required, so sensor data are removed and only network traffic flow is retained in the feature set. In total, 21 features such as packet arrival time, source address, destination address, transport layer protocol, and packet length are collected over 2,25,745 instances. Of these, 1,83,910 instances belong to the no-attack class and 41,835 instances belong to the DDoS and Spam Botnet class. Because tree-based machine learning algorithms (XGBoost, AdaBoost, and Random Forest) are used, value scaling is not required.
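The pcap-to-CSV conversion and the removal of sensor columns could be sketched as follows. This is a minimal sketch under stated assumptions, not the exact command used in this work: the field list is an assumed subset of the 21 features, and the helper names and the `temperature` column are hypothetical. The tshark options used (`-r`, `-T fields`, `-e`, `-E header=y`, `-E separator=,`) are standard ones.

```python
import csv
import io

# Assumed subset of the 21 features, written as Wireshark display-field names.
FIELDS = ["frame.time_epoch", "ip.src", "ip.dst", "ip.proto", "frame.len"]


def tshark_command(pcap_path, fields=FIELDS):
    """Build a tshark command line that prints the chosen fields as CSV."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    cmd += ["-E", "header=y", "-E", "separator=,"]
    return cmd


def keep_network_columns(csv_text, fields=FIELDS):
    """Keep only the network-traffic columns and drop everything else
    (for example, sensor readings)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in fields} for row in reader]


# Hypothetical CSV with one sensor column mixed in.
sample = (
    "frame.time_epoch,ip.src,ip.dst,ip.proto,frame.len,temperature\n"
    "1.5,10.0.0.2,10.0.0.9,6,40,21.4\n"
)
rows = keep_network_columns(sample)
```

The command list returned by `tshark_command` can be passed to `subprocess.run` on a machine where tshark is installed; the filtering step itself only needs the standard library.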

3.4 Feature Selection

Feature selection finds the most relevant features from the available feature set for a classifier model. These techniques are used to identify and remove unneeded, irrelevant, and redundant features that do not contribute to, or that decrease, the model’s accuracy. One powerful technique is the genetic algorithm. After the preprocessing stage, a genetic algorithm is used to select relevant features, to visualize the data, and to further reduce processing time for the classification stage. The first step is to create and initialize the individuals in the population. Because the genetic algorithm is a stochastic optimization technique, the genes of the individuals are usually initialized at random. The second step is to assign a fitness value to each individual. The model is trained with the entire training dataset to evaluate the fitness.

The fitness value is assigned to individuals using the rank-based method as follows:

$$\Phi(i) = k \cdot R(i), \quad i = 1, \ldots, N \tag{1}$$

Here k is a constant called the selective pressure, and its value is fixed between 1 and 2. In the proposed work this value is set to 1, as per the literature on genetic algorithms. Greater selective pressure values give the fittest individuals a higher chance of recombination. The parameter R(i) is the rank of individual i, and N is the population size.


�� = ��∗�� (2)

Once the fitness assignment is performed, the selection operator chooses the individuals that will recombine for the next generation, according to their fitness level. Next, the crossover operator determines how bits are swapped among the selected individuals. After the fitness values are obtained, feature selection is performed on our dataset using Mod-Dejong. Mod-Dejong yields 4 features, which are used to train the proposed algorithm.
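Rank-based fitness assignment, where fitness is the selective pressure k multiplied by the rank R(i), can be illustrated with a short sketch. Two points are assumptions made for illustration only: the individual with the highest error receives rank 1 (so lower error means higher fitness), and selection is done by a roulette wheel with probability proportional to fitness. The function names are hypothetical.

```python
def rank_based_fitness(errors, k=1.0):
    """Return fitness k * R(i) for each individual.

    Assumption: the individual with the highest error gets rank 1 and the
    one with the lowest error gets rank N.
    """
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    fitness = [0.0] * len(errors)
    for rank, i in enumerate(order, start=1):
        fitness[i] = k * rank
    return fitness


def selection_probabilities(fitness):
    """Roulette-wheel selection: probability proportional to fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]


# Four hypothetical individuals (feature subsets) with their validation errors.
fitness = rank_based_fitness([0.9, 0.4, 0.6, 0.3], k=1.0)
probabilities = selection_probabilities(fitness)
```

With k = 1 the fitness equals the rank, so the individual with the lowest error is four times as likely to be selected as the one with the highest error in this example.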

3.5 Proposed Model

3.5.1 Popular ways to combine different classifiers

Several classifiers identify the presence of a Botnet attack using different methodologies. Popular approaches for combining different classifiers are bagging, boosting, and voting, which train the classifiers on random subsets of the data; this is referred to as ensemble learning [6]. One example of bagging is the random forest. Boosting is similar to bagging, but in boosting the errors of the previous learner are taken into consideration. One example of boosting is AdaBoost. Bagging can be more robust than boosting, since boosting can lead to overfitting, where the model works well on the training data set but fails to detect attacks in unseen data. There are two main techniques for combining models: voting and stacking. In voting, the class is predicted by a majority vote of the different classifiers. The stacking classifier is discussed in the next section.

3.5.2 Stacking classifier

The main advantage of stacking is that the outputs of the base-level classifiers are used to train the meta-classifier. The goal of this next level is to learn from the base-level learning process. For example, if a base classifier consistently misclassifies a region of the feature space because it has mislearned that region, the meta-classifier may be able to identify this weakness. It corrects learning errors by drawing on the learned behaviors of the other classifiers.

Stacking is the process of combining different classifiers CL1, CL2, ..., CLn on a single dataset. It is a two-step process. In the first step, a set of base classifiers BC1, BC2, …, BCn is used. In the second step, a meta-classifier is used, which performs predictions on a newly constructed dataset.

3.5.3 Overall Architecture of Heterogeneous Ensemble Stacking Meta-classifier

In the proposed heterogeneous ensemble stacking meta-classifier, XGBoost, AdaBoost, and random forest are used as heterogeneous base classifiers. A random forest classifier is again used as the meta-level classifier. During the testing phase, similar Botnets are clustered using the PROSIMA algorithm.

3.5.3.1 XGBoost (Extreme Gradient Boosting) Algorithm

XGBoost is an algorithm that has recently dominated applied machine learning and Kaggle competitions for structured or tabular data. XGBoost is an implementation of gradient-boosted decision trees designed for speed and performance. The strength of this algorithm lies in its scalability, which drives fast learning through parallel and distributed computing and offers efficient memory usage.

3.5.3.2 Adaboost (Adaptive Boosting) Algorithm

AdaBoost is a popular algorithm for boosting the performance of decision trees on binary classification problems. It is referred to as discrete AdaBoost because it is used for classification rather than regression. It is best used with weak learners.

Algorithm: Stacking Classifier

1: Input: Training data D = {(xi, yi)}, i = 1, ..., m

2: output: ensemble classifier E

3: Step 1: Learn base-level classifiers

4: for t=1 to T do

5: learn ht based on D

6: end for

7: Step 2: construct new data set of predictions

8: for i = 1 to m do

9: Dh = {xi’, yi} where xi’ = {h1 (xi)... hT (xi)}

10: end for

11: Step 3: learn a meta-classifier

12: learn E based on Dh

13: return E
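The three steps of the stacking algorithm can be traced in a small, dependency-free sketch. The base learners here are one-feature threshold rules and the meta-classifier is a majority-vote lookup table; both are hypothetical stand-ins chosen to keep the example short, not the XGBoost, AdaBoost, and random forest models used in this work. As in the listing, the meta-level data set is built from predictions on the training data itself.

```python
from collections import Counter, defaultdict


def fit_stump(feature, X, y):
    """Fit a one-feature threshold rule for binary labels 0/1 (stand-in base learner)."""
    best = None
    for thr in sorted({row[feature] for row in X}):
        for hi_label in (0, 1):
            preds = [hi_label if row[feature] >= thr else 1 - hi_label for row in X]
            acc = sum(p == t for p, t in zip(preds, y))
            if best is None or acc > best[0]:
                best = (acc, thr, hi_label)
    _, thr, hi_label = best
    return lambda row: hi_label if row[feature] >= thr else 1 - hi_label


def fit_lookup_meta(Z, y):
    """Stand-in meta-classifier: majority label per tuple of base predictions."""
    votes = defaultdict(Counter)
    for z, label in zip(Z, y):
        votes[tuple(z)][label] += 1
    default = Counter(y).most_common(1)[0][0]
    table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
    return lambda z: table.get(tuple(z), default)


def fit_stacking(X, y, n_features):
    # Step 1: learn the base-level classifiers h_1..h_T on D.
    base = [fit_stump(j, X, y) for j in range(n_features)]
    # Step 2: build D_h = {(x_i', y_i)} with x_i' = (h_1(x_i), ..., h_T(x_i)).
    Z = [[h(row) for h in base] for row in X]
    # Step 3: learn the meta-classifier E on D_h.
    meta = fit_lookup_meta(Z, y)
    return lambda row: meta([h(row) for h in base])


# Tiny hypothetical data set: two features, binary labels.
X = [[1, 10], [2, 9], [8, 1], [9, 2]]
y = [0, 0, 1, 1]
ensemble = fit_stacking(X, y, n_features=2)
```

In practice the meta-level data set is often built from cross-validated predictions to limit overfitting of the meta-classifier; the sketch follows the simpler scheme of the listing.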


3.5.3.3 Random cluster sampling forest Algorithm

Random forest builds multiple decision trees and merges them to obtain a more accurate and stable prediction. Due to its performance and accuracy, the random forest is used both as a base classifier and as the meta-classifier. The conventional random forest takes less time to train but more time for predictions, because a large number of trees slows down the algorithm. So cluster sampling is adopted in place of random sampling in the meta-classifier stage to speed up the prediction process. Figure 3.3 shows the training process of the random clustering forest.

Prediction of the unseen sample using random forest is defined as:

$$F' = \frac{1}{T} \sum_{t=1}^{T} F_t(E(s)) \tag{3}$$

F’ indicates the prediction of all the unseen samples and Ft indicates the time period for

observation. E(s) represents the Poisson distribution of the trained data set which reduces the

time of training. The bagging process repeatedly (T times) selected the random sample from the

training dataset.

The primary significance of the random forest model is that, instead of searching for the best feature when splitting a node, it searches for the best feature among a random subset of features. This generally results in a better model. Figure 3.4 shows the process flow

of the heterogeneous ensemble stacking meta-classifier. Figure 3.5 shows the flow of the

proposed system. Figure 3.6 shows the clustering of Botnet using PROSIMA.


FIGURE 3.3 Training process with random clustering sampling forest Algorithm

(Flowchart: Begin; for each tree T, choose a training data subset; check the condition at the node; apply the Poisson distribution; build the next split; calculate the prediction error; End.)


FIGURE 3.4 Process flow of heterogeneous ensemble stacking meta classifier

3.6 Mass clustering based on PROSIMA protein similarity

All the similar Botnet having repetitive structures are clustered by the PROSIMA protein

similarity algorithm. The output of the training phase eq (3) is clustered in the testing phase.

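We use m different terms t1, t2, …, tm for indexing N observations. Then each observation Oi is represented by a vector:

$$O_i = (O_{i1}, O_{i2}, O_{i3}, \ldots, O_{im}) \tag{4}$$

where $O_{ij}$ is the weight of the term $t_j$ in the observation $O_i$.

An index file of the vector model is represented by the matrix:

$$D = \begin{pmatrix} O_{11} & O_{12} & \cdots & O_{1m} \\ O_{21} & O_{22} & \cdots & O_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ O_{N1} & O_{N2} & \cdots & O_{Nm} \end{pmatrix} \tag{5}$$

where the ith row corresponds to the ith observation and the jth column corresponds to the jth term. The similarity of two observations is given by the following formula:

$$sim(O_i, O_j) = \frac{\sum_{k=1}^{m} O_{ik} O_{jk}}{\sqrt{\sum_{k=1}^{m} O_{ik}^2} \, \sqrt{\sum_{k=1}^{m} O_{jk}^2}} \tag{6}$$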

1. Input generalized suffix tree data structure from meta-level classifier

2. Find all maximal substructure clusters within the suffix tree.

3. Build a vector model of all pockets in our assortment

4. Build pocket similarity matrix
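Steps 3 and 4 above (building the vector model and the similarity matrix) can be sketched with the cosine similarity between observation vectors. The sample vectors and function names are hypothetical; the suffix-tree steps 1 and 2 are not covered by this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two term-weight vectors of equal length."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def similarity_matrix(observations):
    """Pairwise similarity of all observations (the rows of the index matrix D)."""
    return [[cosine_similarity(a, b) for b in observations] for a in observations]


# Three hypothetical observations described by m = 3 term weights each.
D = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 0.0, 3.0],
]
S = similarity_matrix(D)
```

The first two observations point in the same direction, so their similarity is close to 1, while the third shares no terms with them and has similarity 0.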


FIGURE 3.5 Process flow of the proposed system

FIGURE 3.6 Flow diagrams for clustering of Botnet using PROSIMA

(Flowchart: Begin; list the devices with their activities; compute the similarity matrix; set each device as a cluster; if the number of clusters = 1, End; otherwise merge the two most similar devices, update the similarity matrix, and repeat.)
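The merge loop of Figure 3.6 corresponds to agglomerative clustering over a device similarity matrix. The sketch below is one possible reading of the flowchart: it uses single linkage (the similarity of two clusters is the highest similarity between any pair of their members), which is an assumption, and the device names and similarity values are hypothetical.

```python
def agglomerate(names, sim):
    """Merge the two most similar clusters until a single cluster remains.

    `sim` is a symmetric matrix of pairwise device similarities.
    Returns the merges in the order they were performed.
    """
    clusters = [[i] for i in range(len(names))]  # each device starts as a cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: best similarity over all member pairs (assumed).
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merges.append(([names[i] for i in clusters[a]],
                       [names[i] for i in clusters[b]], s))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


names = ["n1", "n2", "n8"]
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
history = agglomerate(names, sim)
```

Cutting the merge history at a chosen similarity level yields the final groups; in this example n1 and n2 are merged first, and n8 joins only at a much lower similarity.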


3.7 Experimental Setup

In experimentation, two kinds of attacks are considered: the DDoS attack and the spam attack. A DDoS attack is a cyber-attack in which the attacker tries to make a machine or system inaccessible by temporarily or indefinitely disrupting the services of a host connected to the Internet.

Email spam consists of unsolicited messages, often sent by commercial entities. Spam can be a real security issue because it can carry Trojan horses, viruses, worms, spyware, and targeted phishing attacks. In a normal attack, a single attacker tries to disrupt the network. In a Botnet attack, a number of malicious nodes called bots attack the target system, as each connected node is compromised.

The proposed method is evaluated with the experimental setup. The traffic is collected from 20 real IoT nodes (implemented with Raspberry Pi 3) connected via the Wi-Fi network to the access point, with a wired connection to the central switch and the router. The network traffic is sniffed using Tshark and Wireshark, with port mirroring on the switch. The C&C server is implemented with a Python script that sends files to and controls the IoT devices. Three IoT devices are configured as bots to generate DDoS and spam attacks against the rest of the devices in the network. Twenty-one features have been extracted from five time windows of 1.5 ms, 10 ms, 50 ms, 100 ms, and 500 ms, respectively. Using a Python script and Tshark commands, the packet delivery ratio, packet loss, and throughput are computed from the number of received and sent packets. Arrival time is computed as described in Section 3.8.1. Figure 3.7 shows the experimental setup for detecting Botnet attacks.

In the proposed work, IoT devices are infected using purpose-built DDoS and Spam attack scripts. To deliver the scripts to the IoT devices (Raspberry Pi 3), brute-forcing is carried out on the Telnet port. The required Python scripts are created using Scapy. Under the influence of the attack, the IoT devices started generating DDoS and Spam attacks against the rest of the devices in the network. The result of one such experiment is shown in Table 3.2. The traffic data collected from the experimental setup has been further utilized to evaluate the proposed classifier’s performance.


FIGURE 3.7 Experimental setup for detecting Botnet attacks

As the number of devices in the IoT ecosystem increases, along with its technical complexity, current systems miss more Botnet activity. This type of attack is, therefore, very complex and challenging to identify. A heterogeneous ensemble stacking PROSIMA classifier is proposed to identify a Botnet attack, which takes advantage of cluster sampling instead of traditional random sampling to make predictions more accurate. Thus, this technique improves the reliability of the IoT-based network against Distributed Denial of Service (DDoS) and spam Botnets. In a Botnet attack, smart objects come together as a group and execute an action that would lead to the destruction of the IoT-based system, so early elimination of the Botnet helps maintain the network’s security.

3.8 Results and Discussion

The proposed heterogeneous ensemble stacking PROSIMA classifier is implemented in Anaconda’s Spyder software using Python version 3.6. Python is a powerful scripting language developed by Guido van Rossum in 1989 in the Netherlands, and it has gained momentum in the last decade. The main advantage of using Python is that its open-source libraries are extensive, covering almost all the functions needed for the task. The implementation uses Python’s machine learning libraries such as NumPy, Pandas, Matplotlib, and Sklearn. Python’s Tkinter library is used to create the GUI.

The proposed system for IoT based network is implemented using the python programming

language. Figure 3.8 shows the IoT based network implemented using python.

FIGURE 3.8 The IoT based network

3.8.1 Calculation of Packet arrival time

Since smart objects are involved in the IoT-based network, in the absence of security, unauthorized users can quickly access the network and IoT node resources and hence distribute false information that affects the working of the IoT node. In a Botnet attack, smart objects carry out malicious activities by forming groups among each other, so by keeping track of the arrival and inter-arrival time of each smart object involved in the IoT-based network, suspicious users can be listed and the monitoring process can be executed on those users.

Let $N(t)$ be a Poisson process with rate $\lambda$. Let $X_1$ be the time of the first arrival, then

$$P(X_1 > t) = P(\text{no arrival in } (0, t]) = e^{-\lambda t} \tag{1}$$

So $X_1 \sim \text{Exponential}(\lambda)$. Let $X_2$ be the time interval between the first and second arrival. And let $s > 0$ and $t > 0$; the two intervals $(0, s]$ and $(s, s + t]$ are independent, so

$$P(X_2 > t \mid X_1 = s) = P(\text{no arrival in } (s, s + t]) = e^{-\lambda t} \tag{2}$$

If $N(t)$ is a Poisson process with rate $\lambda$, then the inter-arrival times $X_1, X_2, \ldots$ are independent and $X_i \sim \text{Exponential}(\lambda)$, for $i = 1, 2, \ldots$

If $T_n = X_1 + X_2 + \cdots + X_n$ is the sum of $n$ independent exponential random variables, then the Probability Density Function of $T_n$ for $n = 1, 2, 3, \ldots$ is

$$f_{T_n}(t) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}, \quad t > 0 \tag{3}$$

If $X \sim \text{Exponential}(\lambda)$, then $E[X] = 1/\lambda$ and $\mathrm{Var}(X) = 1/\lambda^2$. Since $T_n = X_1 + X_2 + \cdots + X_n$, it is concluded that the arrival time of the Poisson process is calculated by

$$E[T_n] = n E[X_1] = \frac{n}{\lambda} \tag{4}$$

$$\mathrm{Var}(T_n) = n \mathrm{Var}(X_1) = \frac{n}{\lambda^2} \tag{5}$$

In the IoT-based network, the arrival time of each smart object would be calculated based on the mean arrival time. $E[T_n]$ indicates the mean arrival time of each user in the IoT-based network.
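How inter-arrival times and the mean arrival time could be obtained from captured packet timestamps is sketched below. The sketch assumes arrivals follow a Poisson process with rate λ, for which the mean time of the n-th arrival is n/λ; the timestamps and function names are hypothetical.

```python
def inter_arrival_times(arrivals):
    """Gaps between consecutive packet arrival timestamps (seconds)."""
    times = sorted(arrivals)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def estimate_rate(arrivals):
    """Estimate the arrival rate lambda as 1 / (mean inter-arrival time)."""
    gaps = inter_arrival_times(arrivals)
    return len(gaps) / sum(gaps)


def expected_arrival_time(n, rate):
    """Mean time of the n-th arrival of a Poisson process: n / lambda."""
    return n / rate


# Hypothetical packet timestamps of one smart object, in seconds.
timestamps = [0.0, 0.5, 1.5]
gaps = inter_arrival_times(timestamps)
rate = estimate_rate(timestamps)
```

A smart object whose estimated rate departs sharply from that of its peers would be a candidate for the monitoring step described above.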

3.8.2 Packet Delivery Ratio (PDR)

The estimate of the Packet Delivery Ratio (PDR) is based on the number of bundles (sets of packets; the size of one packet is 40 bytes) recorded in the trace document. Overall, the PDR is defined as the ratio between the number of bundles received by the target and the number of bundles produced by the source. Figure 3.9 shows the packet delivery ratio for normal and attack intervals.

FIGURE 3.9 PDR during normal and attack period

3.8.3 Packet Loss

Packet loss occurs when at least one packet of the network fails to reach its target. Packet loss is calculated as the number of packets lost. Figure 3.10 shows the network's packet loss ratio over normal and attack duration.

FIGURE 3.10 Network packet loss ratio over normal and attack duration

3.8.4 Throughput

In data transmission, throughput is the amount of information transferred from the source node to the destination during a given period and is usually measured in bits per second. Figure 3.11 shows the network's throughput over normal and attack duration.

FIGURE 3.11 Network throughputs over normal and attack duration

3.8.5 Clustering of Botnet of DDoS and Spam types

In a DDoS type Botnet attack, the attacker sends requests for a resource to a specific destination address for a while so that legitimate users cannot access that resource for a particular period. The proposed classifier clusters nodes that share the same sending time, destination address, and requested resource. Also, it calculates the distance between the source nodes and the destination nodes to speed up the detection of attacks.

A Spam Botnet sends email that ends up in the spam box instead of the inbox of the mail application. This includes sending out unwanted messages. Spam is a real security issue because it can carry Trojan horses, viruses, worms, spyware, and phishing attacks.

In existing methods, hierarchical clustering and K-means clustering were used. The main drawback of hierarchical clustering is that once two groups are merged, the merge cannot be undone, and K-means requires the value of k to be chosen before the algorithm is run. The proposed system uses a composite model, which is similarity-based clustering with a higher clustering ratio than the current system. Mixed model clustering can handle many cluster shapes. Figure 3.12 shows the clustering of Botnet attacks leading to a DDoS attack and a Spam attack.

TABLE 3.1 Log details of each smart node in the network

Node   IP Address   Arrival time (Sec)   Packet Delivery ratio   Packet Loss (Kbps)   Throughput (Kbps)

n1 151.142.255.1 2.256 88.025 2.2835 56.895

n2 151.142.255.2 1.267 93.211 1.4756 54.742

n3 151.142.255.3 8.278 94.723 1.8629 55.315

n4 151.142.255.4 1.289 94.601 1.8687 56.889

n5 151.142.255.5 1.314 94.783 1.4756 53.895

n6 151.142.255.6 5.311 89.5404 1.8629 56.888

n7 151.142.255.7 4.322 93.031 2.1905 53.895

n8 151.142.255.8 3.333 95.5216 2.1905 51.2

n9 151.142.255.9 2.344 88.0122 1.4756 60.235

n10 151.142.255.10 1.355 90.5028 2.1905 53.895

n11 151.142.255.11 1.366 92.9934 1.8629 51.2

n12 151.142.255.12 9.377 95.484 2.1905 51.2

n13 151.142.255.13 8.388 87.9746 2.2835 51.2

n14 151.142.255.14 7.399 91.4652 1.9597 51.895

n15 151.142.255.15 6.441 92.9558 1.8629 50.96

n16 151.142.255.16 5.421 89.937 1.8629 53.895

n17 151.142.255.17 4.432 90.4276 1.57717 56.889

n18 151.142.255.18 3.443 92.9182 1.9598 56.888

n19 151.142.255.19 2.454 95.4088 1.8629 51.221

n20 151.142.255.20 1.465 89.8994 1.9598 56.38
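The per-node quantities of the kind listed in Table 3.1 could be computed from packet counts along the following lines. This is a minimal sketch: the function names and sample counts are hypothetical, and the 40-byte packet size is the one stated for the trace. Packet loss is returned here as a packet count, which is an assumption about the definition.

```python
def packet_delivery_ratio(received, sent):
    """PDR as the percentage of sent packets that reached the target."""
    return 100.0 * received / sent if sent else 0.0


def packet_loss(received, sent):
    """Number of packets that did not reach the target."""
    return sent - received


def throughput_bps(received, packet_size_bytes, duration_s):
    """Throughput in bits per second over the observation period."""
    return received * packet_size_bytes * 8 / duration_s


# Hypothetical counts for one node over a 2-second window.
pdr = packet_delivery_ratio(received=90, sent=100)
lost = packet_loss(received=90, sent=100)
rate = throughput_bps(received=100, packet_size_bytes=40, duration_s=2.0)
```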


TABLE 3.2 DDoS and SPAM Botnet attack clustered Nodes

Node   Source IP Address   Packet Sending Time (Sec)   Destination IP Address   Resource

n1 151.142.255.1 0.214 151.142.250.11 file-1

n2 151.142.255.2 0.214 151.142.250.11 file-1

n3 151.142.255.3 0.214 151.142.250.11 file-1

n4 151.142.255.4 0.214 151.142.250.11 file-1

n5 151.142.255.5 0.214 151.142.250.11 file-1

n6 151.142.255.6 0.214 151.142.250.11 file-1

n7 151.142.255.7 0.214 151.142.250.11 file-1

n8 151.142.255.8 0.114 151.142.255.11 mail

n9 151.142.255.8 0.114 151.142.255.11 mail

n10 151.142.255.10 0.114 151.142.255.11 mail

n11 151.142.255.11 0.214 151.142.250.11 file-1

n12 151.142.255.12 0.214 151.142.250.11 file-1

n13 151.142.255.13 0.214 151.142.250.11 file-1

FIGURE 3.12 Clustering of Botnet attack which leads to DDoS and Spam attacks


As shown in Figure 3.12 seven nodes are clustered under DDoS attack (pink colored) and three

nodes are clustered under Spam attack (red colored).
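The grouping visible in Table 3.2, where nodes with the same sending time, destination address, and requested resource fall into one cluster, can be illustrated with an exact-match grouping. This is a deliberate simplification for illustration and not the PROSIMA similarity clustering itself; the function name is hypothetical, and the sample rows are taken from Table 3.2.

```python
from collections import defaultdict


def group_by_behaviour(records):
    """Group nodes that share sending time, destination IP, and resource."""
    groups = defaultdict(list)
    for node, send_time, dest_ip, resource in records:
        groups[(send_time, dest_ip, resource)].append(node)
    return dict(groups)


records = [
    ("n1", 0.214, "151.142.250.11", "file-1"),
    ("n2", 0.214, "151.142.250.11", "file-1"),
    ("n8", 0.114, "151.142.255.11", "mail"),
    ("n10", 0.114, "151.142.255.11", "mail"),
]
clusters = group_by_behaviour(records)
```

Nodes requesting `file-1` from the same destination at the same time end up in one group (DDoS behaviour), and nodes targeting the `mail` resource in another (Spam behaviour).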

3.8.6 Comparing proposed classifier with existing classifiers

In this section, the proposed classifier is compared with existing classifiers in terms of different

parameters as shown in TABLE 3.3.

TABLE 3.3 (a) List of classifiers with the proposed system

Classifiers Precision Recall F-Measure Accuracy

IoTDS [30] 0.968 0.931 0.949 96.5333

BoTshark [18] 0.968 0.934 0.95 96.667

Proposed 0.971 0.963 0.966 98.63

TABLE 3.3 (b) List of classifiers with the proposed system

Classifiers Precision Recall F-Measure Accuracy

Decision Tree 0.968 0.931 0.949 96.53

Random Forest [29] 0.968 0.934 0.95 96.66

RBF 0.976 0.927 0.95 96.53

Proposed 0.971 0.963 0.966 98.63


Precision

Precision is the fraction of test instances classified as attack that actually belong to the attack class.

Figure 3.13 shows a comparison graph for precision for four types of classifiers.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Where TP represents the True Positive value, FP indicates the False Positive.

FIGURE 3.13 Comparison graph for Precision

The proposed classifier achieved a precision value of 0.97. This is higher than the Decision Tree and Random Forest classifiers (0.968 each) and comparable to RBF (0.976), since the meta-classifier adopts a cluster-based sampling approach, which first finds similar elements and then performs the splitting.

Recall

Recall measures the fraction of attack class that was correctly detected as Botnet.

Figure 3.14 shows a comparison graph for recall.

$$\text{Recall} = \frac{TP}{TP + FN}$$

Where TP represents the True Positive value, FN indicates the False Negative.


FIGURE 3.14 Comparison graph for Recall

The proposed classifier achieved a recall of 0.963, better than existing classifiers such as Decision Tree, Random Forest, and RBF, whose recall values are 0.931, 0.934, and 0.927. Because the proposed system uses similarity-based clustering, it separates the events successfully.

F-Measure

The F-measure is the harmonic mean of precision and recall and measures the balance between them. Figure 3.15 shows a comparison graph for F-measure.

$$\text{F-measure} = \frac{2 \cdot P \cdot R}{P + R}$$

Where P represents the precision and R denotes the recall value.
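The three definitions above can be checked with a few lines of Python. This is a minimal sketch: the function names and the confusion counts are illustrative assumptions and are not taken from the thesis experiments.

```python
def precision(tp, fp):
    """Fraction of predicted attacks that are real attacks: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of real attacks that were detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion counts, chosen only for illustration.
tp, fp, fn = 90, 10, 30
p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(p, r)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

Because the F-measure is a harmonic mean, it always lies between the precision and the recall and is pulled toward the smaller of the two.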


FIGURE 3.15 Comparison graph for F-measure

The proposed system uses cluster-based sampling in the training phase: it first clusters similar events before splitting the observations for decision tree creation. As a result, it achieves a better F-measure than the existing classifiers.

Accuracy

Accuracy is the proportion of predictions that the model got right. Formally, accuracy can be defined as

$$\text{Accuracy}(A) = \frac{I_{cB}}{T_B}$$

Where $I_{cB}$ indicates the number of correctly identified Botnet attacks and $T_B$ denotes the total number of Botnet attacks.

The proposed classifier uses top-performing base classifiers in the first phase and a meta-classifier with cluster-based sampling in the second phase. Similar Botnets are then clustered by PROSIMA based on equal pocket value.

As a result, the proposed classifier attains a higher accuracy of 98.63% than the existing classifiers Decision Tree, Random Forest, and RBF, which reach 96.53%, 96.66%, and 96.53%.


FIGURE 3.16 Comparison graph for Accuracy

In this first approach, the time taken for prediction is quite high because many classifiers are used for classification. To reduce it and to further improve classification accuracy, a second approach is proposed in Chapter 4.


CHAPTER 4

Isolating Botnet Attack Using Bootstrap Aggregation

Surflex-PSIM Classifier in IoT:

Second Approach

4.1 Seclusion of Botnet attacks using PSIM based on random Poisson forest model

Botnet attacks are carried out by a group of compromised nodes, which makes them difficult to detect with conventional methods. The proposed system therefore uses a learning-based classifier to trace and cluster Botnet attacks. Since the stored data is gathered from a sensor network, it includes both linear and nonlinear data, and effective preprocessing is required to remove unwanted data. The proposed system uses Linear Random Euler Complex-valued Filters (LRECF), which linearize the dataset by using Euler distance valued filtering. The preprocessed, linearized data sets are then used to train the random Poisson forest algorithm, which applies the general bootstrap aggregation technique, repeatedly selecting a random sample with replacement from the training set for a given time. Subsequently, based on the trained data, similar Botnets are clustered using Surflex-PSIM, which isolates the Botnet attacks as clusters based on automatically trained characteristics of attacks. Accurate clustering is obtained even when a large dataset is given as input. The time spent on analyzing each Botnet individually is avoided, so that accurate and less time-consuming Botnet detection can be achieved.

4.2 Data gathering phase

In this IoT based approach, data are gathered using sensor nodes S = (S1, S2, …, Sn). The collected resources are defined as G = {IoT1, IoT2, IoT3, …, IoTn}. Since the data is collected from the sensor nodes, it also includes raw data, so data preprocessing techniques are used to remove unwanted data. Since sensor nodes deliver real-time information, linear filtering is adopted to preprocess the data.

FIGURE 4.1 Overall architecture of the IoT network with Botnet attacks

4.3 Removing complex-valued variable

The IoT environment is based on context-aware computing, and Botnets carry out different activities in it, so the data sensed by sensor nodes contain complex-valued variables. To remove the non-linearized data, the whole dataset is converted to the origin of the time axis. The non-linearity that arises during this conversion is rectified by the Linear Random Euler Complex-valued Filter (LRECF), which linearizes the dataset by using Euler distance valued filtering and prevents the features of the Botnet from being lost.

A complex-valued variable C is defined as

$$C = C_R + i\,C_I \quad (1)$$

Where $C = S(G)$, $C_R$ and $C_I$ are the real and imaginary parts of C, and $i = \sqrt{-1}$ is the imaginary unit. The probability density function of a complex-valued random variable is defined by the joint probability density function of its real and imaginary parts.

$$p(C) = p(C_R, C_I) \quad (2)$$

The expectation of the complex-valued random variable is defined as

$$E(C) = E(C_R) + i\,E(C_I) \quad (3)$$

A complex-valued random variable is said to be zero mean when the expectations of its real and imaginary parts are both zero.

$$E(C_R) = E(C_I) = 0 \quad (4)$$

FIGURE 4.2 Complex valued linear filtering

In the filtering system, a pair of samples $u_s$ and $v_s$ from S, where $S = E(c)$, is given for training, and the error is denoted by $e_s = v_s - y_s$, where $y_s$ indicates the expected output. The cost function used for filtering is defined as $E(e_s (e_s)^*)$. The weight vectors of the learning system are updated based on the minimization of the mean square error (MSE) with the complex gradient descent method. Initially the cost function is 0. When the second variable enters the learning model, the cost function is calculated based on the MSE, that is, $E(e_s (e_s)^*)$, and the weights are updated using equation (5). Similarly, the cost function is calculated for every instance and the weights (w) are updated each time. Here l represents the current state of the complex learning system.

$$w(l+1) = w(l) + \eta\,E\big(e_s(l)\,u_s\big) \quad (5)$$

The probability density function of a complex-valued random variable is given as

$$P(e) = P(e_R, e_I) \quad (6)$$

The entropy of this complex-valued error data is defined as

$$H(e) = H(e_R, e_I) = -E\{\log p(e_R, e_I)\} \quad (7)$$

From eqn (7), the data (r1, …, rn) with the least entropy error are chosen and passed to the training phase.

FIGURE 4.3 Process flow of the proposed system
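The weight update of equation (5) can be illustrated with a standard complex least-mean-squares (LMS) filter. The sketch below is an assumption-laden illustration and not the LRECF implementation of the thesis: it uses a single complex weight, the conventional error $e = d - w u$, and the conventional conjugated update $w \leftarrow w + \eta\, e\, u^*$, whose sign and conjugation conventions may differ from equation (5). The names `complex_lms`, `inputs`, and `targets` are hypothetical.

```python
def complex_lms(inputs, targets, eta=0.1, epochs=50):
    """Fit one complex weight w so that w * u approximates each target.

    Standard complex LMS (an assumption, not the thesis' LRECF):
    each step moves w along the negative gradient of the cost E[e e*].
    """
    w = 0j
    for _ in range(epochs):
        for u, d in zip(inputs, targets):
            e = d - w * u                 # complex error for this sample
            w += eta * e * u.conjugate()  # gradient-descent weight update
    return w


def mean_square_error(w, inputs, targets):
    """Cost function E[e e*] averaged over the samples (a real number)."""
    errors = [d - w * u for u, d in zip(inputs, targets)]
    return sum((e * e.conjugate()).real for e in errors) / len(errors)


# Hypothetical noiseless data generated from a known complex weight.
true_w = 0.5 - 0.25j
inputs = [1 + 1j, 1 - 2j, -0.5 + 0.5j, 2 + 0j]
targets = [true_w * u for u in inputs]

w = complex_lms(inputs, targets)
print(w, mean_square_error(w, inputs, targets))
```

With the step size chosen here, every update shrinks the weight error by a factor of $1 - \eta |u|^2$, so the estimate converges to the generating weight on this noiseless data.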

4.4 Bootstrap Aggregating Surflex-PSIM Classifier

The linear data obtained from the filter is passed to the classification stage, which consists of a training phase and a test phase. The training phase uses a random Poisson forest model, which builds multiple trees and combines them to obtain an accurate estimate. In a decision tree, each internal node represents a test on an attribute and each branch shows the result of that test. A node without children is called a leaf node, and each leaf node carries a class label. The main advantage of this model is that, instead of searching for the best feature among all features when splitting a node, it searches for the best feature among a random subset of features. This produces wide diversity, which generally results in a better model. Traditional random forest algorithms are fast to train but slow to produce predictions, because a large number of decision trees slows the model down. To speed up the entire process of random forest sampling, the Poisson distribution function is utilized. The test phase consists of PSIM, which automatically separates Botnet attacks into groups based on the surface properties of the pocket value.

FIGURE 4.4 Random Poisson forest classifier

4.4.1 Random training model based on Poisson distribution

Random Poisson forest counts the number of events and the time at which these events occur in a given time interval, so it achieves a better prediction rate during the training phase. The training algorithm applies the general technique of bootstrap aggregating. Initially, the linearized data set R = r1, …, rn, which is obtained from eq (8), with responses S = s1, s2, …, sn is subjected to a bagging process which repeatedly selects a random sample with replacement from the training set for a given time set by the Poisson distribution. The linear regression for the trained data set is defined as

$$S_n = b_0 + b_1 r_1 + b_2 r_2 + \dots + b_i r_i + \dots + b_n r_n \quad (8)$$

Where b indicates the regression coefficient, S represents the trained data set and r represents the input variable. In order to predict the unseen samples in the data set, the Poisson distribution is applied, which speeds up the prediction process.

$$\log[E(S)] = b_0 + b_1 r_1 + b_2 r_2 + \dots + b_n r_n + \log(t) \quad (9)$$

Where log(t) represents the offset variable, since Poisson regression uses fixed time, and t represents the observed time period. If S follows a Poisson distribution, then the probability of observing i events over the time period is defined as

$$p(S = i) = \frac{\lambda^{i} e^{-\lambda}}{i!} \quad (10)$$

Let $\lambda$ be the expected value (average) of S, and let e denote the exponential. Then the average is taken over all the predictions from the individual regression trees:

$$F' = \frac{1}{T} \sum_{t=1}^{T} F_t(E(s)) \quad (11)$$

Where F' indicates the prediction for all the unseen samples and T indicates the time period for observation. E(s) represents the Poisson distribution of the training data set, which reduces the time of training.
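Equations (10) and (11) and the Poisson-driven bagging step can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis implementation: each training sample is replicated a Poisson(λ)-distributed number of times (a common way to approximate sampling with replacement), Knuth's multiplication method is used to draw Poisson variates, and the names `poisson_pmf`, `poisson_draw`, `poisson_bootstrap`, and `bagged_prediction` are hypothetical.

```python
import math
import random


def poisson_pmf(i, lam):
    """Probability of observing i events, equation (10)."""
    return lam ** i * math.exp(-lam) / math.factorial(i)


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_bootstrap(samples, lam=1.0, seed=0):
    """Build one bag: each sample is repeated a Poisson(lam) number of times."""
    rng = random.Random(seed)
    bag = []
    for s in samples:
        bag.extend([s] * poisson_draw(lam, rng))
    return bag


def bagged_prediction(models, x):
    """Average the predictions of the individual models, as in equation (11)."""
    return sum(m(x) for m in models) / len(models)


# Hypothetical data: ten sample identifiers and two toy "regression trees".
samples = list(range(10))
bag = poisson_bootstrap(samples, lam=1.0, seed=42)
models = [lambda x: x, lambda x: 3 * x]
print(bag, bagged_prediction(models, 2.0))
```

Drawing an independent Poisson count per sample avoids materializing the whole training set before sampling, which is one reason Poisson resampling is often used for streaming or large data.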

4.5 Pseudo code for random Poisson forest

Input: Training sample $S$, classifier $F$, iterations $I$

Output: $F'$

Training: set the weightage value $m$

$S_i$ ← sample from $S$ according to the Poisson distribution

$y_i$ ← number of data samples

$F_i$ ← train a classifier on $S_i$ via $F$

$$e_i = \frac{1}{m} \sum_{r_i \in S_i :\, F_i(r_i) \neq y_i} \mathrm{weight}(r_i)$$

$$\mathrm{weight}(r_i) = \mathrm{weight}(r_i)\,\beta_i, \quad \forall r_i : F_i(r_i) = y_i$$

$$F' = \sum_{i :\, F_i(x) = y} \log\frac{1}{\beta_i}$$

where

$$\beta_i = \frac{e_i}{1 - e_i}$$

FIGURE 4.5 Training process with random Poisson forest

[Flowchart steps of Figure 4.5: Begin; For each tree T; Choose training data subset; Check condition at node; Apply Poisson distribution; Build the next split; Calculate prediction error; End]

4.5.1 Mass clustering based on P-SIM clustering

After training, similar Botnets are clustered using Surflex-PSIM, utilizing its repetitive structure, which isolates the Botnet attacks as clusters based on the automatically trained characteristic pocket value.

The output of the training phase from eq (11) is used for clustering in the testing phase, based on the similarity of Botnet behavior. The main idea behind this approach is to cluster similar types of Botnet among the authenticated smart objects involved in the IoT based network. The following formulae are used to find the similarity of various Botnets.

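$$C_{U,V} = \forall\,(M_{u,v}, M_{u',v'} \in C_{u,v}) \cap (M_{u,v} \neq M_{u',v'}) \quad (12)$$

$$(M_{u,v} \neq M_{u',v'}) \Rightarrow (u \not\subset u') \cup (v \not\subset v'), \quad |M_{u,v}| \geq L \quad (13)$$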

Let U and V be two network parameters belonging to the IoT family F, and let u and v be two identical subsequences belonging to U and V respectively. $M_{u,v}$ represents the matched subsequence of Surflex characteristics of u and v, and L represents the minimum length that this similarity should have. $C_{u,v}$ is defined as the key set of matched parameter values $M_{u,v}$ for the similarity function.

The matching set $C_{u,v}$ includes all the matched subsequences of maximum length between the sequences u and v. The symbol ⊄ indicates that one type of Botnet is not included in another cluster. All possible matched parameter values should satisfy $|M_{u,v}| \geq L$, since each $M_{u,v}$ in $C_{u,v}$ is an expansion of matched parameters of length L. Therefore, this approach gathers all the matched network parameter values of length L in linear time. A weightage value is then given to all matched parameter values to distinguish them from all other authenticated users.

$$W(M) = \sum_{i=1}^{|M|} T\big[M[i], M[j]\big] \quad (14)$$

Where M[i] is the i-th Botnet of the matched parameter value M, and M[i], M[j] is the weightage value of each Botnet in the network. T represents the substitution matrix. For the pair of parameter values U and V, the matching score $S_{u,v}$ is defined as

$$S_{u,v} = \underset{M \subset C_{u,v}}{\mathrm{MAX}}\,(u, v) \quad (15)$$

Let $S_{\max}$ be the matching score of the largest network parameter value belonging to the IoT supported network. The maximum matching score is defined by

$$S_{\max} = \max\{\, S_{u,v} \,;\; u = v \,;\; V \subset F \,\} \quad (16)$$

Finally, the similarity measure between the two parameters U and V is obtained by dividing the match score value by the maximum value. Based on that similarity measure, the Botnets are clustered.

4.5.2 Pseudo code for P-SIM clustering

Matched set is obtained by

M ← matched parameter value

C ← matching set

For i = 1 to max(|u|, |v|)
    k = 0, j = i
    While k < |u| and j < |v|
        If (u[k] = v[j])
            Then add the botnet u[k] to M
        Else
            If (|M| ≥ 1) add M to C
            Empty M
        End else
        Increment k, Increment j
    End while
    If (|M| ≥ 1) add M to C
    Empty M
    k = i, j = 0
    While k < |u| and j < |v|
        If (u[k] = v[j])
            Then add the botnet u[k] to M
        Else
            If (|M| ≥ 1) add M to C
            Empty M
        End else
        Increment k, Increment j
    End While
    If (|M| ≥ 1) add M to C
    Empty M
End For
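The diagonal scan of the pseudo code and the normalised similarity measure can be sketched as runnable Python. The sketch makes several assumptions that are not stated in the thesis: behaviour sequences are plain Python sequences, the substitution matrix is the identity so that the weight of a matched run equals its length, the matching score is the length of the longest matched run, and the main diagonal (offset 0) is included in the scan. The names `matched_runs`, `match_score`, and `similarity` are hypothetical.

```python
def matched_runs(u, v, min_len=1):
    """Collect maximal runs of equal elements along every diagonal of u x v."""
    runs = []

    def scan(k, j):
        current = []
        while k < len(u) and j < len(v):
            if u[k] == v[j]:
                current.append(u[k])        # extend the matched run M
            else:
                if len(current) >= min_len:
                    runs.append(tuple(current))  # add M to the matching set C
                current = []
            k += 1
            j += 1
        if len(current) >= min_len:          # flush the run at the diagonal end
            runs.append(tuple(current))

    for j0 in range(len(v)):       # diagonals starting at k = 0, j = j0
        scan(0, j0)
    for k0 in range(1, len(u)):    # diagonals starting at k = k0, j = 0
        scan(k0, 0)
    return runs


def match_score(u, v, min_len=1):
    """Assumed score: length of the longest matched run (identity weights)."""
    return max((len(m) for m in matched_runs(u, v, min_len)), default=0)


def similarity(u, v, family, min_len=1):
    """Match score divided by the largest self-match score in the family."""
    s_max = max(match_score(x, x, min_len) for x in family)
    return match_score(u, v, min_len) / s_max if s_max else 0.0


# Hypothetical behaviour sequences of two devices.
a, b = list("abcd"), list("abxd")
print(matched_runs(a, b), similarity(a, b, family=[a, b]))
```

Raising `min_len` plays the role of the minimum length L in the text: shorter coincidental matches are ignored, at the cost of missing short shared behaviours.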

By clustering the Botnet attacks based on the similarity value computed from the training-phase dataset, all kinds of attacks can be captured and removed, enhancing the reliability of the network in an IoT environment. The proposed classifier includes a random Poisson forest that counts the number of events and the time at which these events occur in a given time interval, and it achieves a better prediction rate during the training phase. After training, similar Botnets are clustered using Surflex-PSIM, which isolates the Botnet attacks as clusters based on the automatically trained characteristic pocket value derived from the Surflex characteristics of attacks.

FIGURE 4.6 Flow diagrams for clustering of Botnet

This section proposed isolating Botnet attacks using the bootstrap aggregating Surflex-PSIM classifier. It clusters the Botnet attacks based on P-SIM clustering, which isolates them as clusters based on the automatically trained characteristic pocket value derived from the Surflex characteristics of attacks. Large dataset inputs are also clustered accurately. The time spent on analyzing each Botnet individually is avoided, so that accurate and less time-consuming Botnet detection can be achieved. It maintains the reliability and quality of service in IoT applications.

[Flowchart steps of Figure 4.6: Begin; List the devices with their activities; Compute similarity matrix; Set device as a cluster; Number of clusters = 1? (Yes: End; No: Merge two similar devices, Update a similarity matrix)]


The performance of the proposed system is evaluated based on the clustering ratio of different types of Botnet attacks. In a Botnet attack, a group of attackers comes together with the aim of disrupting the whole network. Two types of Botnet attacks are considered here: Distributed Denial of Service (DDoS) attacks and spam attacks. DDoS is a cyber-attack in which the perpetrator tries to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting the services of a host connected to the Internet.

Email spam is the electronic form of junk mail. It involves sending unwanted messages, often unsolicited advertising, to a large number of recipients. Spam is a serious security concern as it can be used to deliver Trojan horses, viruses, worms, spyware and targeted phishing attacks.

The main difference is that in a general attack, one or two attackers carry out different operations to disturb the normal flow of the network, whereas in a Botnet attack, groups of attackers with the same intention come together and carry out the same operation to destroy the network's reliability.

4.6.1 Implementation

The proposed system for the IoT based network is implemented in the Python language.

FIGURE 4.7 The IoT based network

4.6.2 Packet Delivery Ratio

The Packet Delivery Ratio (PDR) is computed from the numbers of received and generated packets recorded in the trace file. In general, PDR is defined as the ratio of the packets received by the destination to the packets generated by the source.

FIGURE 4.8 Packet delivery ratios during normal and attack period

4.6.3 Packet Loss

Packet loss occurs when one or more packets of information traversing a computer network fail to reach their destination. Packet loss is measured as the percentage of packets lost with respect to packets sent. The figure below depicts the packet loss ratio of the IoT based network during normal time and attack time.

FIGURE 4.9 Packet Loss of the network during normal flow and attack

4.6.4 Throughput

In data transmission, network throughput is the amount of data transferred successfully from the source node to the destination node in a specified time. It is typically measured in bits per second (bps), as in megabits per second (Mbps) or gigabits per second (Gbps).

FIGURE 4.10 Throughput of the network under normal and attack period
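The three network metrics defined above reduce to simple ratios. The following sketch shows one way to compute them from packet counters; the function names and the counter values are illustrative assumptions and do not come from the trace files used in the thesis.

```python
def packet_delivery_ratio(received, sent):
    """PDR in percent: packets received by the destination / packets sent."""
    return 100.0 * received / sent if sent else 0.0


def packet_loss(received, sent):
    """Packet loss in percent: packets lost / packets sent."""
    return 100.0 * (sent - received) / sent if sent else 0.0


def throughput_kbps(bits_delivered, seconds):
    """Throughput in kilobits per second over the observation window."""
    return bits_delivered / seconds / 1000.0


# Hypothetical counters for one node over a 10 second window.
sent, received = 1000, 940
pdr = packet_delivery_ratio(received, sent)
loss = packet_loss(received, sent)
rate = throughput_kbps(bits_delivered=560_000, seconds=10)
print(pdr, loss, rate)
```

When both values are expressed as percentages of the packets sent, PDR and packet loss always sum to 100.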

TABLE 4.1 Log detail of each smart object in the network

Node, IP Address, Arrival time (Sec), Packet Delivery ratio, Packet Loss (Kbps), Throughput (Kbps)

n1 151.142.255.1 2.256 88.025 2.2835 56.895

n2 151.142.255.2 1.267 93.211 1.4756 54.742

n3 151.142.255.3 8.278 94.723 1.8629 55.315

n4 151.142.255.4 1.289 94.601 1.8687 56.889

n5 151.142.255.5 1.314 94.783 1.4756 53.895

n6 151.142.255.6 5.311 89.5404 1.8629 56.888

n7 151.142.255.7 4.322 93.031 2.1905 53.895

n8 151.142.255.8 3.333 95.5216 2.1905 51.2

n9 151.142.255.9 2.344 88.0122 1.4756 60.235

n10 151.142.255.10 1.355 90.5028 2.1905 53.895

n11 151.142.255.11 1.366 92.9934 1.8629 51.2

n12 151.142.255.12 9.377 95.484 2.1905 51.2

n13 151.142.255.13 8.388 87.9746 2.2835 51.2

n14 151.142.255.14 7.399 91.4652 1.9597 51.895

n15 151.142.255.15 6.441 92.9558 1.8629 50.96

n16 151.142.255.16 5.421 89.937 1.8629 53.895

n17 151.142.255.17 4.432 90.4276 1.57717 56.889

n18 151.142.255.18 3.443 92.9182 1.9598 56.888

n19 151.142.255.19 2.454 95.4088 1.8629 51.221

n20 151.142.255.20 1.465 89.8994 1.9598 56.38

4.6.5 Clustering of Botnet of Distributed Denial of Service (DDoS)

In this type of Botnet attack, a group of attackers continuously sends requests for a resource to the same destination address for a specified time, so that an authenticated user cannot obtain that resource during that time. The proposed algorithm clusters those nodes based on the similarity of packet sending time, destination address, and the resource they request continuously. The distance between the source nodes and the destination node is also calculated in order to group the attacks efficiently.
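The grouping criterion described above can be illustrated with a deliberately simplified sketch. It groups nodes whose packet sending time, destination address, and requested resource are exactly equal, which is a simplifying assumption: the thesis uses a similarity and distance based mixture model rather than exact matching. The function name `group_by_behaviour` is hypothetical; the sample records follow the row layout of Tables 4.2 and 4.3.

```python
from collections import defaultdict


def group_by_behaviour(records):
    """Group nodes that share sending time, destination and resource.

    Exact-match grouping is a simplification of the similarity-based
    clustering described in the text.
    """
    clusters = defaultdict(list)
    for node, _source_ip, send_time, destination_ip, resource in records:
        clusters[(send_time, destination_ip, resource)].append(node)
    return dict(clusters)


# Rows in the layout of Tables 4.2 and 4.3:
# (node, source IP, sending time in seconds, destination IP, resource)
records = [
    ("n1", "151.142.255.1", 0.214, "151.142.250.11", "file-1"),
    ("n2", "151.142.255.2", 0.214, "151.142.250.11", "file-1"),
    ("n3", "151.142.255.3", 0.214, "151.142.250.11", "file-1"),
    ("N11", "151.142.255.11", 0.214, "151.142.255.1", "mail"),
    ("N12", "151.142.255.12", 0.214, "151.142.255.1", "mail"),
]

clusters = group_by_behaviour(records)
for key, nodes in clusters.items():
    print(key, nodes)
```

Nodes that request the same file from the same destination fall into one group (the DDoS pattern), while nodes that send mail to a common destination fall into another (the spam pattern).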


In existing systems, hierarchical clustering has been used to cluster the attackers' devices in the IoT based network. The main problem with hierarchical clustering is that once a decision is taken to join two clusters, it cannot be undone. In this work, a mixture model is used for clustering, which uses both distance-based and similarity-based metrics, so the clustering ratio is high when compared with existing techniques.

TABLE 4.2 List of nodes clustered under DDoS Botnet attack

Node, Source IP Address, Packet Sending Time (sec), Destination IP Address, Resource

n1 151.142.255.1 0.214 151.142.250.11 file-1

n2 151.142.255.2 0.214 151.142.250.11 file-1

n3 151.142.255.3 0.214 151.142.250.11 file-1

n4 151.142.255.4 0.214 151.142.250.11 file-1

n5 151.142.255.5 0.214 151.142.250.11 file-1

n6 151.142.255.6 0.214 151.142.250.11 file-1

n7 151.142.255.7 0.214 151.142.250.11 file-1

n8 151.142.255.8 0.214 151.142.250.11 file-1

n9 151.142.255.9 0.214 151.142.250.11 file-1

n10 151.142.255.10 0.214 151.142.250.11 file-1

4.6.6 Clustering of Botnet of Spam attack

Here the Botnet sends email to the spam box instead of the inbox of the mail application. Spam involves sending unwanted messages, often unsolicited advertising, to a large number of recipients. It is a serious security concern as it can be used to deliver Trojan horses, viruses, worms, spyware and targeted phishing attacks. The proposed system clusters this type of Botnet based on the behavior of sending files to the spam box instead of the inbox. Existing techniques, being based on hierarchical clustering, cannot cope with clusters of different sizes and irregular shapes, and they need to break large clusters. In this work, mixture-based clustering is incorporated, so it handles clusters of all shapes.

FIGURE 4.11 Clustering of Botnet attack which leads to DDoS attack

TABLE 4.3 Lists of nodes clustered under Botnet Spam attack

Node, Source IP Address, Packet Sending Time (Sec), Destination IP Address, Resource

N11 151.142.255.11 0.214 151.142.255.1 mail

N12 151.142.255.12 0.214 151.142.255.1 mail

N13 151.142.255.13 0.214 151.142.255.1 mail

N14 151.142.255.14 0.214 151.142.255.1 mail

N15 151.142.255.15 0.214 151.142.255.1 mail

N16 151.142.255.16 0.214 151.142.255.1 mail

N17 151.142.255.17 0.214 151.142.255.1 mail

N18 151.142.255.18 0.214 151.142.255.1 mail

N19 151.142.255.19 0.214 151.142.255.1 mail

N20 151.142.255.20 0.214 151.142.255.1 mail


FIGURE 4.12 Clustering of Botnet attack which leads to Spam attack

4.6.7 Comparison of proposed system with existing techniques

In this section, the proposed system is compared with existing classifiers such as Decision Tree, Random Forest, and RBF. The following parameters are considered to evaluate the proposed system: Precision, Recall, F-measure, and Accuracy.

TABLE 4.4 List of classifiers with proposed system

Classifiers Precision Recall F-Measure Accuracy

Decision Tree 0.968 0.931 0.949 96.5333

Random Forest 0.968 0.934 0.95 96.667

RBF 0.976 0.927 0.95 96.5333

Proposed 0.961 0.986 0.976 99.04

Precision

Precision is the fraction of the test instances classified as attacks that actually belong to the attack categories.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Where TP represents the true positive value, FP indicates the false positive.

FIGURE 4.13 Comparison graph for Precision

The proposed system achieved a precision of 0.961, which is comparable to, though slightly below, the other classifiers: Decision Tree (0.968), Random Forest (0.968), and RBF (0.976). It adopts the Poisson distribution, which counts the number of events and the time at which these events occur in a given time interval.

Recall

Recall measures the fraction of attack class that was correctly detected.

$$\text{Recall} = \frac{TP}{TP + FN}$$

Where TP indicates the True Positive value and FN indicates the False Negative.

FIGURE 4.14 Comparison graph for Recall

The proposed system achieved a better recall value of 0.986, whereas other classifiers such as Decision Tree, Random Forest, and RBF obtained 0.931, 0.934, and 0.927, respectively. Since the proposed system uses similarity-based clustering, it separates each event correctly.

F-Measure

The F-measure is the harmonic mean of precision and recall and measures the balance between them.

$$\text{F-measure} = \frac{2 \cdot P \cdot R}{P + R}$$

Where P represents the precision and R denotes the recall value.

FIGURE 4.15 Comparison graph for F-measure

Since the proposed system adopts the random Poisson distribution in the training phase, it records all the rare events that happen in the IoT environment. Hence it attains a better F-measure value of 0.976, whereas other classifiers such as Decision Tree, Random Forest, and RBF have values of 0.949, 0.95 and 0.95, respectively.

Accuracy

Accuracy is defined as the ratio of the number of correctly classified Botnet attacks to the total number of Botnet attacks.

$$\text{Accuracy} = \frac{I_{cB}}{T_B}$$

Where $I_{cB}$ indicates the number of correctly identified Botnet attacks and $T_B$ denotes the total number of Botnet attacks.

FIGURE 4.16 Comparison graph for Accuracy

Since the proposed system employs the Poisson distribution, which captures rare events for a given time, and Surflex-PSIM, which isolates the Botnet attacks as clusters based on the automatically trained characteristic pocket value derived from the Surflex characteristics of attacks, it attains a better accuracy of 99.04%. In contrast, other classifiers such as Decision Tree, Random Forest, and RBF obtained 96.53%, 96.66%, and 96.53%, respectively. Bots are becoming a significant cybersecurity threat for IoT applications. It is, therefore, essential to detect the presence of malicious bots and other anomalies in the network, for which a framework is proposed in Chapter 5.

CHAPTER 5

A Novel Forecastive Anomaly Based Botnet Revelation

Framework for Competing Concerns in Internet of

Things: Third Approach

5.1 Forecastive Anomaly-based Botnet Revelation Framework

IoT combines many low-cost heterogeneous devices that can generate large volumes of private information with little or no security, which leads to security issues. This open vulnerability in IoT security allows an attacker to infect devices with malicious applications and build a network of bots, called a Botnet. A Botnet provides a distributed platform for a number of illegal activities, including Distributed Denial of Service (DDoS) attacks against crucial targets, phishing, malware dissemination, and click fraud. Previous methodologies used various Botnet detection techniques, grouped under behavior-based detection systems and user data-based detection systems, to solve these security problems. Machine learning algorithms are also in high demand for handling the issues caused by Botnets, even though they fail to predict anomalies based on their behavior and give poor accuracy in detecting Botnets.

FIGURE 5.1 Proposed Frameworks

Hence, to deal with Botnets and the hazardous anomalies that existing approaches leave the network vulnerable to, a novel forecastive anomaly-based Botnet revelation framework is designed in our proposed work, as shown in Figure 5.1. The approach works in two phases: the first is instance creation, and the second is cataloging. As an alternative to machine learning algorithms, ensemble-based stream mining is used in our work to generate several instances with less memory and time. Once the instances are created, Graph Structure Based Detection of Anomaly (GSBDA) is initiated based on features derived by the stream mining algorithm to detect the existence of hazardous anomalies. The second phase deploys a KNN (K Nearest Neighbor) algorithm, a type of instance-based learning algorithm, which is used to identify the Botnet accurately by observing the network flows. Thus, poor security practices are addressed, and issues caused by Botnets are detected with our proposed framework.

5.2 Ensemble-based Stream Mining

In the novel ensemble-based stream mining, the data relevant to certain anomalies are collected from organizations over the past several years and are then characterized as an unbounded data stream. This unbounded data stream is then partitioned into a number of large pieces called instances (chunks).

FIGURE 5.2 Ensemble-based stream mining-concepts drift in the unbounded data stream

Figure 5.2 shows how the decision boundary of a classifier varies when such a stream experiences concept drift. In Figure 5.2, the circles in the unbounded data stream represent data points. The unfilled circles denote True Negatives (TN) (i.e., non-anomalies) and the solid circles represent True Positives (TP) (i.e., anomalies). The dashed line indicates the old decision boundary and the dark solid line indicates the new decision boundary for each chunk.

Shaded circles represent a new concept, which has drifted relative to the prior chunk. Thus, to perform classification, the decision boundary has to be adjusted to account for the new concept.

Consider the possible kinds of misclassification (false detection):

1) The decision boundary of chunk two is marginally shifted in comparison to chunk one. Thus, under the outdated boundary, many non-anomalous data are labeled anomalous, resulting in FP (False Positive) counts.

2) The decision boundary of chunk three is marginally shifted compared to chunk two. Therefore, under the outdated boundary, many anomalous data are categorized as non-anomalous, resulting in FN (False Negative) counts.

In general, the intersection caused between the old and the new decision boundaries for the same

chunk would increase the FN and the FP counts. Hence to perform classification, an ensemble-

based stream mining concept is proposed, which classifies all the data instances in the stream.

The ensemble classification procedure is illustrated in Figure 5.3. Here C1, C2, C3, and C4 are models running GSBDA. GSBDA is first used to train an individual model from each chunk. The normative substructures of the chunk are identified, and the test data are compared against them by every model in the ensemble. Each model classifies a test substructure according to how far the substructure deviates from that model's normative substructure. Once all models have cast their votes, a weighted majority vote is taken to make the final decision.

The ensemble is maintained so that it contains exactly K models at all times. As each new chunk arrives, a (K + 1)th model is created from the new chunk, and one of these K + 1 models is discarded. The model to discard can be selected in several ways. One approach calculates the estimated prediction error of each of the K + 1 models on the most recent chunk and discards the poorest performer. This requires ground truth for the most recent chunk to be readily available so that the prediction error can be measured accurately.
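The error-based replacement rule can be sketched as follows. This is an illustration only: a model is represented as any callable that maps a feature vector to a 0/1 label, and the names `prediction_error` and `update_ensemble` are hypothetical, not part of GSBDA.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]      # (features, ground-truth label)
Model = Callable[[Sequence[float]], int]  # stand-in for a trained classifier


def prediction_error(model: Model, chunk: List[Sample]) -> float:
    """Fraction of labeled samples in the chunk that the model gets wrong."""
    if not chunk:
        raise ValueError("the labeled chunk must not be empty")
    wrong = sum(1 for features, label in chunk if model(features) != label)
    return wrong / len(chunk)


def update_ensemble(ensemble: List[Model], new_model: Model,
                    latest_chunk: List[Sample], k: int) -> List[Model]:
    """Keep the k models with the lowest error on the most recent chunk."""
    candidates = list(ensemble) + [new_model]
    # sorted() is stable, so on ties the older models stay ahead of the new one.
    ranked = sorted(candidates, key=lambda m: prediction_error(m, latest_chunk))
    return ranked[:k]
```

With an ensemble of size K and one new model, the call returns K models, so exactly one of the K + 1 candidates is dropped per chunk.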


5.3 Ensemble Classification approach

The classification approach makes use of a classifier and the procedure for ensemble

Figure 5.3 Ensemble-based classifier designs

If ground truth is not available, majority voting is used instead: the model that agrees least with the majority decision is discarded. The result is a set of K models that best fits the current concept.
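A minimal sketch of this voting-based replacement is given below. It assumes the fading-factor weighting of Equation (1), in which a model trained on chunk i receives weight λ^(l−i), where l is the index of the most recent chunk. The dictionary layout and the function names are hypothetical, and rounding is implemented as a 0.5 threshold, which is an assumption about how ties are rounded.

```python
from typing import Dict, Hashable, Iterable, Set


def weighted_average(reports: Dict[int, Set[Hashable]], candidate: Hashable,
                     latest_index: int, fading: float) -> float:
    """reports maps chunk index i to the anomalies reported by the model from chunk i."""
    if not reports:
        raise ValueError("the ensemble must contain at least one model")
    if not 0.0 < fading <= 1.0:
        raise ValueError("the fading factor is assumed to lie in (0, 1]")
    total = sum(fading ** (latest_index - i) for i in reports)
    agreeing = sum(fading ** (latest_index - i)
                   for i, anomalies in reports.items() if candidate in anomalies)
    return agreeing / total


def weighted_vote(reports: Dict[int, Set[Hashable]], candidate: Hashable,
                  latest_index: int, fading: float) -> int:
    """Round the weighted average to 0 or 1 (assumed threshold: 0.5)."""
    return 1 if weighted_average(reports, candidate, latest_index, fading) >= 0.5 else 0


def most_disagreeing_model(reports: Dict[int, Set[Hashable]],
                           candidates: Iterable[Hashable],
                           latest_index: int, fading: float) -> int:
    """Chunk index of the model whose own votes differ most from the weighted vote."""
    items = list(candidates)
    votes = {c: weighted_vote(reports, c, latest_index, fading) for c in items}

    def disagreements(index: int) -> int:
        return sum(1 for c in items if (c in reports[index]) != bool(votes[c]))

    # On ties, max() returns the first maximal entry, i.e. the oldest model here.
    return max(sorted(reports), key=disagreements)
```

Breaking ties in favor of discarding the oldest model is a design choice of this sketch; any tie-breaking rule would serve.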

Once the instances are created, Graph Structure-Based Detection of Anomaly (GSBDA) is initiated on the features derived by the stream mining algorithm in order to detect the presence of hazardous anomalies.

Algorithm: Ensemble Classifier

For each model K ∈ E do                                        // E – current ensemble
    Test K on Ln and compute its prediction error              // Ln – most recently labeled data chunk
End for
Kn ← newly trained GSBDA classifier from data Ln               // this is the newly trained model
Test Kn on Ln and calculate its prediction error               // testing this model for error
E ← best K classifiers from Kn ∪ E depending on the prediction error   // select the models with less prediction error

With this algorithm, a new model is trained from the most recent chunk and added temporarily to the ensemble. The test graph t is then checked for possible related anomalies. Finally, the ensemble is updated by discarding the model that disagrees most with the weighted majority opinion. If multiple models have equally high disagreement, an arbitrary one of these poorly performing models is discarded.

Weighted majority opinions are calculated using the formula

$$WA(E, a) = \frac{\sum_{\{i \,\mid\, M_i \in E,\ a \in A_{M_i}\}} \lambda^{\,l-i}}{\sum_{\{i \,\mid\, M_i \in E\}} \lambda^{\,l-i}} \qquad (1)$$

Where,

$M_i \in E$ is a model in the ensemble trained with chunk i,

$A_{M_i}$ is the set of anomalies reported by model $M_i$,

$\lambda$ is a constant fading factor,

$l$ is the index of the most recent chunk.

The weighted average WA(E, a) is then rounded to an integer (0 or 1) to obtain the weighted majority vote.

Consider a fair example for GSBDA, which is shown in Figure 5.4 (a). The best substructure obtained after iteration in GSBDA is shown in Figure 5.4 (b).


FIGURE 5.4 a) Typical fair GSBDA example, b) Best substructure and c) Anomalous

Substructure

Then, on the second iteration, this substructure is compressed to a single vertex, extensions are evaluated, and the resulting anomalous substructure is shown in Figure 5.4 (c). Once more, the edge and vertex are labeled as the actual anomaly, yet the whole anomalous substructure is output for possible investigation.

This method uses the findings of GSBDA's past iterations to identify anomalies in the current chunk of data. That is, the normative substructures found in previous GSBDA iterations persist in each model. This requires the model to take all data into account until the model produces an ensemble that is not similar to the current chunk.

Poorly performing outdated models are replaced by better-performing, newer models that are more suited to the current concept. While the cumulative amount of data in the system is technically unbounded, this keeps every round of classification tractable.

5.4 Cataloging

Our approach determines the similarity in behavior of hosts using its varied properties such as

netflows of information during a predefined time window and attempts to detect bots by

correlating these comparable behaviors between distinct time windows.


This work makes use of the KNN algorithm to identify bots based on the netflows. Netflow-generating components generate TCP netflows between hosts. The Netflow Clustering and Alert Clustering components then group the non-filtered netflows and alerts. Finally, at the end of each time window, a correlation step relates the generated alert clusters to the netflow clusters in order to identify the bot-infected hosts. Once the instances are created, Graph Structure-Based Detection of Anomaly (GSBDA) is applied to the features derived from the stream mining algorithm to detect the presence of hazardous anomalies. In addition, the second phase uses the KNN (K Nearest Neighbor) algorithm [100], a form of instance-based learning. It is used specifically to classify the Botnet by analyzing the network flows.

KNN Algorithm Pseudocode:

Let (Xi, Ci), where i = 1, 2, ..., n, be the data points. Xi denotes the feature values and Ci denotes the label of Xi for each i.

Assuming the number of classes is ‘C’, Ci ∈ {1, 2, 3, ..., C} for all values of i.

Let x be a point whose label is not known and whose class we would like to find using the KNN algorithm.

1. Calculate d(x, Xi) for i = 1, 2, ..., n, where d denotes the Euclidean distance between the points.

2. Arrange the calculated n Euclidean distances in non-decreasing order.

3. Let k be a positive integer; take the first k distances from this sorted list.

4. Find the k points corresponding to these k distances.

5. Let ki denote the number of points belonging to the ith class among the k points, i.e., ki ≥ 0.

6. If ki > kj ∀ i ≠ j, then put x in class i.
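The six steps above can be sketched in a few lines of Python. This is a plain, unoptimized illustration; the function name `knn_classify` is hypothetical, and the handling of ties between classes (the class seen first among the nearest neighbours wins) is an assumption, since the pseudocode only covers the strict-majority case.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_classify(x: Sequence[float], points: List[Sequence[float]],
                 labels: List[int], k: int) -> int:
    """Assign x to the most frequent class among its k nearest labeled points."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Steps 1-2: Euclidean distance to every point, sorted in non-decreasing order.
    ranked = sorted(zip(points, labels), key=lambda pair: math.dist(x, pair[0]))
    # Steps 3-4: keep the k nearest points.
    nearest_labels = [label for _, label in ranked[:k]]
    # Steps 5-6: count the members of each class and pick the largest count.
    return Counter(nearest_labels).most_common(1)[0][0]
```

For netflow data the feature vectors would first need to be numeric and comparably scaled, because the Euclidean distance is dominated by features with large ranges.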

As a result, poor security practices are addressed and issues related to Botnet are detected with this system. The results and implementation of the above sections are described below.

In the proposed work, the CTU-13 dataset is used [78]. It consists of Botnet traffic scenarios captured at the Czech Technical University. It features different scenarios with different types of Botnet cyber attacks. Each scenario is recorded individually as a separate file. Each file has 14 attributes and a label. In total, the dataset consists of 13 different scenarios.

The CTU-13 dataset contains Botnet, normal, and background traffic across the 13 scenarios shown in Table 5.1, where each scenario is created with different malware. Normal traffic was created by regular users through internet surfing, mail checking, and browsing social media sites. Background traffic is generated to show the presence of Botnet traffic. All of these scenarios were captured in 'pcap' files.

TABLE 5.1 CTU-13 scenarios [78]

This dataset contains a total of 15 columns, namely ‘StartTime’ (start time of the attack), ‘Dur’ (duration of the attack in seconds), ‘Proto’ (protocol, e.g., TCP, UDP, ICMP), ‘SrcAddr’ (source IP address), ‘Sport’ (source port number), ‘Dir’ (direction of the traffic), ‘DstAddr’ (destination IP address), ‘Dport’ (destination port number), ‘State’ (state of the transaction according to the protocol), ‘sTos’ (source type of service field), ‘dTos’ (destination type of service field), ‘TotPkts’ (total transaction packet count), ‘TotBytes’ (total transaction bytes), ‘SrcBytes’ (total transaction bytes from source to destination), and ‘Label’ (three target values, namely background, Botnet, and normal). The direction column identifies the TCP connection source, and the symbol at its center represents the transaction state. The symbol ‘-’ means the transaction was normal, ‘|’ means the transaction was RESET, ‘o’ means the transaction timed out, and ‘?’ means that the transaction direction was unknown. Table 5.2 shows the dataset distribution for background flows, Botnet flows, and normal flows.
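As an illustration of this layout, the following sketch reads flow records with the 15 columns listed above and decodes the state symbol of the direction column. It assumes the flows are available as comma-separated text with a header row; the helper names `parse_flows` and `direction_state` are hypothetical, and treating the direction value as arrow heads around a single state symbol is an assumption of this sketch.

```python
import csv
import io
from typing import Dict, List

COLUMNS = ["StartTime", "Dur", "Proto", "SrcAddr", "Sport", "Dir", "DstAddr",
           "Dport", "State", "sTos", "dTos", "TotPkts", "TotBytes", "SrcBytes",
           "Label"]

# Meaning of the symbol at the center of the direction column.
STATE_SYMBOLS = {"-": "normal", "|": "reset", "o": "timed out",
                 "?": "direction unknown"}


def parse_flows(text: str) -> List[Dict[str, str]]:
    """Parse comma-separated flow records that carry the 15-column header."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [name for name in COLUMNS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def direction_state(direction: str) -> str:
    """Strip padding and arrow heads, then look up the remaining state symbol."""
    core = direction.strip().strip("<>")
    return STATE_SYMBOLS.get(core, "unrecognized")
```

Values that do not reduce to one of the four symbols are reported as "unrecognized" rather than guessed.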


TABLE 5.2 CTU-13 Dataset Distributions

Scenario   Background Flows (%)   Botnet Flows (%)   Normal Flows (%)
1          95.40                  0.89               3.69
2          95.59                  0.85               3.54
3          94.60                  0.49               4.89
4          91.91                  0.15               7.93
5          91.37                  0.46               8.15
6          94.12                  0.22               5.64
7          93.71                  0.06               6.22
8          95.47                  0.10               4.42
9          90.22                  5.02               4.75
10         87.54                  6.24               6.21
11         29.33                  67.97              2.69
12         29.33                  67.97              2.69
13         93.76                  1.67               4.55

As shown in Table 5.2, scenario 1 of the CTU-13 dataset has 95.40% background traffic, 0.89% Botnet traffic, and 3.69% normal traffic. The same can be observed for the other scenarios as well. A large imbalance is apparent in the traffic present in this dataset.

Furthermore, the visuals of the CTU-13 dataset are used in the same way as suggested by its author [78]. In our proposed framework, the work begins with initializing the number of nodes in the IoT network, which is shown in Figure 5.5 (a). Here we initialize 30 nodes, and the creation of the nodes is shown in Figure 5.5 (b).

Once the nodes are created, ensemble-based stream mining immediately generates a number of instances based on the behavior of the network packets. Hence the time and memory complexity are reduced while prediction accuracy remains high. The anomalies are detected using GSBDA with the information obtained from stream mining. Moreover, in the cataloging phase the Botnet is detected by the KNN algorithm based on the netflows. The detected nodes are shown in Figure 5.6.


(a) (b)

FIGURE 5.5 a) Node Initialization and b) Node Creation

FIGURE 5.6 Nodes that are detected as Botnet

5.5.1 Performance Analysis

Table 5.3 presents the performance of our proposed work in terms of various metrics: TPR (True Positive Rate), FPR (False Positive Rate), precision, accuracy, error rate, and F-measure. Our proposed framework achieves 0.97 TPR, 0.19 FPR, 0.80 precision, 0.98 accuracy, 0.5 error rate, and 0.87 F-measure.

TABLE 5.3 Performance metric of the proposed framework

Performance Metrics   TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Proposed              0.97   0.19   0.80        0.98       0.5          0.87
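For reference, the metrics in Table 5.3 can be computed from confusion-matrix counts as in the sketch below. It uses the standard textbook definitions, which are assumed here to match those used in the experiments; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    tp: int  # anomalies correctly flagged
    fp: int  # normal traffic wrongly flagged
    tn: int  # normal traffic correctly passed
    fn: int  # anomalies missed


def _ratio(numerator: float, denominator: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return numerator / denominator if denominator else 0.0


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (recall is the TPR)."""
    return _ratio(2 * precision * recall, precision + recall)


def metrics(c: Confusion) -> Dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    tpr = _ratio(c.tp, c.tp + c.fn)
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "tpr": tpr,
        "fpr": _ratio(c.fp, c.fp + c.tn),
        "precision": precision,
        "accuracy": _ratio(c.tp + c.tn, total),
        "error_rate": _ratio(c.fp + c.fn, total),
        "f_measure": f_measure(precision, tpr),
    }
```

As a check against the table, `f_measure(0.80, 0.97)` evaluates to about 0.877, which agrees with the reported F-measure of 0.87.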

TABLE 5.4 Throughput, packet loss, packet delivery ratio, and arrival time of the five nodes

Node    Arrival time (sec)   Packet Delivery ratio   Packet Loss (Kbps)   Throughput (Kbps)
Node1   2.256                98.025                  2.2835               56.895
Node2   1.267                94.211                  1.4756               54.742
Node3   8.278                94.723                  1.8629               55.315
Node4   1.289                94.601                  1.8687               56.889
Node5   1.314                94.783                  1.4756               53.895

Table 5.4 describes the packet arrival time, packet delivery ratio, packet loss, and throughput. For Node 1, the arrival time is 2.256, the packet delivery ratio is 98.025, the packet loss is 2.2835, and the throughput is 56.895. For Node 2, the arrival time is 1.267, the packet delivery ratio is 94.211, the packet loss is 1.4756, and the throughput is 54.742. For Node 3, the arrival time is 8.278, the packet delivery ratio is 94.723, the packet loss is 1.8629, and the throughput is 55.315. For Node 4, the arrival time is 1.289, the packet delivery ratio is 94.601, the packet loss is 1.8687, and the throughput is 56.889. For Node 5, the arrival time is 1.314, the packet delivery ratio is 94.783, the packet loss is 1.4756, and the throughput is 53.895.

FIGURE 5.7 Arrival time, packet delivery, packet loss, and throughput of five nodes

TABLE 5.5 Total no. of nodes to search for Botnet and other anomalies

Dataset   No. of bots   No. of identified bots   No. of other anomalies detected   Size of Bot cluster   No. of nodes in search of Bots & anomalies
CTU13     5             5                        10                                                      30

Table 5.5 describes the total number of nodes that are subjected to attack by Botnet and other anomalies. In our proposed work, the total number of anomalies detected is 10, and the number of bots is 5. Accordingly, about 30 nodes have been used to search for bots as well as anomalies.


5.5.2 Performance Comparison

The quantitative results are presented in Tables 5.6-5.10. The proposed algorithm for netflow analysis has been compared with other methods such as Bclus, CCD, and Spark-ELM. The tables report performance metrics such as accuracy, precision, and F-measure for five scenarios of the dataset that include Botnet activity.

TABLE 5.6 Comparison with prior methodologies for Scenario 1

Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.91   0.14   0.68        0.87       0.13         0.77
CCD         1.0    0.05   0.86        0.96       0.03         0.92
Bclus       0.4    0.4    0.5         0.5        0.4          0.48
Proposed    0.81   0.01   0.79        0.96       0.05         0.86


TABLE 5.7 Comparison with prior methodologies for Scenario 2

Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.95   0.05   0.88        0.95       0.05         0.92
CCD         0.74   0.02   0.96        0.88       0.11         0.92
Bclus       0.3    0.2    0.6         0.5        0.4          0.41
Proposed    1.0    0.04   0.95        0.98       0.04         0.96

Scenario 1, reported in Table 5.6, comprises an IRC-based Botnet that sends spam, whereas scenario 2 in Table 5.7 involves the same bot but differs in the number of netflows. The proposed approach achieves better results in terms of accuracy and error rate for the same bot in the second scenario.

TABLE 5.8 Comparison with prior methodologies for Scenario 6

Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.89   0.02   0.92        0.86       0.02         0.96
CCD         0.0    0.0    0.0         0.64       0.35         0.0
Bclus       0.0    0.0    0.4         0.4        0.5          0.04
Proposed    0.94   0.00   0.95        0.94       0.06         0.98

In Scenario 6, Table 5.8, the Botnet scans SMTP mail servers for several hours and connects to several remote desktop services. Compared with the existing methods, our proposed work detects the malware effectively and thereby achieves better performance than the prior works.



TABLE 5.9 Comparison with prior methodologies for Scenario 8

Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.26   0.09   0.47        0.76       0.24         0.33
CCD         0.0    0.0    -           0.64       0.35         0.0
Bclus       0.0    0.04   0.0         0.66       0.33         -
Proposed    0.28   0.1    0.52        0.89       0.23         0.35

In Scenario 8, Table 5.9, the Botnet communicates with various C&C hosts and receives encrypted data. When the data is encrypted, it is difficult to decide whether it is malicious, so all methods give a low TPR. In this case, the malware used only certain, very specific communication channels to communicate with the C&C server, and these were not reflected in the training data. However, the proposed method still gives a higher recognition rate than the other methods.

TABLE 5.10 Comparison with prior methodologies for Scenario 9

Method      TPR    FPR    Precision   Accuracy   Error Rate   F-measure
Spark-ELM   0.94   0.12   0.89        0.93       0.06         0.94
CCD         0.38   0.04   0.93        0.59       0.4          0.54
Bclus       0.1    0.2    0.4         0.4        0.5          0.25
Proposed    0.92   0.03   0.9         0.96       0.06         0.97

In Scenario 9, Table 5.10, some hosts are infected with the Neris malware, which actively starts sending spam emails. Compared with the existing methods, our proposed work achieves better performance than the prior works.

Figure 5.8 compares the proposed method with supervised and unsupervised learning methods. It shows the FP, TN, FN, TP, accuracy, false-positive rate, and false-negative rate of the supervised, unsupervised, and proposed methods. This comparison shows better performance for the proposed method than for the existing works.

FIGURE 5.8 Comparison of proposed with Supervised and unsupervised learning methods

While comparing the prior methodologies with our proposed work, the proposed framework

exhibits better results in terms of prediction/detection accuracy.

Thus, this work successfully detects the anomalies with high prediction accuracy.


CHAPTER 6

Conclusion, Future Scope and References

6.1 Conclusion

The cyber-world is continually developing, and new technology stacks are proposed by researchers on a regular basis. At the same time, the Internet of Things (IoT) has brought significant changes to many aspects of our daily life, such as the smart home, smart city, connected health, intelligent supply chain, and smart farming.

A context-aware application is still required in IoT, one that senses the physical environment from a security point of view and protects the devices accordingly. Providing comprehensive information security is challenging and is an integral part of an IoT-based system.

IoT consists of many heterogeneous, low-cost devices with little or no embedded security, which generate a huge amount of private information and may create many security problems. This open weakness in IoT security allows an attacker to build a network of bots by infecting devices with malicious applications; such a network is called a Botnet. A Botnet provides a distributed platform for illegal activities such as launching Distributed Denial of Service (DDoS) attacks against crucial targets, phishing, malware dissemination, and click fraud.

IoT nodes are limited in resources and use dedicated and diversified communication protocols. Some of these differences weaken the ability of IoT nodes to protect themselves. New technologies are developed day by day, and the cyber world develops continually with them. At the same time, the IoT has brought great changes to numerous aspects of our daily life, such as health care and traffic monitoring services. Moreover, it aids machine-to-machine communication by connecting multiple devices over the internet. Conversely, there is a rise in intrinsic vulnerabilities that are often leveraged by cybercriminals. Yet the number of active users in IoT increases day by day. One of the major security concerns in IoT is the Botnet, a pervasive and hazardous threat. Several thousand to millions of compromised computers (bots) in a network are used by malicious attackers to perform various illicit activities. To deal with these security issues, prior methodologies use different Botnet detection techniques, broadly classified into behavior-based detection systems and user data-based detection systems. Furthermore, machine learning algorithms are also in high demand to face the problems caused by Botnet.

In this context, several proposed classifiers have been successfully applied on a real test bed, achieving a higher prediction rate. Clustering the same kind of Botnet from the trained data set using multiple algorithms enables the mass removal of Botnet.

• The proposed method is evaluated with an experimental setup of real IoT nodes.

• The proposed systems make the IoT-based network more reliable by removing Distributed Denial of Service (DDoS) and spam Botnet.

• The results obtained for the first proposed system show better performance than the existing systems.

• Thus, the first proposed approach, mass removal of Botnet attacks using the heterogeneous ensemble stacking PROSIMA classifier in IoT, has clustered each type of Botnet attack, such as distributed denial of service and spam Botnet attacks, while maintaining the reliability and quality of service of IoT applications. This approach achieved a high detection accuracy of 98.63%.

In search of a simplified algorithm with higher prediction accuracy, another classifier is proposed. The primary significance of this approach is that it searches for the best feature among an irregular subset of features. This procedure offers a unique approach to Botnet detection.

• It has obtained a precision of 0.961 and a recall of 0.986. It achieved a high F-measure of 0.976 and a high detection accuracy of 99.04%.

• Thus, the proposed approach, isolating Botnet attacks using the bootstrap aggregating surflex-PSIM classifier in the IoT, has clustered each type of Botnet attack, such as distributed denial of service and spam Botnet attacks, while maintaining the reliability and quality of service of IoT applications.

With the internet, billions of devices in the Internet of Things (IoT) are interconnected and communicate with each other through messaging bots. The messaging bots are sometimes controlled by attackers to carry out malicious activities. Thus, bots become a serious cybersecurity hazard for IoT devices. For this reason, it is crucial to detect the existence of malicious bots and other anomalies in the network. To tackle these bots and anomalies, the third approach is proposed.

• In the third proposed approach, a novel forecastive anomaly-based Botnet revelation framework, ensemble-based stream mining is used to generate a number of instances.

• Once the instances are created, GSBDA is employed to detect the presence of hazardous anomalies.

• Finally, in the cataloging phase, the KNN algorithm accurately detects the Botnet among the anomalies identified through ensemble-based stream mining.

• It is concluded that our proposed frameworks detect anomalies, including Botnet, more accurately and with reduced time complexity.

6.2 Future Scope

IoT is still in its growing phase, with various security models and structures recently proposed to address its security challenges and privacy issues. A Botnet can be identified by observing the behavior of the bots in tracked network traffic, that is, by monitoring the traffic flow of the system. A Botnet's behavior can be analyzed using classification techniques, which help find the characteristics that distinguish Botnet traffic from benign traffic. Some methods focus on detecting the Botnet present in the network using machine learning techniques to discern the patterns shown by the Botnet in the system.

As future work, other types of data collected by honeypots, such as malicious binaries and attack replays, may be considered when researching and detecting Botnet. In the future, this strategy needs to be extended to the next stage, where open problems or concerns can be observed by applying it in real-time scenarios. According to news from Security Intelligence, a new Botnet, the Mozi Botnet, has spiked in the Internet of Things (IoT).

This Botnet has been active since 2019. The Mozi Botnet has accounted for around 90% of IoT traffic in just one year. Its code overlaps with Mirai and other Botnet variants. There is now a vast scope of research to detect and remove the Mozi Botnet. The main aim of a bot-master is to reach a huge audience while staying hidden from it. Nowadays, very large audiences are on social media such as Facebook and Twitter. On these platforms, people's trust levels are fairly high, so they trust what others send.

Social media sites provide a number of services, such as banking, friend lists, and gaming; if attackers can compromise these, it can lead to very sophisticated fraud schemes. In this era of mobile communication, another severe threat, the mobile Botnet, is possible on smartphones. It can gain access to mobile phones and hand control to the bot-master, who can handle such devices remotely to generate very large-scale attacks. Another recently created Botnet that strongly affects the IoT is the Torii Botnet. Torii is the most sophisticated Botnet observed by Avast. It steals information from IoT devices and allows attackers to execute code remotely; this Botnet can also perform other commands with multiple layers of encryption.

This Botnet communicates with the C&C server, and the coder of this Botnet executes and delivers the payload to compromised devices. Botnet attacks are not restricted to IoT devices, social networks, and mobile devices; a Botnet can control cloud services as well. With the growing usage of the Internet for mobile phones, social media sites, and cloud computing, these threats will set their targets in these fields, where research is still at an initial stage.

Attackers will find loopholes to compromise these fields. Many open-source Botnets are available. They are as straightforward to install as any other open-source software, requiring only basic knowledge of source code, and anyone can compile the code. Botnet construction kits are also available, which make it even easier to set up a Botnet than with open-source Botnet code; these software kits are GUI-based and very user-friendly. One such example is the ZeuS Botnet kit. No technical skills are needed to generate Botnet attacks.

These kits are specialized and sophisticated enough to generate 0-day attacks. Their vendors may also provide technical support and software updates to keep the malware up to date. So there is a huge scope of research in Botnet detection. Collaborative work is required between governments, industries, and academia to detect and mitigate such threats.


6.3 References

[1] Feily, M., Shahrestani, A. and Ramadass, S., 2009, June. A survey of Botnet and Botnet

detection. In the 2009 Third International Conference on Emerging Security Information,

Systems and Technologies IEEE pp. 268-273.

[2] Suo, H., Wan, J., Zou, C. and Liu, J., 2012, March. Security in the Internet of Things: a

review. In the 2012 international conference on computer science and electronics

engineering, Vol. 3, IEEE pp. 648-651.

[3] Madakam, S., Ramaswamy, R. and Tripathi, S., 2015. Internet of Things (IoT): A literature

review. Journal of Computer and Communications, 3(05), pp.164.

[4] Perwej, Y., Parwej, F., Hassan, and Akhtar, N., 2019. The Internet-of-Things (IoT) Security:

A Technological perspective and review. International Journal of Scientific Research in

Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN, pp.2456-

3307.

[5] Weber, R.H., 2010. Internet of Things–New security and privacy challenges. Computer law

& security review, 26(1), pp.23-30.

[6] Suo, H., Wan, J., Zou, C. and Liu, J., 2012, March. Security in the internet of things: a

review. In the 2012 international conference on computer science and electronics

engineering Vol. 3, IEEE pp. 648-651.

[7] Whitmore, A., Agarwal, A. and Da Xu, L., 2015. The Internet of Things—A survey of

topics and trends. Information Systems Frontiers, 17(2), pp.261-274.

[8] Ziegeldorf, J.H., Morchon, O.G. and Wehrle, K., 2014. Privacy in the Internet of Things:

threats and challenges. Security and Communication Networks, 7(12), pp.2728-2742.

[9] Liu, J., Xiao, Y. and Chen, C.P., 2012, June. Authentication and access control in the

internet of things. In the 2012 32nd International Conference on Distributed Computing

Systems Workshops (pp. 588-592). IEEE.

[10] E. Bertino and N. Islam,2017, Botnets and Internet of Things security, Computer, 50(2), pp.

76-79.

[11] Angrishi, K., 2017. Turning the internet of things (IoT) into an internet of vulnerabilities

(IoV): IoT Botnet. arXiv preprint arXiv:1702.03681.


[12] Kolias, C., Kambourakis, G., Stavrou, A. and Voas, J., 2017. DDoS in the IoT: Mirai and

other Botnets. Computer, 50(7), pp.80-84.

[13] Zarpelao, B.B., Miani, R.S., Kawakani, C.T. and de Alvarenga, S.C., 2017. A survey of

intrusion detection in the Internet of Things. Journal of Network and Computer

Applications, 84, pp.25-37.

[14] Lindqvist, U. and Neumann, P.G., 2017. The future of the Internet of

Things. Communications of the ACM, 60(2), pp.26-30.

[15] Yang, Y., Wu, L., Yin, G., Li, L. and Zhao, H., 2017. A survey on security and privacy

issues in Internet-of-Things. IEEE Internet of Things Journal, 4(5), pp.1250-1258.

[16] Q. Yaseen, M. Aldwairi, Y. Jararweh, M.Al-Ayyoub, B. Gupta, Collusion attacks

mitigation in internet of things: a fog based model, Multimedia Tools and Applications,

pp.1-20, 2017.

[17] Y. Yilmaz, S. Uludag, Mitigating IoT-based cyber attacks on the smart grid, 16th IEEE

International Conference on Machine Learning and Applications (ICMLA), pp. 517-522,

2017.

[18] A. Azab, M. Alazab and M. Aiash, 2016. "Machine learning based Botnet identification

Traffic," IEEE Trustcom/BigDataSE/ISPA, pp. 1788-1794,

[19] Tuan, T.A., Long, H.V., Kumar, R., Priyadarshini, I. and Son, N.T.K., 2019. Performance

evaluation of Botnet DDoS attack detection using machine learning. Evolutionary

Intelligence, pp.1-12.

[20] Vishwakarma, R. and Jain, A.K., 2019, April. A Honeypot with machine learning based

detection framework for defending IoT based Botnet DDoS attacks. In 2019 3rd

International Conference on Trends in Electronics and Informatics (ICOEI), pp. 1019-1024.

IEEE.

[21] Ziegeldorf, J.H., Morchon, O.G. and Wehrle, K., 2014. Privacy in the Internet of Things:

threats and challenges. Security and Communication Networks, 7(12), pp.2728-2742.

[22] D. H. Summerville, K. M. Zach and Y. Chen, 2015. Ultra-lightweight deep packet anomaly

detection for Internet of Things devices, IEEE 34th International Performance Computing

and Communications Conference (IPCCC), pp. 1-8


[23] Q. Yan, W. Huang, X. Luo, Q. Gong, and F.R. Yu, A multi-level DDoS mitigation

framework for the industrial Internet of things, IEEE Communications Magazine. 56(2)

(2018) 30-36.

[24] M. Yeo, Y. Koo, Y. Yoon, T. Hwang, J. Ryu, J. Song, and C. Park, Flow-based malware

detection using convolutional neural network, In Information Networking (ICOIN), 2018

International Conference on. (2018) 910-913.

[25] S.W. Park, J. Park, K. Bong, D. Shin, J. Lee, S. Choi, and H.J. Yoo, An energy-efficient

and scalable deep learning/inference processor with tetra-parallel MIMD architecture for big

data applications, IEEE transactions on biomedical circuits and systems. 9(6) (2015) 838-

848.

[26] J.A. Jerkins, Motivating a market or regulatory solution to IoT insecurity with the Mirai

Botnet code, In Computing and Communication Workshop and Conference (CCWC), 2017

IEEE 7th Annual. (2017) pp 1-5.

[27] A.O. Prokofiev, Y.S. Smirnova, and V.A. Surov, A method to detect Internet of Things

Botnet. In Young Researchers in Electrical and Electronic Engineering (EIConRus), 2018

IEEE Conference of Russian. (2018) 105-108.

[28] J. Smith-perrone, and J. Sims, Securing cloud, SDN and large data network environments

from emerging DDoS attacks. In Cloud Computing, Data Science & Engineering-

Confluence, 2017 7th International Conference on. (2017) 466-469.

[29] Giachoudis, N., Damiris, G.P., Theodoridis, G. and Spathoulas, G., 2019, Collaborative

agent-based detection of DDoS IoT Botnet. In 2019 15th International Conference on

Distributed Computing in Sensor Systems (DCOSS) (pp. 205-211). IEEE.

[30] Shafi, Q. and Basit, A., 2019, January. DDoS Botnet prevention using blockchain in

software defined Internet of Things. In 2019 16th International Bhurban Conference on

Applied Sciences and Technology (IBCAST) (pp. 624-628). IEEE.

[31] Ahmed, Z., Danish, S.M., Qureshi, H.K. and Lestas, M., 2019, September. Protecting IoTs

from mirai Botnet attacks using blockchains. In 2019 IEEE 24th International Workshop on

Computer Aided Modeling and Design of Communication Links and Networks

(CAMAD) pp. 1-6. IEEE.


[32] “Mirai (malware).” Wikipedia, Wikimedia Foundation, 19 Feb. 2019,

https://en.wikipedia.org/wiki/Mirai_(malware).

[33] Q. Yan, W. Huang, X. Luo, F. Richard Yu, A multi-level DDoS mitigation framework for

the industrial Internet of things, IEEE Communications Magazine, Vol.56, No.2, pp.30-36,

2018.

[34] M. Yeo, Y. Koo, Y. Yoon, T. Hwang, J. Ryu, J. Song, C. Park, Flow-based malware

detection using convolution neural network, IEEE Information Networking (ICOIN), pp.

910-913, 2018.

[35] S. Wook Park, J. Park, K. Bong, D. Shin, J. Lee, S. Choi, H.J Yoo, An energy-efficient and

scalable deep learning/inference processor with tetra-parallel MIMD architecture for big

data applications, IEEE transactions on biomedical circuits and systems, Vol. 9, No.6,

pp.838-848, 2015.

[36] J.A Jerkins, Motivating a market or regulatory solution to IoT insecurity with the Mirai

Botnet code, IEEE Computing and Communication Workshop and Conference (CCWC),

pp.1-5, 2017.

[37] C. Kolias, G. Kambourakis, A. Stavrou, J. Voas, DDoS in the IoT: Mirai and other Botnet,

IEEE Computer, Vol. 50, No.7, pp.80-84, 2017.

[38] G. Perrone, M. Vecchio, R. Pecori, The Day After Mirai: A survey on MQTT security

solutions after the largest cyber-attack carried out through an army of IoT devices, Second

International Conference on Internet of Things, Big Data and Security, pp. 246-253, 2017.

[39] J. Smith-Perrone, J. Sims, Securing cloud, SDN and large data network environments from

emerging DDoS attacks, 7th IEEE International Conference on Cloud Computing, Data

Science & Engineering-Confluence, pp. 466-469, 2017.

[40] A. Stanciu, T.C. Balan, C. Gerigan, S. Zamfir, Securing the IoT gateway based on the

hardware implementation of a multi pattern search algorithm, IEEE Optimization of

Electrical and Electronic Equipment (OPTIM) & Aegean Conference on Electrical

Machines and Power Electronics (ACEMP), pp. 1001-1006, 2017.

[41] A. Stanciu, T.C. Balan, C. Gerigan, and S. Zamfir, Securing the IoT gateway based on the

hardware implementation of a multi pattern search algorithm, In Optimization of Electrical

and Electronic Equipment (OPTIM) & 2017 Intl Aegean Conference on Electrical Machines

and Power Electronics (ACEMP), 2017 International Conference on. (2017) 1001-1006.


[42] Q. Yaseen, M. Aldwairi, Y. Jararweh, M. Al-Ayyoub, and B. Gupta, Collusion attacks

mitigation in internet of things: a fog based model. Multimedia Tools and Applications,

(2017) 1-20.

[43] Y. Yilmaz, and S. Uludag, Mitigating IoT-based Cyber attacks on the smart grid, in machine

learning and applications (ICMLA), 2017 16th IEEE International Conference on. (2017)

517-522.

[44] M.E. Ahmed, H. Kim, and M. Park, Mitigating DNS query-based DDoS attacks with

machine learning on software-defined networking. In Military Communications Conference

(MILCOM), MILCOM 2017-2017 IEEE. (2017) 11-16.

[45] M. Stevanovic, and J.M. Pedersen, Machine learning for identifying Botnet network

traffic. Networking and Security Section, Department of Electronic Systems, Aalborg

University, Tech. Rep. (2013).

[46] T. Zhu, S. Dhelim, Z. Zhou, S. Yang, and H. Ning, An architecture for aggregating

information from distributed data nodes for industrial Internet of Things, Computers &

Electrical Engineering. 58 (2017) 337-349.

[47] Jeon, J. and Cho, Y., 2019. Construction and performance analysis of image steganography

based Botnet in Kakao Talk Openchat. Computers, 8(3), p.61.

[48] H.R. Zeidanloo, A.B. Manaf, P. Vahdani, F. Tabatabaei, and M. Zamani, Botnet detection

based on traffic monitoring. Networking and Information Technology (ICNIT), 2010

International Conference on. (2010) 97-10.

[49] M. Chatterjee, A. S. Namin and P. Datta, 2018, Evidence Fusion for malicious Bot detection

in IoT, IEEE International Conference on Big Data (Big Data), 2018, pp. 4545-4548.

[50] Adat, V. and Gupta, B.B., 2018. Security in Internet of Things: issues, challenges,

taxonomy, and architecture. Telecommunication Systems, 67(3), pp.423-441

[51] Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, D. Breitenbacher, A. Shabtai, Y. Elovici,

N-BaIoT: Network-based detection of IoT Botnet attacks using deep auto encoders, IEEE

Pervasive Computing Vol.17 , No.3, 2018.

[52] C. McDermott, F. Majdani, A. V Petrovski, Botnet detection in the internet of things using

deep learning approaches, IEEE International Joint Conference on Neural Networks

(IJCNN), 2018.


[53] Y. Meidan, M. Bohadana, A. Shabtai, J. David Guarnizo, M. Ochoa, N. Ole Tippenhauer,

Y. Elovici, ProfilIoT: A machine learning approach for IoT device identification based on

network traffic analysis, In Proceedings of the Symposium on Applied Computing, pp. 506-

509, 2017.

[54] S. Homayoun, M. Ahmadzadeh, S. Hashemi, A. Dehghantanha, R. Khayami, BoTShark: A

deep learning approach for Botnet traffic detection, Cyber Threat Intelligence, pp. 137-153,

2018.

[55] N. An, A. Duff, G. Naik. M, M. Faloutsos, S. Weber, S. Mancoridis, Behavioral anomaly

detection of malware on home routers, IEEE 12th International Conference on Malicious

and Unwanted Software (MALWARE), pp. 47-54, 2017.

[56] L. Mathur, M. Raheja, P. Ahlawat, Botnet detection via mining of network traffic flow,

International Conference on Computational Intelligence and Data Science (ICCIDS 2018),

pp. 1668-1678, 2018.

[57] F. Villegas Alejandre, N. Cruz Cortes, E. Aguirre Anaya, Feature selection to detect Botnet

using machine learning algorithms, IEEE International Conference on Electronics,

Communications and Computers (CONIELECOMP), 2017.

[58] A. Bijalwan, N. Chand, E. Shubhakar Pilli, C. R. Krishna, Botnet analysis using ensemble

classifier, Perspectives in Science, pp. 502-504, 2016.

[59] F. Villegas Alejandre, N. Cruz Cortes, E. Aguirre Anaya, Feature selection to detect Botnet

using machine learning algorithms, IEEE International Conference on Electronics,

Communications and Computers (CONIELECOMP), 2017, pp. 1-7.

[60] S. Miller and C. Busby-Earle, "The role of machine learning in Botnet detection," 2016 11th

International Conference for Internet Technology and Secured Transactions (ICITST), 2016,

pp. 359-364.

[61] C. Hammerschmidt, S. Marchal, R. State, S. Verwer, Behavioral clustering of non-stationary

IP flow record data, 12th International Conference on Network and Service Management,

CNSM 2016 and Workshops, 3rd International Workshop on Management of SDN and

NFV, ManSDN/NFV 2016, and International Workshop on Green ICT and Smart

Networking, GISN 2016, pp. 297–301,2016.


[62] G. Kirubavathi Venkatesh, R. Anitha Nadarajan, HTTP Botnet detection using adaptive

learning rate multilayer feed-forward neural network, IFIP International Workshop on

Information Security Theory and Practice, Springer, Berlin, Heidelberg, pp. 38-48 ,2012.

[63] K. Singh , S. Chandra Guntuku, A. Thakur, C. Hota, Big data analytics framework for

peer-to-peer Botnet detection using random forests, Information Sciences, pp.488-497, 2014.

[64] Y. Meidan et al., 2018, N-BaIoT—Network-Based detection of IoT Botnet Attacks Using

Deep Autoencoders, in IEEE Pervasive Computing, vol. 17, no. 3, pp. 12-22.

[65] McDermott, Christopher D., Farzan Majdani, and Andrei Petrovski, Botnet detection in the

Internet of Things using deep learning approaches (2018).

[66] Meidan, Yair, Michael Bohadana, Asaf Shabtai, Juan David Guarnizo, Martín Ochoa, Nils

Ole Tippenhauer, and Yuval Elovici, ProfilIoT: A machine learning approach for IoT device

identification based on network traffic analysis, In Proceedings of the Symposium on

Applied Computing. (2017) 506-509.

[67] Homayoun, Sajad, Marzieh Ahmadzadeh, Sattar Hashemi, Ali Dehghantanha, and Raouf

Khayami, BoTShark: A deep learning approach for Botnet traffic detection, Cyber Threat

Intelligence. (2018) 137-153.

[68] F. Shaikh, E. Bou-Harb, J. Crichigno, N. Ghani, A machine learning model for classifying

unsolicited IoT devices by observing network telescopes, IEEE International Wireless

Communications and Mobile Computing Conference (IWCMC 2018) ,2018.

[69] K. Singh , S. Chandra Guntuku, A. Thakur, C. Hota, Big data analytics framework for

peer-to-peer Botnet detection using random forests, Information Sciences, pp.488-497, 2014.

[70] Meidan, Yair, Michael Bohadana, Yael Mathov, Yisroel Mirsky, Dominik Breitenbacher,

Asaf Shabtai, and Yuval Elovici, N-BaIoT: Network-based detection of IoT Botnet attacks

using deep autoencoders. arXiv preprint arXiv:1805.03409 (2018).

[71] Chatterjee, M., Namin, A.S. and Datta, P., 2018, December. Evidence Fusion for Malicious

Bot Detection in IoT. In 2018 IEEE International Conference on Big Data (Big Data) (pp.

4545-4548). IEEE.

[72] Sengar, B. and Padmavathi, B., 2017, July. P2P bot detection system based on map

reduces. In 2017 International Conference on Computing Methodologies and

Communication (ICCMC) (pp. 627-634). IEEE.


[73] Gadelrab, M.S., ElSheikh, M., Ghoneim, M.A. and Rashwan, M., 2018. BotCap: Machine

learning approach for Botnet detection based on statistical features. International Journal of

Communication Networks and Information Security, 10(3), p.563.

[74] Hammerschmidt, C., Marchal, S., State, R., and Verwer, S. 2017. Behavioral clustering of

non-stationary IP flow record data. 2016 12th International Conference on Network and

Service Management, CNSM 2016 and Workshops, 3rd International Workshop on

Management of SDN and NFV, ManSDN/NFV 2016, and International Workshop on Green

ICT and Smart Networking, GISN 2016, pages 297–301.

[75] Hammerschmidt, C., Marchal, S., State, R., Pellegrino, G., and Verwer, S. 2016. Efficient learning of

communication profiles from IP flow records. Proceedings - Conference on Local Computer

Networks, LCN, pages 559–562.

[76] Ijaz, S., Hashmi, F. A., Asghar, S., and Alam, M. 2017. Vector Based Genetic Algorithm to

optimize predictive analysis in network security. Applied Intelligence.

[77] Chen, W., Luo, X., and Zincir-Heywood, A. N. 2017. Exploring a service-based normal

behaviour profiling system for Botnet detection. In IFIP/IEEE Symposium on Integrated

Network and Service Management (IM), pages 947–952.

[78] Sebastian Garcia, Martin Grill, Jan Stiborek and Alejandro Zunino, 2014. An empirical comparison of

botnet detection methods. Computers and Security Journal, Elsevier. Vol 45, pp

100-123.

[79] Wang, J. and Paschalidis, I. C. 2016. Botnet Detection based on anomaly and community

detection. IEEE Transactions on Control of Network Systems, 5870(c):1–1.

[80] Tzagkarakis, C., Petroulakis, N. and Ioannidis, S., 2019, June. Botnet attack detection at the

IoT edge based on sparse representation. In 2019 Global IoT Summit (GIoTS) (pp. 1-6).

IEEE.

[81] Herwig, S., Harvey, K., Hughey, G., Roberts, R. and Levin, D., 2019, February.

Measurement and analysis of hajime, a Peer-to-peer IoT Botnet. In NDSS.

[82] Garip, M.T., Reiher, P. and Gerla, M., 2019, September. RIoT: A rapid exploit delivery

mechanism against IoT devices using vehicular Botnet. In 2019 IEEE 90th Vehicular

Technology Conference (VTC2019-Fall) (pp. 1-6). IEEE.


[83] Banerjee, M. and Samantaray, S.D., 2019. Network traffic analysis based IoT Botnet

detection using Honeynet data applying classification techniques. International Journal of

Computer Science and Information Security (IJCSIS), 17(8).

[84] Nguyen, H.T., Nguyen, D.H., Ngo, Q.D., Tran, V.H. and Le, V.H., 2019, Towards a rooted

subgraph classifier for IoT Botnet detection. In Proceedings of the 2019 7th International

Conference on Computer and Communications Management, ACM. pp. 247-251

[85] Ceron, J.M., Steding-Jessen, K., Hoepers, C., Granville, L.Z. and Margi, C.B., 2019.

Improving IoT Botnet investigation using an adaptive network layer. Sensors, 19(3), p.727.

[86] Yin, L., Luo, X., Zhu, C., Wang, L., Xu, Z. and Lu, H., 2019. ConnSpoiler: Disrupting

C&C communication of IoT-Based Botnet through fast detection of anomalous domain

queries. IEEE Transactions on Industrial Informatics.

[87] Koroniotis, N., Moustafa, N., Sitnikova, E. and Turnbull, B., 2019. Towards the

development of realistic Botnet dataset in the internet of things for network forensic

analytics: Bot-IoT dataset. Future Generation Computer Systems, 100, pp.779-796.

[88] Farooq, M.J. and Zhu, Q., 2019. Modeling, analysis, and mitigation of dynamic Botnet

formation in wireless IoT networks. IEEE Transactions on Information Forensics and

Security, 14(9), pp.2412-2426.

[89] Alhajri, R., Zagrouba, R. and Al-Haidari, F., 2019. Survey for anomaly detection of IoT

Botnet using machine learning auto-encoders. International Journal of Applied Engineering

Research, 14(10), pp.2417-2421.

[90] Spathoulas, G., Giachoudis, N., Damiris, G.P. and Theodoridis, G., 2019. Collaborative

Blockchain-based detection of distributed denial of service attacks based on Internet of

Things Botnet. Future Internet, 11(11), p.226.

[91] Vishwakarma, R. and Jain, A.K., 2019, April. A Honeypot with machine learning based

detection framework for defending IoT based Botnet DDoS Attacks. In 2019 3rd

International Conference on Trends in Electronics and Informatics (ICOEI) , IEEE, pp.

1019-1024.

[92] Yin, M., Chen, X., Wang, Q., Wang, W. and Wang, Y., 2019. Dynamics on hybrid complex

network: Botnet modeling and analysis of medical IoT. Security and Communication

Networks, 2019.


[93] Bezerra, V.H., da Costa, V.G.T., Barbon Junior, S., Miani, R.S. and Zarpelão, B.B., 2019.

IoTDS: A One-class classification approach to detect Botnet in Internet of Things

devices. Sensors, 19(14), pp.3188.

[94] Pour, M.S., Mangino, A., Friday, K., Rathbun, M., Bou-Harb, E., Iqbal, F., Shaban, K. and

Erradi, A., 2019, August. Data-driven curation, learning and analysis for inferring evolving

IoT Botnet in the wild. In Proceedings of the 14th International Conference on Availability,

Reliability and Security ACM pp.6.

[95] Soe, Y.N., Feng, Y., Santosa, P.I., Hartanto, R. and Sakurai, K., 2019. Rule generation for

signature based detection systems of cyber attacks in IoT environments. Bulletin of

Networking, Computing, Systems, and Software, 8(2), pp.93-97.

[96] Koroniotis, N., Moustafa, N. and Sitnikova, E., 2019. Forensics and deep Learning

mechanisms for Botnet in Internet of Things: A Survey of Challenges and Solutions. IEEE

Access, 7, pp.61764-61785.

[97] Shafi, Q. and Basit, A., 2019, January. DDoS Botnet Prevention using Blockchain in

Software Defined Internet of Things. In 2019 16th International Bhurban Conference on

Applied Sciences and Technology (IBCAST) (pp. 624-628). IEEE.

[98] Lange, T. and Kettani, H., 2019, March. On Security Threats of Botnet to cyber systems.

In 6th International Conference on Signal Processing and Integrated Networks (SPIN) (pp.

176-183). IEEE.

[99] R. K. Malaiya, D. Kwon, J. Kim, S. C. Suh, H. Kim and I. Kim, An Empirical evaluation of

deep learning for network anomaly detection, International Conference on Computing,

Networking and Communications (ICNC), 2018, pp. 893-898.

[100] Rahul Saxena, Introduction to K-nearest neighbor classifier , https://dataaspirant.com/k-

nearest-neighbor-classifier-intro/, last accessed on 05 July 2021.


List of Publications

1. Priyang Bhatt, Bhaskar Thakker, "A NOVEL FORECASTIVE ANOMALY BASED
BOTNET REVELATION FRAMEWORK FOR COMPETING CONCERNS IN INTERNET
OF THINGS", Journal of Applied Security Research, Taylor & Francis, volume 16, issue 2,
pp. 258-278, 2021 (ESCI & SCOPUS Indexed).

2. Priyang Bhatt, Bhaskar Thakker, "ISOLATING BOTNET ATTACKS USING BOOTSTRAP
AGGREGATING SURFLEX-PSIM CLASSIFIER IN IOT", Journal of Intelligent & Fuzzy
Systems, IOS Press, volume 38, issue 2, pp. 1827-1840, 2020 (ACM Digital Library, SCI &
SCOPUS Indexed).

3. Priyang Bhatt, Bhaskar Thakker, "MASS REMOVAL OF BOTNET ATTACKS USING
HETEROGENEOUS ENSEMBLE STACKING PROSIMA CLASSIFIER IN IOT", International Journal of
Communication Networks and Information Security (IJCNIS), KUST, volume 11, issue 3,
pp. 380-390, 2019 (SCOPUS Indexed).


Appendix

1. List of hardware and software components used for experimental setup

The proposed method is evaluated on an experimental setup. The traffic is collected from 20 real IoT nodes (implemented with Raspberry Pi 3) connected via Wi-Fi to the access point, with a wired connection to the central switch and the router. The network traffic is sniffed using tcpdump, Tshark, and Wireshark, with port mirroring enabled on the switch. Command and control (C & C) is implemented with a Python script that sends files to the IoT devices and controls them. Three IoT devices are configured as bots to generate DDoS and spam attacks against the rest of the devices in the network.

The hardware and software components used in the experimental setup are listed below.

1.1 Hardware Components

1.1.1. Raspberry Pi 3

To collect traffic from IoT nodes, Linux-based IoT devices are used. In the experimental setup, 20 Raspberry Pi 3 boards serve as the Linux-based IoT devices used to collect network traffic, inject the attacks, and detect the Botnet.

Link: https://www.raspberrypi.org/products/raspberry-pi-3-model-b/

1.1.2. CISCO SYSTEMS 8-Port Gigabit Ethernet Desktop Switch (SG110D08NA)

In the experimental setup, the twenty Raspberry Pi 3 nodes are connected via Wi-Fi to the access point, with a wired connection to the central switch and to the router. All the servers are connected to the central switch, and the switch is connected to the access point.


Link: https://www.amazon.in/SYSTEMS-Gigabit-Ethernet-Desktop

SG110D08NA/dp/B00V8IZ7JM

1.1.3. Laptops

Laptops are used as servers to record the data. A DHCP server assigns IP addresses dynamically in the network, and a C & C (command and control) server sends files to the IoT devices and controls them. In total, four laptops are used in the experimental setup to act as the different servers.

Link: https://www.gadgetsnow.com/laptops/Dell-Inspiron-15-3593-D560159WIN9S-Laptop-Core-i3-10th-Gen8-GB1-TBWindows-10

1.1.4. D-Link DAP-1360 Wireless N Access Point

In the experimental setup, the traffic is collected from 20 real IoT nodes (implemented with Raspberry Pi 3) connected via Wi-Fi to the access point, with a wired connection to the central switch and the router.


Link: https://eu.dlink.com/-/media/consumer_products/dap/dap-1360/manual/dap-1360_c1_manual_v3_00_eu.pdf

1.2 Software components

1.2.1 Anaconda (Spyder) python 3

In the proposed work, the Anaconda distribution with the Spyder IDE was used for implementation. Anaconda is a distribution of Python and R for data analytics, machine learning, and scientific computing. It bundles IDEs and applications such as Spyder, Jupyter, and Visual Studio Code, and it ships with the package manager conda. Conda is an open-source, cross-platform package and environment manager that helps install and run packages and their dependencies. Spyder is an integrated development environment for Python designed for scientific computing and is included in the Anaconda distribution. It provides advanced code editing, data visualization, and debugging.

Link: https://docs.anaconda.com/anaconda/install/windows/

1.2.2 Wireshark/Tshark

Wireshark is used for sniffing network traffic. Wireshark (formerly Ethereal) is an open-source packet analyzer used for computer network management and debugging. It is a graphical tool developed with the Qt widget toolkit (C++), and at its core it uses pcap for packet capture. Wireshark can be used on Linux, UNIX, and Windows systems. Its operations are similar to those a user can perform with the tcpdump command on a Linux/Unix-based system. Its primary advantages over tcpdump are that packets are displayed with color coding and that packet contents can be viewed in different encoding formats. It can also put supported network interfaces into promiscuous mode without manual configuration. A command-line (terminal-based) version is available under the name Tshark.


1.2.3 Python Scapy

Python Scapy is used to generate and detect security attacks. Scapy is a Python library for packet manipulation. It enables developers to send, receive, sniff, and dissect packets in a network, and it supports a wide range of communication protocols. Developers can use it for tasks such as scanning, tracerouting, probing, unit tests, attacks, and network discovery, replacing many well-known Linux/Unix tools such as Nmap, arpspoof, and arping.

Link: https://scapy.readthedocs.io/en/latest/introduction.html

1.2.4 tcpdump

tcpdump is used for sniffing network traffic. It is a command-line tool for Linux/Unix-based systems that provides network-analysis capabilities, and it primarily works on the TCP/IP networking stack. It prints a description of the packets traveling on a network. tcpdump also accepts Boolean filter expressions, which help filter packets by properties such as protocol, host, and port.

Link: https://opensource.com/article/18/10/introduction-tcpdump
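When the same capture is repeated on many nodes, it can help to assemble the tcpdump invocation programmatically. The sketch below only builds the argument list and does not run it. The helper name `build_tcpdump_args` and its parameters are hypothetical; `-i`, `-c`, `-w`, and the `port` filter primitive are standard tcpdump options.

```python
def build_tcpdump_args(interface, count=None, outfile=None, port=None):
    """Build a tcpdump argument list (hypothetical helper; nothing is executed)."""
    args = ["tcpdump", "-i", interface]  # -i selects the capture interface
    if count is not None:
        args += ["-c", str(count)]  # -c stops after this many packets
    if outfile is not None:
        args += ["-w", outfile]  # -w writes raw packets to a pcap file
    if port is not None:
        args += ["port", str(port)]  # filter expression: match this port only
    return args


print(" ".join(build_tcpdump_args("wlan0", count=100, outfile="capture.pcap", port=80)))
```

The resulting list can be passed to `subprocess.run(args, check=True)`; capturing usually requires root privileges.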

2. Some of the Tshark and tcpdump commands that we used for traffic generation and sniffing. We used these commands in the isolated experimental setup.

Description | Commands
Capturing packets with tshark | tshark -i wlan0 -w output_file.pcap
Reading a pcap file | tshark -r output_file.pcap
Generic capture for an IP address | tshark -Y "ip.addr == 192.168.1.10" -r output_file.pcap
Send specified number of packets | tshark -M 100000
Creating a CSV file with Tshark | tshark -r output_file.pcap -T fields -e frame.number -e frame.time -e eth.src -e eth.dst -e ip.src -e ip.dst -e ip.proto -E header=y -E separator=, -E quote=d -E occurrence=f > dataset.csv
Display only source and destination IP | tshark -o column.format:'"Source","%s","Destination","%d"' -T text
Display HTTP responses | tshark -o "tcp.desegment_tcp_streams:TRUE" -i eth0 -Y "http.response" -T fields -e http.response.code
Capturing N packets using tcpdump | tcpdump -c N -i interface_name
Capture and save packets to a file using tcpdump | tcpdump -w filename.pcap -i interface_name
Read captured packets using tcpdump | tcpdump -r filename.pcap
Capture packets from a specific port | tcpdump -i interface_name port port_number

Links:

1. https://www.cellstream.com/reference-reading/tipsandtricks/272-t-shark-usage-examples

2. https://www.wireshark.org/docs/man-pages/tshark.html

3. https://opensource.com/article/20/1/wireshark-linux-tshark
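The CSV file produced by the Tshark export command above can be summarized with a few lines of Python. The following is a minimal sketch, assuming the column names match the `-e` fields of that command; the sample rows, addresses, and the function name `packets_per_source` are synthetic and illustrative, not data from the experiments.

```python
import csv
import io
from collections import Counter

# Synthetic rows in the layout of the CSV export (header=y, separator=",", quote=d).
SAMPLE_CSV = '''frame.number,frame.time,eth.src,eth.dst,ip.src,ip.dst,ip.proto
"1","Jul  5, 2021 10:00:00.000000000 IST","b8:27:eb:00:00:01","b8:27:eb:00:00:02","192.168.1.10","192.168.1.20","6"
"2","Jul  5, 2021 10:00:00.000100000 IST","b8:27:eb:00:00:01","b8:27:eb:00:00:03","192.168.1.10","192.168.1.21","6"
"3","Jul  5, 2021 10:00:00.000200000 IST","b8:27:eb:00:00:02","b8:27:eb:00:00:01","192.168.1.20","192.168.1.10","17"
'''


def packets_per_source(csv_file):
    """Count packets per source IP; rows without an IP layer (e.g. ARP) are skipped."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        src = row.get("ip.src") or ""
        if src:
            counts[src] += 1
    return counts


counts = packets_per_source(io.StringIO(SAMPLE_CSV))
print(counts.most_common())
```

For a real export, the function can be given a file opened with `open("dataset.csv", newline="")`. A source address with an unusually high packet count is only a starting point for inspection, not a detection result.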
