

Research Article
Smart Detection: An Online Approach for DoS/DDoS Attack Detection Using Machine Learning

Francisco Sales de Lima Filho,1 Frederico A. F. Silveira,1 Agostinho de Medeiros Brito Junior,1 Genoveva Vargas-Solar,2 and Luiz F. Silveira1

1 Computer Engineering and Automation Department, Federal University of Rio Grande do Norte, P.O. Box 1524, Natal 59078-970, Brazil
2 Centre National de la Recherche Scientifique, LIG, Grenoble, France

Correspondence should be addressed to Francisco Sales de Lima Filho; saleslho@gmail.com

Received 11 June 2019; Accepted 31 August 2019; Published 13 October 2019

Academic Editor: Leandros Maglaras

Copyright © 2019 Francisco Sales de Lima Filho et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Users and Internet service providers (ISPs) are constantly affected by denial-of-service (DoS) attacks. This cyber threat continues to grow even with the development of new protection technologies. Developing mechanisms to detect this threat is a current challenge in network security. This article presents a machine learning- (ML-) based DoS detection system. The proposed approach makes inferences based on signatures previously extracted from samples of network traffic. The experiments were performed using four modern benchmark datasets. The results show an online detection rate (DR) of attacks above 96%, with high precision (PREC) and low false alarm rate (FAR), using a sampling rate (SR) of 20% of the network traffic.

1. Introduction

In recent years, distributed denial-of-service (DDoS) attacks have caused significant financial losses to industry and governments worldwide, as shown in information security reports [1]. These records are in line with the growing number of devices connected to the Internet, especially driven by the popularization of ubiquitous computing, materialized through the Internet of Things (IoT) [2] paradigm and characterized by the concept of connecting anything, anywhere, anytime. In most Internet scenarios, devices interact with applications that run remotely on the network, which enables malicious agents to take control of devices. In this way, it is possible to have the interruption of services or the use of devices as a launching point of attacks on diverse domains, as is the case of the DDoS attack [3], which has been consolidated for several reasons, such as (i) simplicity and facility of execution, not requiring vast technological knowledge on the attacker side, and (ii) the variety of platforms and applications that facilitate attack orchestration. Many of these attacks succeeded in disrupting essential Internet services, such as DNS, affecting millions of users around the world [4], and commercial platforms such as GitHub [5], prompting severe financial losses to the organizations that depend on those services.

One of the most dangerous types of malicious traffic on the Internet is the volumetric DDoS attack, which is responsible for more than 65% of all such attacks [6]. In a volumetric DDoS attack, several attackers coordinate the sending of a high rate of useless data in an attempt to overload the victim's computing resources or the nearby network links. On the one hand, the high success rates for this type of attack occur because the main Internet routers typically use the FIFO (First-In-First-Out) and DROP-TAIL queuing disciplines, which do not differentiate between types of traffic, imposing equal loss rates on attack and legitimate traffic. Although legitimate traffic tends to retreat to prevent further congestion, attack traffic does not have this commitment and causes the links to be exceeded. As a consequence, legitimate traffic is also obstructed [6]. On the other hand, attackers are using more advanced techniques to potentiate attacks and flood the victim, such as DDoS-for-hire, IoT-based


DDoS attacks, and reflection DDoS attacks [7–9], profiting from the computational capability and geographical distribution promoted by the wide variety of devices and their diverse mobility patterns typically found in IoT and mobile IoT scenarios.

In addition to volumetric DDoS attacks, low-volume attacks are on the radar of security experts. These are sneakier attacks that use few invading hosts; events are fast, sometimes lasting only a few minutes and usually less than an hour. For these reasons, security teams do not know that their sites are under attack, because common tools do not detect this type of threat [10]. Typically, a low-volume DDoS attack exploits application layer protocols, respects the other protocols, and does not overload links, but causes exhaustion of the victim's resources.

1.1. Problem Statements. DDoS detection and mitigation have been under study in both the scientific community and industry for several years. The related literature reveals that several studies have been undertaken to propose solutions to deal with this problem in a general way [6, 11–15]. Another group of works is dedicated to presenting specific solutions for high-volume and low-volume DDoS attacks [8, 13, 16]. Furthermore, despite the diverse recommendations for mitigating DDoS attacks proposed by the Computer Emergency Response Team (CERT) and guidelines documented through Requests for Comments (RFCs), these attacks still occur with high frequency.

A study carried out years ago [17] revealed that the ineffectiveness of detecting and mitigating DDoS attacks is directly related to constant configuration errors and wasted time due to the lack of tools that follow the dynamics of the network without constant human interference. This has led researchers to use autonomous solutions that can operate (detect and mitigate) based on the behavior and characteristics of the traffic. In this sense, the adoption of solutions based on artificial intelligence techniques, mainly machine learning (ML), has been distinguished by offering high flexibility in the classification process, consequently improving the detection of malicious traffic [18, 19].

The industrial sector offers DDoS protection as a service through large structures, usually operated by specialized providers [6] such as Akamai, Cloudflare, and Arbor Networks, which have large processing capacity and proprietary filtering mechanisms. But the industry also has problems, such as fragility in routing traffic (usually via the Domain Name System (DNS) or the Border Gateway Protocol (BGP)), difficulty detecting slow attacks, and privacy issues, which drive away some customer segments, such as governments.

Finding the balance between academic propositions and the industrial practice of combating DDoS is a big challenge. Academia invests in techniques such as machine learning (ML) and proposes to apply them in areas such as DDoS detection in the Internet of Things (IoT) [20, 21], wireless sensors [22], cloud computing [23], and software-defined networking (SDN) [18], and works on producing more realistic datasets [24, 25] and more effective means of result validation [26, 27]. On the other hand, industry

segments have gradually invested in new paradigms for their solutions, such as network function virtualization (NFV) and SDN [28, 29], to apply scientific discoveries and modernize network structures. Even so, DDoS incidents still happen daily, reinforcing that the problem is not solved.

1.2. Proposal. In light of these issues, this article proposes Smart Detection, a novel defense mechanism against DDoS attacks. The system architecture was designed to detect both high- and low-volume DDoS attacks. The proposed system acts as a sensor that can be installed anywhere on the network and classifies online traffic using an MLA-based strategy that makes inferences utilizing random traffic samples collected on network devices via a stream protocol. The proposed approach is compatible with the Internet infrastructure and does not require software or hardware upgrades. Besides, user data privacy is guaranteed at all stages of system operation.

1.3. Contributions. In summary, the significant contributions of Smart Detection are as follows:

(i) The modeling, development, and validation of the detection system are done using a customized dataset and three other well-known ones, called CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018, where the system receives online random samples of network traffic and classifies them as DoS attacks or normal.

(ii) The proposed detection system differs from other approaches by early identification of a variety of volumetric attacks, such as TCP flood, UDP flood, and HTTP flood, as well as stealth attacks such as HTTP slow headers, HTTP slow body, and HTTP slow read, even with a low traffic sampling rate. In addition, Smart Detection is compatible with the current Internet infrastructure and does not require any software or hardware upgrades on ISPs. At the same time, the proposed system uses advanced technologies such as ML and NFV.

(iii) Unlike existing security service providers, the proposed system does not require traffic redirection or connection intermediation. Data privacy is guaranteed at all stages. First, the system randomly processes only a small part of the network traffic. Second, it does not perform deep packet inspection. Instead, Smart Detection parses only network layer header data.

(iv) The pattern recognition of normal network traffic and several DoS attack types is addressed. As a result, a novel signature database is created, which is used by Smart Detection and can be applied to other systems.

(v) An approach to automatic feature selection has been developed, using the cross-validation technique for model searches that meet specific classification quality criteria. This approach was used to define the signatures adopted by Smart Detection.


2. Related Works and Background

Intrusion detection in computer networks is widely discussed in the literature. Several detection techniques and protection strategies have been proposed in recent years. Studies in the literature classify IDSs as signature-based, anomaly-based, and hybrid systems. The first type identifies potential attacks by comparing currently observed events with its stored signatures. The second one detects anomalies by identifying significant deviations between the preestablished normal profile and the current events. In all cases, an alert will be generated if any signature is matched or if a deviation occurs above a set threshold. The main advantage of the signature-based approach is the low false alarm rate. However, the challenge is to write signatures that cover all possible attack variations. In contrast, the anomaly-based approach has the ability to detect unknown attacks, but it requires more computational resources and often produces more false alarms. Hybrid solutions try to exploit the benefits of both techniques [11, 30]. DoS attacks are a specific type of network intrusion that has drawn the attention of academia, as highlighted by recent surveys on network applications, wireless networks, cloud computing, and big data [8, 13, 14, 31].

Several classification strategies for DDoS attacks have been proposed in the literature in the last decade. However, DDoS flooding attacks have been further studied, being classified into two categories based on the protocol level that is targeted [3]:

(i) Network/transport-level DDoS flooding attacks: such attacks are mostly launched using Transmission Control Protocol (TCP), User Datagram Protocol (UDP), Internet Control Message Protocol (ICMP), and Domain Name System (DNS) protocol packets.

(ii) Application-level DDoS flooding attacks: such attacks are focused on disrupting legitimate user services by exhausting the server resources, e.g., sockets, central processing unit (CPU), memory, disk/database bandwidth, and input/output (I/O) bandwidth. Application-level DDoS attacks generally consume less bandwidth and are stealthier in nature than volumetric attacks, since they are very similar to benign traffic.

The biggest challenge in combating DDoS attacks lies in the early detection and mitigation of attacks as close as possible to their origin; however, the implementation of a comprehensive solution that addresses these features has not yet been achieved [3, 32].

Some recent work has inspired the development of the Smart Detection system. These approaches are listed in Table 1 for comparative purposes.

A Hypertext Transfer Protocol- (HTTP-) based technique [16] was proposed to detect flood attacks on web servers using data sampling. The authors used the CUSUM algorithm to determine whether the analyzed traffic is normal or a DoS attack, focusing on two features: the number of application layer requests and the number of packets with payload size equal to zero. The results showed a detection rate between 80% and 88% using a sampling rate of 20%. Although it has made important advances, the proposed method does not seem applicable to automatic mitigation systems, especially in production environments that do not support high sampling rates.

D-FACE is a collaborative defense system [34] that uses generalized entropy (GE) and generalized information distance (GID) metrics to detect different types of DDoS attacks and flash events (FEs). In this context, an FE is similar to a volumetric DDoS attack, wherein thousands of legitimate users try to access a particular computing resource, such as a website, simultaneously. The results show that D-FACE can detect DDoS attacks and FEs. Although the work presents relevant contributions, the validation used obsolete datasets. In addition, the proposed collaboration approach requires a high degree of ISP engagement, which restricts the industrial use of the solution.

The Antidose system [33] presents a means of interaction between a vulnerable peripheral service and an indirectly related Autonomous System (AS), which allows the AS to confidently deploy local filtering rules under the control of the remote service. The system was evaluated using Mininet, but no benchmark dataset was used. The approach proposed by the authors faces strong resistance from ISPs for two reasons: the first is the software and hardware update requirement, and the second is having no control over local traffic control policies.

The SkyShield system [35] has been proposed to detect and mitigate DDoS attacks on the application layer. In the detection phase, SkyShield exploits the divergence between two hash tables (sketches) to detect anomalies caused by attacker hosts. The mitigation phase uses filtering, whitelisting, blacklisting, and CAPTCHA as protection mechanisms. The system was evaluated using customized datasets. SkyShield focuses on the application layer, more specifically on the HTTP protocol, so the proposed system is vulnerable to flooding at the network and transport layers.

Umbrella [36] develops a multilayered defense architecture to defend against a wide spectrum of DDoS attacks. The authors proposed an approach based on detection and protection exclusively on the victim's side. The system was

Table 1: Recent related works.

Reference               Dataset                                            Online   LH DoS   Sampling
[16]                    CIC-DoS
[33]                    None
[34]                    MIT Lincoln, FIFA98, DDoSTB, CAIDA
[35]                    Customized (developed by the authors)
[36]                    Customized (developed by the authors)
[37]                    CICIDS2017
The proposed approach   CIC-DoS, CICIDS2017, CSE-CIC-IDS2018, Customized   Yes      Yes      Yes


evaluated using a customized testbed in terms of traffic control. The authors claim that the system is capable of dealing with mass attacks. However, this approach is widely used in industry and has proved to be inefficient against truly massive DDoS attacks.

Recently, a semi-supervised machine learning system addressed the classification of DDoS attacks. In this approach, the CICIDS2017 dataset was used to evaluate system performance metrics [37]. Although the work addresses the recent DoS vectors, the online performance of the method has not been evaluated. Finally, Table 1 summarizes these recent works whose approach is related to the proposal of this article.

In Table 1, Online indicates that the proposed system has been tested in online experiments, and Dataset informs the dataset used for validation, while LH DoS indicates whether it detects slow and high DDoS attacks. Sampling indicates whether some network traffic sampling method is used.

Based on the open questions in the literature and recent specialized reports, DoS attacks may remain on the Internet for some time. The solution to this problem includes the adoption of detection and mitigation strategies that are practical and economically viable. Besides, these approaches should leverage the existing provider infrastructure and be implemented in light of new scientific and technological trends.

3. Smart Detection

Smart Detection is designed to combat DDoS attacks on the Internet in a modern, collaborative way. In this approach, the system collects network traffic samples and classifies them. Attack notification messages are shared using a cloud platform for convenient use by traffic control protection systems. The whole process is illustrated in Figure 1.

The core of the detection system consists of a Signature Dataset (SDS) and a machine learning algorithm (MLA). Figure 2 shows the crucial steps from model building to system operation.

First, normal traffic and DDoS signatures were extracted, labeled, and stored in a database. The SDS was then created using feature selection techniques. Finally, the most accurate MLA was selected, trained, and loaded into the traffic classification system.

The architecture of the detection system was designed to work with samples of network traffic provided by industry-standard traffic sampling protocols, collected from network devices. The unlabeled samples are received and grouped into flow tables in the receiver buffer. Thus, when the table length is greater than or equal to the reference value, they are presented to the classifier, which is responsible for labeling them, as shown in Figure 3. If a flow table expires, it may be processed one more time. The occurrence of small flow tables is higher at lower sampling rates or under some types of DoS attacks, e.g., SYN flood attacks. Table 2 details the parameters for fine-tuning the system.

The complete algorithm of the detection system is summarized in Figure 4. During each cycle of the detection process, traffic samples are received and stored in a flow table. For each new flow, a unique identifier (FlowID) is calculated based on the 5-tuple (src_IP, dst_IP, src_port, dst_port, and transport_protocol) in steps 1 and 2. If this is a new flow, i.e., there is not any other flow table stored with the same FlowID, the flow table is registered in a shared memory buffer. Otherwise, if there is a flow table registered with the same FlowID as the previously calculated one, the data of the new flow will be merged with the data in the existing flow table in steps 3 and 4. After the merging operation, if the table length is greater than or equal to the reference value (Tl ≥ Tmax), the flow table is classified and, if it is found to be an attack, a notification is emitted. Otherwise, it is inserted back into the shared memory buffer. Meanwhile, in step 7, the cleanup task looks for expired flow tables in the shared buffer, i.e., flow tables that exceed the expiration time of the system (E > ET). For each expired flow table, the system checks the table length. If the flow table length is less than or equal to the minimum reference value (Tl ≤ Tmin), this flow table is processed by step 8: a new FlowID is calculated using the 3-tuple (src_IP, dst_IP, and transport_protocol), and the flow table is routed back to steps 3 and 4.
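A minimal Python sketch of this flow-table keying and merge logic, assuming sampled packet headers arrive as dictionaries; the parameter values and the classify_flow helper are illustrative placeholders and not the authors' implementation.

import hashlib
import time
from collections import defaultdict

TMAX, TMIN, ET = 50, 10, 2.0  # illustrative values for the Table 2 parameters

def flow_id(pkt, five_tuple=True):
    # Hash the 5-tuple, or the 3-tuple fallback used for short expired tables (step 8)
    if five_tuple:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])
    else:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["proto"])
    return hashlib.md5(repr(key).encode()).hexdigest()

buffer = defaultdict(lambda: {"packets": [], "created": time.time(), "five_tuple": True})

def notify(fid):
    print(f"DoS/DDoS alert for flow table {fid}")

def receive(pkt, classify_flow):
    fid = flow_id(pkt)                    # steps 1-2: key the sample
    table = buffer[fid]                   # steps 3-4: store a new flow or merge into an existing one
    table["packets"].append(pkt)
    if len(table["packets"]) >= TMAX:     # step 5: classify full tables
        label = classify_flow(table["packets"])
        del buffer[fid]
        if label != "normal":
            notify(fid)                   # step 6: emit a notification

def cleanup():
    # step 7: expire old tables; step 8: short 5-tuple tables are re-keyed on the 3-tuple
    now = time.time()
    for fid in list(buffer):
        table = buffer[fid]
        if now - table["created"] > ET:
            if table["five_tuple"] and len(table["packets"]) <= TMIN:
                for pkt in table["packets"]:
                    merged = buffer[flow_id(pkt, five_tuple=False)]
                    merged["five_tuple"] = False
                    merged["packets"].append(pkt)
            del buffer[fid]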

3.1. Traffic Sampling. Smart Detection uses a network traffic sampling technique because processing all the packets in the network can be a computationally expensive task, even if only the packet headers are parsed. In many cases, performing a deep inspection and analyzing the data area of the application layer is unfeasible for detection systems. Among the protocols adopted by the industry for sampling network traffic, the sFlow protocol is widely used in current devices. The technique used by sFlow is called n-out-of-N sampling. In this technique, n samples are selected out of N packets. One way to achieve a simple random sample is to randomly generate n different numbers in the range of 1 to N and then choose all packets with a packet position equal to one of the n values. This procedure is repeated for every N packets. Besides, the sample size is fixed in this approach [38].
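A minimal sketch of the n-out-of-N sampling scheme described above, under the stated assumption that n random positions are drawn per window of N packets; the packet objects are placeholders.

import random

def n_out_of_n_sample(packets, n, N):
    """Select n packets at random positions from every window of N packets."""
    sampled = []
    for start in range(0, len(packets), N):
        window = packets[start:start + N]
        positions = random.sample(range(len(window)), min(n, len(window)))
        sampled.extend(window[i] for i in positions)
    return sampled

# Example: a 20% sampling rate realized as 20 samples per window of 100 packets
packets = list(range(1000))            # stand-ins for captured packets
samples = n_out_of_n_sample(packets, n=20, N=100)
print(len(samples), "samples drawn")   # roughly 200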

The sFlow monitoring system consists of an agent (embedded in a switch, a router, or an independent probe) and a collector. The architecture used in the monitoring system is designed to provide continuous network monitoring of high-speed switched and routed devices. The agent uses the sampling technology to capture traffic statistics from the monitored device and forward them to a collector system [39].

3.2. Feature Extraction. In supervised classification strategies, a set of examples is required for training the classifier model. This set is commonly defined as the signature database. Each instance of the database has a set of characteristics, or variables, associated with a label or a class. In this work, the goal is to identify characteristics in network traffic that are able to distinguish normal network behavior from DoS attacks. The study is focused on the analysis of the header variables of the network and transport layer packets of the TCP/IP architecture, because it allows saving computational resources and simplifies the deployment in ISP networks.


In IPv4-compliant networks, the network and transport layer protocols are IP, TCP, and UDP, which are specified in RFC 791 [40], RFC 793 [41], and RFC 768 [42], respectively. Together, such protocols have a total of 25 header variables. However, widely used network traffic sampling protocols, such as NetFlow [43] and sFlow [39], use only a portion of these variables in the sampling process. Commonly, the seven used variables are the source and destination IP addresses, source and destination ports, transport layer protocol, IP packet size, and TCP flags.

The source and destination IP addresses are not very useful for identifying the network traffic behavior in the Internet environment, which reduces the number of variables available for analysis to five in the most common cases. Based on the five variables mostly used by the flow monitoring protocols, 33 variables were derived, as described in Table 3, which use statistical measures that express data variability. In the calculation context of the database variables, the references to the mean, median, variance (var), and

Figure 1: Operation scenario overview of the Smart Detection system. Network devices provide traffic samples to the network traffic classifier (1, detection), attack notifications are shared via a cloud pub/sub platform (2, sharing), and traffic control systems apply protection (3, protection).

Figure 2: Detection system overview. In the model build phase, traffic signatures feed feature and MLA selection, producing the SDS and the trained model; in system operation, network traffic samples are classified and notification alerts are emitted.

Figure 3: Overview of the traffic classification scheme. Unlabeled traffic flow samples are grouped into flow tables (FlowID, Var1, Var2, ..., VarN) in the receiver buffer and labeled by the classifier as normal or attack.

Table 2: Detection system parameters.

Parameter   Description
Tmin        Minimum flow table length
Tmax        Maximum flow table length
ET          Flow table expiration time


standard deviation (std) should be interpreted as sample measures.

The variable named protocol is a simple normalization of the protocol field extracted from the transport layer packet headers, in the form

ip_proto = N_proto / K,    (1)

where N_proto is the code of the protocol and K is a normalization constant set to the value 1000. For instance, N_proto = 6 and N_proto = 17 for the TCP and UDP protocols, respectively.

With the four main variables mostly used in flow monitoring, it is possible to calculate the following associated statistical measurements (a code sketch of these computations appears after the list):

(i) Entropy: the entropy of the variable is calculated by

Entropy(X) = − Σ_i p(X_i) log2 p(X_i),    (2)

where X is the variable of interest, e.g., the source port.

(ii) Coefficient of variation: the coefficient of variation is calculated by

cv(X) = std(X) / mean(X),    (3)

where std(X) is the estimated standard deviation and mean(X) is the estimated average of the variable.

(iii) Quantile coefficient: this parameter is defined herein by

cvq(X) = [Q̂_X(1 − p) − Q̂_X(p)] / [Q̂_X(1 − p) + Q̂_X(p)],    (4)

where Q̂_X(p) is the sample p-quantile, p ∈ (0, 0.5), expressed by [44]

Q̂_X(p) = X_(k) + (X_(k+1) − X_(k)) · f,    (5)

Figure 4: Detection system algorithm. Flowchart of the detection cycle: (1) receive samples; (2) create the 5-tuple flow table; (3) store new flows in the buffer; (4) merge repeated flows; (5) classify when Tl ≥ Tmax; (6) notify if an attack is detected; (7) cleanup of expired flow tables (E > ET); (8) creation of a 3-tuple flow table for short expired tables.

Table 3: Extracted variables.

No.  Variable            Detail
01   ip_proto            Normalized protocol number
02   ip_len_mean         Mean of IP length
03   ip_len_median       Median of IP length
04   ip_len_var          Variance of IP length
05   ip_len_std          Standard deviation of IP length
06   ip_len_entropy      Entropy of IP length
07   ip_len_cv           Coefficient of variation of IP length
08   ip_len_cvq          Quantile coefficient of IP length
09   ip_len_rte          Rate of change of IP length
10   sport_mean          Mean of src port
11   sport_median        Median of src port
12   sport_var           Variance of src port
13   sport_std           Standard deviation of src port
14   sport_entropy       Entropy of src port
15   sport_cv            Coefficient of variation of src port
16   sport_cvq           Quantile coefficient of src port
17   sport_rte           Rate of change of src port
18   dport_mean          Mean of dest port
19   dport_median        Median of dest port
20   dport_var           Variance of dest port
21   dport_std           Standard deviation of dest port
22   dport_entropy       Entropy of dest port
23   dport_cv            Coefficient of variation of dest port
24   dport_cvq           Quantile coefficient of dest port
25   dport_rte           Rate of change of dest port
26   tcp_flags_mean      Mean of TCP flags
27   tcp_flags_median    Median of TCP flags
28   tcp_flags_var       Variance of TCP flags
29   tcp_flags_std       Standard deviation of TCP flags
30   tcp_flags_entropy   Entropy of TCP flags
31   tcp_flags_cv        Coefficient of variation of TCP flags
32   tcp_flags_cvq       Quantile coefficient of TCP flags
33   tcp_flags_rte       Rate of change of TCP flags


with X_(1), ..., X_(n) being the order statistics of the independent observations X_1, ..., X_n, k = ⌊p · n⌋, and f the fractional part of the index surrounded by X_(k) and X_(k+1).

(iv) Rate of change: this metric is given by

rte(X) = U_X / S_X,    (6)

where U_X is the number of unique values and S_X is the overall number of values of X.
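A minimal sketch of how these per-flow statistics could be computed, assuming a flow is given as lists of header values (IP length, ports, TCP flags); the quantile estimator and the default p = 0.25 are simplifying assumptions, and the helper names are not from the original implementation.

import math
import statistics as st

K = 1000.0  # normalization constant from equation (1)

def entropy(values):                                   # equation (2)
    n = len(values)
    probs = [values.count(v) / n for v in set(values)]
    return -sum(p * math.log2(p) for p in probs)

def coeff_variation(values):                           # equation (3)
    m = st.mean(values)
    return st.stdev(values) / m if m else 0.0

def quantile_coeff(values, p=0.25):                    # equations (4)-(5), approximated
    qs = st.quantiles(values, n=100)                   # sample percentiles
    q_lo, q_hi = qs[int(p * 100) - 1], qs[int((1 - p) * 100) - 1]
    return (q_hi - q_lo) / (q_hi + q_lo) if (q_hi + q_lo) else 0.0

def rate_of_change(values):                            # equation (6)
    return len(set(values)) / len(values)

def flow_features(proto, ip_len, sport, dport, tcp_flags):
    """Build a feature vector in the spirit of Table 3 (subset of statistics shown)."""
    feats = {"ip_proto": proto / K}
    for name, vals in (("ip_len", ip_len), ("sport", sport),
                       ("dport", dport), ("tcp_flags", tcp_flags)):
        feats[f"{name}_mean"] = st.mean(vals)
        feats[f"{name}_entropy"] = entropy(vals)
        feats[f"{name}_cv"] = coeff_variation(vals)
        feats[f"{name}_cvq"] = quantile_coeff(vals)
        feats[f"{name}_rte"] = rate_of_change(vals)
    return feats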

The data traffic with normal activity behavior was extracted from the ISCXIDS2012 dataset [45]. The data traffic with DoS behavior was obtained in a laboratory-controlled environment using tools such as hping3 [46], hulk [47], GoldenEye [48], and slowhttptest [49].

Processes such as extracting, transforming, and labeling the database instances are summarized in Figure 5. The raw network traffic was extracted from the capture files, and the packets were then grouped into sessions. For each session, one instance of the descriptor database, containing all the variables listed in Table 3, was calculated. In this study, only sessions with five hundred packets or more were considered, to better represent each network traffic type.

The final database contains examples of normal traffic (23,088 instances), TCP flood attacks (14,988 instances), UDP flood (6,894 instances), HTTP flood (347 instances), and HTTP slow (183 instances).

3.3. Feature and MLA Selection. Feature selection is an important step in the pattern recognition process and consists of defining the smallest possible set of variables capable of efficiently describing a set of classes [50]. Several techniques for variable selection are available in the literature and implemented in software libraries such as scikit-learn [51]. In this work, the selection of variables was performed in two stages. First, Recursive Feature Elimination with Cross-Validation (RFECV) was used with machine learning algorithms widely used in the scientific literature, i.e., random forest (RF), logistic regression (LR), AdaBoost, stochastic gradient descent (SGD), decision tree (DTree), and perceptron. RF obtained higher precision using 28 variables, while AdaBoost selected seven variables but obtained lower accuracy, as shown in Table 4. In the second stage, a new feature selection test was performed with RF using the proposed Algorithm 1.
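A minimal sketch of the first selection stage (RFECV with several candidate estimators) using scikit-learn, assuming the signature database is available as a feature matrix X and label vector y; here synthetic data stands in for the 33-variable database, and scikit-learn >= 1.0 is assumed for the cv_results_ attribute.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression, SGDClassifier, Perceptron
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the 33-variable signature database
X, y = make_classification(n_samples=2000, n_features=33, n_informative=15, random_state=0)

estimators = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "DTree": DecisionTreeClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "SGD": SGDClassifier(random_state=0),
    "Perceptron": Perceptron(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

for name, est in estimators.items():
    selector = RFECV(est, step=1, cv=10, scoring="accuracy")  # 10-fold, as in Table 4
    selector.fit(X, y)
    print(f"{name}: {selector.n_features_} features, "
          f"best CV accuracy {selector.cv_results_['mean_test_score'].max():.4f}")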

In the proposed feature selection approach using RF, the number of variables was reduced from 28 to 20, with a small increase in accuracy, as shown in Table 5. The proposed algorithm was executed using the following input parameters: 1000 rounds, 99% of variable importance, 95% of global accuracy, and 85% of per-class accuracy. Figure 6 shows that most of the tested models used 20 variables. However, each model used specific sets of variables. In order to choose the most relevant variables from the selected models, the RF variable importance criterion was used, as described in line 25 of Algorithm 1. The final result of feature selection is shown in Figure 7.

The results show that RF obtained higher accuracy than the other algorithms. Although it uses more variables than SGD and AdaBoost, a low false alarm rate is a prime requirement in DDoS detection systems; in this case, RF proved to be the best algorithm option for the Smart Detection system. Random forest is a supervised learning algorithm that builds a large number of random decision trees and merges them together to make predictions. Each tree is trained with a random subset of the total set of labeled samples. In the classification process, the most voted class among all the trees in the model indicates the result of the classifier [52]. In the proposed detection system algorithm, shown in Figure 4, RF is used to classify online network traffic, a task that requires computational efficiency and high hit rates.
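A minimal sketch of how the selected random forest might be trained offline and then applied to flow tables online; the feature subset follows Figure 7, but the CSV file name, column names, and the flow_features input are illustrative placeholders rather than the authors' artifacts.

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SELECTED = ["ip_proto", "ip_len_entropy", "sport_entropy", "tcp_flags_mean",
            "tcp_flags_rte", "sport_rte"]  # subset of the 20 variables in Figure 7

# Offline: train on the labeled signature dataset (SDS); file and label column are placeholders
sds = pd.read_csv("signature_dataset.csv")
X_train, X_test, y_train, y_test = train_test_split(
    sds[SELECTED], sds["label"], test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))
joblib.dump(model, "smart_detection_rf.joblib")

# Online: classify a flow table once it reaches Tmax entries
def classify_flow_table(flow_table_features):
    """flow_table_features: dict of the selected variables computed for one flow table."""
    row = pd.DataFrame([flow_table_features], columns=SELECTED)
    return model.predict(row)[0]   # e.g., 'normal', 'tcp_flood', 'http_slow', ...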

4. Results

The network traffic was classified by the detection system in a controlled network environment using different sampling rates. In the experiments, the raw network traffic of the CIC-DoS [16], CICIDS2017 [25], and CSE-CIC-IDS2018 [25] datasets and the raw network traffic captured in the customized testbed experiments were employed. The Smart Detection system reached high accuracy and a low false-positive rate. Experiments were conducted using two virtual Linux boxes,

Figure 5: Process of network traffic extraction, transformation, and labeling. Normal and attack traffic captures are processed to extract packets and group them by session; variables are then calculated and the instances labeled, producing the signature database.

Table 4: 10-fold RFECV results.

     MLA          No. of features   Accuracy
1    RF           28                0.996010
2    DTree        25                0.994182
3    LR           26                0.972327
4    SGD          16                0.969474
5    Perceptron   28                0.937256
6    AdaBoost     7                 0.931131


Input: database descriptors, variable importance threshold, accuracy threshold, and number of rounds
Output: selected variables

(1) begin
(2)   Create empty optimized model set
(3)   for i ← 1 to Number of rounds do
(4)     Define all the descriptor database variables as the current variables
(5)     while True do
(6)       Split dataset in training and test partitions
(7)       Create and train the model using the training data partition
(8)       Select the most important variables from the trained model
(9)       Calculate the cumulative importance of variables from the trained model
(10)      if max(cumulative importance of variables) < Variable importance threshold then
(11)        Exit loop
(12)      end
(13)      Train the model using only the most important variables
(14)      Test the trained model and calculate the accuracy
(15)      if Calculated accuracy < Accuracy threshold then
(16)        Exit loop
(17)      end
(18)      Add current model to optimized model set
(19)      Define the most important variables from the trained model as the current variables
(20)    end
(21)  end
(22)  Group the models by number of variables
(23)  Remove outliers from the grouped model set
(24)  Select the group of models with the highest frequency and their number of variables "N"
(25)  Rank the variables by the mean of the importance calculated in step 7
(26)  Return the "N" most important variables
(27) end

Algorithm 1: Feature selection.
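A compact Python sketch of Algorithm 1 using a random forest as the underlying model, assuming X is a pandas DataFrame with the Table 3 variables and y the labels; the thresholds mirror the parameters reported in the text (1000 rounds, 99% cumulative importance, 95% accuracy), but the stopping details and helpers are illustrative.

import numpy as np
import pandas as pd
from collections import Counter
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def select_features(X, y, rounds=1000, imp_thr=0.99, acc_thr=0.95):
    models = []                                   # (kept variables, importances) per accepted model
    for _ in range(rounds):
        variables = list(X.columns)
        while True:
            X_tr, X_te, y_tr, y_te = train_test_split(X[variables], y, test_size=0.3)
            rf = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
            order = np.argsort(rf.feature_importances_)[::-1]
            cumulative = np.cumsum(rf.feature_importances_[order])
            keep = [variables[i] for i in order[:np.searchsorted(cumulative, imp_thr) + 1]]
            if len(keep) == len(variables):       # no further reduction possible: stop this round
                break
            acc = RandomForestClassifier(n_estimators=100).fit(X_tr[keep], y_tr).score(X_te[keep], y_te)
            if acc < acc_thr:                     # reduced model too inaccurate: stop this round
                break
            models.append((tuple(sorted(keep)),
                           dict(zip(variables, rf.feature_importances_))))
            variables = keep                      # shrink the variable set and iterate
    # group accepted models by number of variables and pick the most frequent size "N"
    sizes = Counter(len(v) for v, _ in models)
    n = sizes.most_common(1)[0][0]
    # rank variables by mean importance across the accepted models and return the top N
    ranking = pd.DataFrame([imp for _, imp in models]).mean().sort_values(ascending=False)
    return list(ranking.index[:n])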

Figure 6: Number of variables versus number of models obtained by the proposed feature selection procedure (most of the accepted models used 20 variables).


each one of them using 8 virtual CPUs (vCPUs) and 8 GB of RAM.

4.1. Description of the Benchmark Datasets. Many different datasets, such as DARPA (Lincoln Laboratory, 1998-99), KDD'99 (University of California, Irvine, 1998-99), and LBNL (Lawrence Berkeley National Laboratory and ICSI, 2004-2005), have been used by researchers to evaluate the performance of their proposed intrusion detection and prevention approaches. However, many such datasets are out of date and unreliable to use [25]. In this study, the CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018 datasets and a customized dataset were used, since they include modern threats and DoS techniques.

4.1.1. The ISCXIDS2012 Dataset. The Information Security Center of Excellence (ISCX) IDS 2012 dataset (ISCXIDS2012) was built at the University of New Brunswick to provide a contemporary benchmark. The dataset traced real packets for seven days of network activity, including the HTTP, SMTP, SSH, IMAP, POP3, and FTP protocols, covering various scenarios of normal and malicious activities. ISCXIDS2012 consists of labeled network traces, including full packet payloads in pcap format, and is publicly available (https://www.unb.ca/cic/datasets/ids.html) [45]. This work focuses on the normal activities of the ISCXIDS2012 pcap files for the extraction of signatures, more specifically the data file of Friday, 11/6/2010.

4.1.2. The CIC-DoS Dataset. The CIC-DoS dataset focuses on application layer DoS attacks, mixed with the attack-free traces of the ISCXIDS2012 dataset. Four kinds of attacks were produced with different tools, yielding 8 different application layer DoS attack types [16]. The resulting set contains 24 hours of network traffic with a total size of 4.6 GB and is publicly available (https://www.unb.ca/cic/datasets/dos-dataset.html). A summary of the attack events and tools used in the CIC-DoS dataset is presented in Table 6.

In the execution of the low-volume attacks using the slowhttptest tool [49], the default value of 50 connections per attack was adopted, thus making the attacks sneakier, according to [16].

4.1.3. The CICIDS2017 Dataset. The CICIDS2017 dataset was recently developed by ISCX and contains benign traffic and the most up-to-date common attacks. This new IDS dataset includes seven common, updated families of attacks that meet real-world criteria and is publicly available (http://www.unb.ca/cic/datasets/IDS2017.html) [25].

This work focuses on the malicious DoS activities of the Wednesday, July 5, 2017, capture file, which consists of five DoS/DDoS attacks and a wide variety of normal network traffic. The resulting set contains 8 hours of network traffic with a total size of 13 GB. The attack tools used include Slowloris, Slowhttptest, Hulk, GoldenEye, and Heartbleed.

4.1.4. The CSE-CIC-IDS2018 Dataset. This dataset is a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). The final dataset includes seven different attack scenarios: brute force, Heartbleed, Botnet, DoS, DDoS, web attacks, and infiltration of the network from inside. The attacking infrastructure includes 50 machines, and the victim organization has 5 departments and includes 420 machines and 30 servers. A research document outlining the dataset analysis details and similar related principles has been published in [25]. All data are publicly available (https://www.unb.ca/cic/datasets/ids-2018.html).

This work focuses on the malicious DoS/DDoS activities of the Friday, February 16, 2018, and Tuesday, February 20,

Figure 7: Variables selected by the proposed feature selection algorithm, ranked by importance: tcp_flags_rte, tcp_flags_mean, tcp_flags_median, tcp_flags_std, tcp_flags_var, tcp_flags_cv, ip_len_std, ip_len_cv, ip_proto, ip_len_entropy, sport_entropy, tcp_flags_entropy, sport_cv, sport_rte, sport_cvq, sport_std, sport_mean, ip_len_rte, ip_len_mean, and ip_len_var.


2018, capture files. The attack tools used include SlowHTTPTest, Hulk, LOIC, and HOIC.

4.1.5. The Customized Dataset. The customized dataset was developed in a controlled network environment, as shown in Figure 8. VLANs 10, 20, 30, and 40 are used as victim hosts, VLAN 165 is dedicated to the users of an academic unit, and VLAN 60 is used as the attacking host, while monitoring occurs in VLAN 1. All networks have regular access to the Internet.

The attack plan was configured so that one attack is generated every 30 minutes, in a total of 48 attack events in 24 hours, starting at 00h00m00s and ending at 23h59m00s. All attacks were executed by the attacker host 172.16.60.100, which did not transmit legitimate traffic to the victims during the attacks. The attack tools were parameterized to produce sneaky low-volume, medium-volume (or light mode), and massive high-volume attacks. Ten variations of protocol-based and application-based attacks were adopted, using four attack tools, as shown in Table 7. The duration of protocol-based and high-volume application-based attacks was 30 seconds, while low-volume application-based attacks ranged from 30 to 240 seconds.

In the accomplishment of the low-volume attacks using the slowhttptest tool [49], the number of connections parameter was set to 1500, instead of the default option corresponding to 50 connections.

4.2. Online Experiments. Online experiments were performed in a controlled laboratory environment according to the following validation methodology:

(1) Raw data of the network traffic are obtained for analysis in pcap file format.

(2) The attack plan, indicating the origin, destination, attack type, and respective duration for the traffic indicated in step 1, is drawn.

(3) The environment for reprocessing and classifying the traffic is configured.

(4) The traffic is processed and classified.

(5) The system performance is properly evaluated by comparing the output from step 4 with the attack plan described in step 2.

Following this validation methodology, the traffic capture sources CIC-DoS, CICIDS2017, CSE-CIC-IDS2018, and the customized dataset were used, thus fulfilling steps 1 and 2.

The environment for traffic reprocessing and classification, as described in step 3, was configured using two virtual Linux boxes running Open Virtual Switch (OVS) [28], the TcpReplay software [53], and the Smart Detection system, as shown in Figure 9.

In the reprocessing, classification, and evaluation of the traffic during steps 4 and 5, the raw data traffic was replayed by the TcpReplay software on a specific OVS port and sampled by the sFlow agent for OVS. The sampled traffic was sent to the Smart Detection system, and the classification result was compared with the attack plan. Figure 9 summarizes the procedures carried out by the proposed validation methodology: the raw network traffic file is reprocessed on VM-01, and the sFlow agent collects traffic samples and sends them to Smart Detection on VM-02.

4.2.1. System Setup. The Smart Detection system has three main parameters that directly influence its performance. These parameters, shown in Table 2, allow the user to calibrate the detection system according to the operating environment. In scenarios where the SR is too low and Tmax is too large, for example, traffic samples are discarded before processing by the classifier. On the other hand, if Tmax is too small, the FAR increases, because the classifier has few data to analyze. In the case of slow DDoS, a low SR and a large Tmax also reduce the attack detection rate, due to the in-memory flow table expiration time (ET).

So, several experiments to calibrate the system were performed in the testing environment using (i) SR values of 1%, 5%, 10%, and 20%; (ii) Tmax values of 25, 50, and 100; and (iii) ET values of 2, 5, and 10 seconds. The most balanced result was obtained with SR = 10%, Tmax = 50, and ET = 2.

4.2.2. Evaluation Metrics. System performance was evaluated using the Precision (PREC), Recall (REC), and F-Measure (F1) metrics present in the literature [54, 55]. PREC measures the ability to avoid false positives, while REC measures system sensitivity. F1 is a harmonic average between PREC and REC. In this context, (i) true positive (TP) is the attack traffic predicted correctly, (ii) true negative (TN) is normal traffic also predicted correctly, (iii) false positive (FP) is the normal traffic predicted incorrectly, and (iv) false negative (FN) is the attack traffic predicted incorrectly. These metrics were computed by the following expressions:

PREC = TP / (TP + FP),

REC = TP / (TP + FN),

F1 = (2 × PREC × REC) / (PREC + REC).    (7)

Besides, the detection rate (DR) and false alarm rate (FAR) metrics were used. The DR is the ratio between the number of attacks detected by the system and the actual number of attacks performed. FAR is the ratio between FP

Table 5: 10-fold cross-validation results using the variables selected by the proposed algorithm.

     MLA          No. of features   Accuracy
1    RF           20                0.999363
2    DTree        20                0.999011
3    Perceptron   20                0.996000
4    SGD          20                0.986989
5    LR           20                0.982704
6    AdaBoost     20                0.956512


and the sum of FP and TN. These metrics were computed by the following expressions:

DR = AD / TA,    (8)

where AD is the number of detected attacks and TA is the total number of performed attacks, and

FAR = FP / (FP + TN),    (9)

where FP corresponds to the false-positive classifications and TN to the true-negative classifications.

The DR and FAR calculations assume that only malicious traffic was sent from the attacker to the victim at the time of the attack.
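A minimal sketch of how these metrics can be computed from confusion-matrix counts and the attack plan; the numbers in the usage example are placeholders and not results from the paper.

def evaluate(tp, tn, fp, fn, attacks_detected, attacks_total):
    prec = tp / (tp + fp)                      # equation (7)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    dr = attacks_detected / attacks_total      # equation (8)
    far = fp / (fp + tn)                       # equation (9)
    return {"PREC": prec, "REC": rec, "F1": f1, "DR": dr, "FAR": far}

# Illustrative numbers only (not taken from the paper's experiments)
print(evaluate(tp=940, tn=9950, fp=20, fn=60, attacks_detected=45, attacks_total=48))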

4.3. Results and Discussion. The proposed approach has been evaluated using the aforementioned datasets, system setup, and metrics. Table 8 summarizes the system performance for each dataset.

As can be observed, the best performance was obtained in the CSE-CIC-IDS2018 dataset, with a DR of 100%, an FAR of 0.000, and a PREC of 100%. During the analysis, there was a low occurrence of normal network traffic and well-defined bursts of malicious traffic. This type of behavior facilitates detection by the system and justifies the high hit rates achieved. However, a slightly lower performance was obtained in the customized dataset and the CIC-DoS dataset, with a DR of 96.5% and 93.6%, an FAR of 0.2% and 0.04%, and a PREC of 99.5% and 99.9%, respectively. In those datasets, there is a higher volume of normal traffic and various types of attacks, including stealthy application layer attacks. In this more realistic scenario, the proposed system presented some detection failures but still obtained a competitive performance. On the other hand, the worst result was obtained with the CICIDS2017 dataset, with 80%

Figure 8: Customized testbed topology. A core router with Internet access interconnects VLAN 10 (172.16.10.0/24), VLAN 20 (172.16.20.0/24), VLAN 30 (172.16.30.0/24), and VLAN 40 (172.16.40.0/24), which host the target hosts, VLAN 165 hosting production hosts, VLAN 60 (172.16.60.0/24) hosting the attacker, and VLAN 1 used for monitoring.

Table 6: CIC-DoS dataset events and tools.

Attack and tool                       Events
ddossim (ddossim)                     2
DoS improved GET (Goldeneye [48])     3
DoS GET (hulk [47])                   4
Slow body (slowbody2)                 4
Slow read (slowread)                  2
Slow headers (slowheaders)            5
Rudy (rudy)                           4
Slowloris (slowloris)                 2

Figure 9: Smart Detection online validation scheme. On VM-01, the dataset (pcap) is replayed by TcpReplay into an Open Virtual Switch, whose sFlow agent sends samples over the network to VM-02, where the Smart Detection collector, processor, classifier, and notifier run.

Table 7: Customized dataset events and tools.

Attack and tool                             Events
TCP SYN flood (hping3 [46])                 4
TCP SYN flood, light mode (hping3 [46])     4
TCP ACK flood (hping3 [46])                 4
UDP flood (hping3 [46])                     4
UDP flood, random dst port (hping3 [46])    4
DoS improved GET (Goldeneye [48])           5
DoS GET (hulk [47])                         4
Slow headers (slowhttptest [49])            5
Slow body (slowhttptest [49])               5
Range attack (slowhttptest [49])            4
Slow read (slowhttptest [49])               5


DR, 0.2% FAR, and 99.2% PREC. This dataset expresses a more realistic network scenario, which includes normal traffic mixed with high-volume and low-volume malicious traffic with sneaky behavior, such as slow application layer attacks. Even so, the proposed system detected 4 out of 5 attacks, with a PREC greater than 90% and an FAR less than 1%, showing that the method is feasible.

To discuss online detection and the consumption of computing resources during experimentation, the CICIDS2017 dataset was chosen because it is quite realistic and recent and summarizes the major vectors of DoS attacks. Even in the most adverse scenario, the experiment was completed normally, as shown in Figures 10 and 11. Overall network traffic is demonstrated in Figure 10(a), while Figure 10(b) highlights the sampled traffic sent to the detection system. As can be seen, for network traffic of 81.3 Mbps, the detection system receives only 1.74 Mbps, making this approach scalable. The overall traffic rating is shown in Figure 11(a), while Figure 11(b) exclusively highlights the malicious traffic rating. It can be said that the system was efficient in distinguishing normal traffic from DoS attacks, because, of all the attacks performed, only the Heartbleed attack was not detected, highlighted in Figure 10(b) between 15 h and 16 h. This kind of attack is primarily intended to collect data by exploiting OpenSSL software vulnerabilities, as described in CVE-2014-0160, although it can also assume the behavior of a DDoS attack, as in any application. However, in this case, the system raised a false negative. The most obvious reasons for this FN are (i) the execution of the Heartbleed attack without DoS exploitation or (ii) statistical coincidence in the traffic sampling. In the first case, the attack is performed using legitimate and regular connections, while in the second case, the collected samples coincide with legitimate traffic signatures.

In terms of resource use, the system remained stable during the experiment, as shown in Figure 11(c), with small swings in CPU usage.

Finally, the Smart Detection system was tested using online network traffic in four distinct scenarios. The results presented in Table 8 show that the system can distinguish legitimate traffic from various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow,

Figure 10: Network traffic in the online experiment using the CICIDS2017 dataset, from 09:00 to 18:00. (a) Overall network traffic (up to about 80 Mbps). (b) Sampled network traffic sent to the detection system (up to about 2 Mbps), with periods of normal activity, normal + slow DoS, normal + high-volume DoS, and normal + Heartbleed DoS.


with significant rates of accuracy. The experiments also highlighted the importance of adjusting the Tmax and ET parameters. These variables correlate with the network traffic sampling rate (SR) and directly influence the detection rate and system accuracy.

4.3.1. Additional Comparison. Compared with some recent similar works available in the literature, the approach introduced in this work is quite competitive in terms of the evaluated performance metrics, as shown in Table 9.

The comparison is not completely fair, because the experimental scenarios and data were slightly different, but it is sufficient to allow an evaluation of the obtained results. For instance, in the offline experiments performed with the CIC-DoS dataset in [16], the DR was 76.92% using an SR of 20%, while the proposed system obtained an online DR of 90% for the attacks, with an FAR of 1.8%, using the same sampling technique. In [37], a PREC of 82.1% was obtained using the CICIDS2017 dataset in an offline and unsampled analysis. In this work, the proposed method obtained a PREC of 99.9%, which can be considered competitive for an online detection system based on samples of network traffic. Besides that, in the CICIDS2017 dataset experiments, where the legitimate traffic rate is similar to that of the attack traffic, according to Figures 10(b), 11(a), and 11(b), the system was also able to distinguish malicious traffic from normal traffic, as studied in the literature [34].

5. Conclusion

This article has presented the Smart Detection system, an online approach to DoS/DDoS attack detection. The software uses the random forest tree algorithm to classify network traffic based on samples taken by the sFlow protocol directly from network devices. Several experiments were performed to calibrate and evaluate system performance. Results showed that the proposed method is feasible and presents improved performance when compared with some recent and relevant approaches available in the literature.

The proposed system was evaluated based on three intrusion detection benchmark datasets, namely, CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018, and was able to classify various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow. Furthermore, the performance of the proposed method was compared against recent and related approaches. Based on the experimental results, the Smart Detection approach delivers improved DR, FAR, and PREC. For example, in the CIC-DoS and CSE-CIC-IDS2018 datasets, the proposed system acquired DR and PREC higher than 93%, with FAR less than 1%. Although the system has achieved significant results in its scope, it needs some improvements, such as a better hit rate among attack classes and an automatic parameter calibration mechanism that maximizes the detection rate of attacks.

Figure 11: Traffic classification and CPU usage in the online experiment using the CICIDS2017 dataset, from 09:00 to 18:00. (a) Overall traffic rating. (b) Malicious traffic rating. (c) Detection system CPU utilization (CPU idle time and CPU system time, in %).

Table 8: Evaluation of system performance for all datasets.

Dataset           DR      FAR      PREC    F1
CIC-DoS           0.936   0.0004   0.999   0.999
CICIDS2017        0.800   0.002    0.992   0.992
CSE-CIC-IDS2018   1.000   0.000    1.000   1.000
Customized        0.965   0.002    0.995   0.995

Table 9: Comparison with research approaches of related works.

Work                    Dataset      DR       FAR      PREC
[16]                    CIC-DoS      0.7690   NA       NA
[37]                    CICIDS2017   NA       NA       0.8210
The proposed approach   CIC-DoS      0.9360   0.0004   0.9990
The proposed approach   CICIDS2017   0.8000   0.0020   0.9920


Future work includes the analysis of DDoS attacks based on the vulnerabilities of services, such as Heartbleed and web brute force attacks; enhancement of the multiple-class classification; self-configuration of the system; the development of methods for correlating triggered alarms; and the formulation of protective measures.

Data Availability

We produced a customized dataset and a variable selection algorithm and used four additional datasets to support the findings of this study. The customized dataset used to support the findings of this study has been deposited in the IEEE DataPort repository (https://doi.org/10.24433/CO.0280398.v2). The feature selection algorithm used to support the findings of this study has been deposited in the Code Ocean repository (https://doi.org/10.24433/CO.0280398.v2). The benchmark datasets used to support the findings of this study have been deposited in the Canadian Institute for Cybersecurity repository and are publicly available as follows: (1) the ISCXIDS2012 dataset (https://www.unb.ca/cic/datasets/ids.html), (2) the CIC-DoS dataset (https://www.unb.ca/cic/datasets/dos-dataset.html), (3) the CICIDS2017 dataset (http://www.unb.ca/cic/datasets/IDS2017.html), and (4) the CSE-CIC-IDS2018 dataset (https://www.unb.ca/cic/datasets/ids-2018.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), Finance Code 001.

Acknowledgments

The authors would like to acknowledge the Digital Metropolis Institute (IMD/UFRN) and the High-Performance Computing Center of UFRN (NPAD/UFRN) for the overall support given to this work, and the Canadian Institute for Cybersecurity (CIC/UNB) for publicly sharing the datasets.

References

[1] D Anstee C F Chui P Bowen and G Sockrider WorldwideInfrastructure Security Report Arbor Networks Inc WestfordMA USA 2017

[2] A Al-Fuqaha M Guizani M Mohammadi M Aledhari andM Ayyash ldquoInternet of things a survey on enabling tech-nologies protocols and applicationsrdquo IEEE CommunicationsSurveys amp Tutorials vol 17 no 4 pp 2347ndash2376 2015

[3] S T Zargar J Joshi and D Tipper ldquoA survey of defensemechanisms against distributed denial of service (DDOS)flooding attacksrdquo IEEE Communications Surveys and Tuto-rials vol 15 no 4 pp 2046ndash2069 2013

[4] A Marzano D Alexander O Fonseca et al ldquo(e evolution ofbashlite and mirai IoT botnetsrdquo in Proceedings of the 2018

IEEE Symposium on Computers and Communications (ISCC)2018

[5] S Kottler ldquoFebruary 28th DDoS incident reportrdquo 2018httpsgithubblog2018-03-01-ddos-incident-report

[6] Y Cao Y Gao R Tan Q Han and Z Liu ldquoUnderstandinginternet DDoS mitigation from academic and industrialperspectivesrdquo IEEE Access vol 6 pp 66641ndash66648 2018

[7] R Roman J Zhou and J Lopez ldquoOn the features andchallenges of security and privacy in distributed internet ofthingsrdquo Computer Networks vol 57 no 10 pp 2266ndash22792013

[8] K Singh P Singh and K Kumar ldquoApplication layer HTTP-GET flood DDoS attacks research landscape and challengesrdquoComputers amp Security vol 65 pp 344ndash372 2017

[9] N Vlajic and D Zhou ldquoIoTas a land of opportunity for DDoShackersrdquo Computer vol 51 no 7 pp 26ndash34 2018

[10] S Newman ldquoUnder the radar the danger of stealthy DDoSattacksrdquo Network Security vol 2019 no 2 pp 18-19 2019

[11] M H Bhuyan D K Bhattacharyya and J K Kalita ldquoNetworkanomaly detection methods systems and toolsrdquo IEEECommunications Surveys amp Tutorials vol 16 no 1pp 303ndash336 2014



DDoS attacks, and reflection DDoS attacks [7–9], profiting from the computational capability and geographical distribution promoted by the wide variety of devices and their diverse mobility patterns typically found in IoT and mobile IoT scenarios.

In addition to volumetric DDoS attacks, low-volume attacks are on the radar of security experts. These are sneakier attacks that use few invading hosts; events are fast, sometimes lasting only a few minutes and usually less than an hour. For these reasons, security teams often do not know that their sites are under attack, because common tools do not detect this type of threat [10]. Typically, a low-volume DDoS exploits application layer protocols, respects other protocols, and does not overload links, but causes exhaustion of the victim's resources.

1.1. Problem Statement. DDoS detection and mitigation have been under study in both the scientific community and industry for several years. The related literature reveals that several studies have proposed solutions to deal with this problem in a general way [6, 11–15]. Another group of works is dedicated to presenting specific solutions for high-volume and low-volume DDoS attacks [8, 13, 16]. Furthermore, despite the diverse recommendations for mitigating DDoS attacks proposed by the Computer Emergency Response Team (CERT) and guidelines documented through Requests for Comments (RFCs), these attacks still occur with high frequency.

A study carried out years ago [17] revealed that the ineffectiveness of detecting and mitigating DDoS attacks is directly related to constant configuration errors and wasted time due to the lack of tools that follow the dynamics of the network without constant human interference. This has led researchers to pursue autonomous solutions that can operate (detect and mitigate) based on the behavior and characteristics of the traffic. In this sense, the adoption of solutions based on artificial intelligence techniques, mainly machine learning (ML), has been distinguished by offering high flexibility in the classification process and, consequently, improving the detection of malicious traffic [18, 19].

The industrial sector offers DDoS protection as a service through large structures usually operated by specialized providers [6], such as Akamai, Cloudflare, and Arbor Networks, which have large processing capacity and proprietary filtering mechanisms. But the industry also has problems, such as fragility in routing traffic, usually via the Domain Name System (DNS) or the Border Gateway Protocol (BGP), difficulty detecting slow attacks, and privacy issues, which drive away some customer segments, such as governments.

Finding the balance between academic propositions and the industrial practice of combating DDoS is a big challenge. Academia invests in techniques such as machine learning (ML) and proposes to apply them in areas such as DDoS detection in Internet of Things (IoT) sensors [20, 21], wireless sensors [22], cloud computing [23], and software-defined networking (SDN) [18], and works on producing more realistic datasets [24, 25] and more effective means of result validation [26, 27]. On the other hand, industry segments have gradually invested in new paradigms in their solutions, such as network function virtualization (NFV) and SDN [28, 29], to apply scientific discoveries and modernize network structures. Even so, DDoS incidents still happen daily, reinforcing that the problem is not solved.

1.2. Proposal. Given these issues, this article proposes Smart Detection, a novel defense mechanism against DDoS attacks. The system architecture was designed to detect both high- and low-volume DDoS attacks. The proposed system acts as a sensor that can be installed anywhere on the network and classifies online traffic using an MLA-based strategy that makes inferences from random traffic samples collected on network devices via a stream protocol. The proposed approach is compatible with the Internet infrastructure and does not require software or hardware upgrades. Besides, user data privacy is guaranteed at all stages of system operation.

1.3. Contributions. In summary, the significant contributions of Smart Detection are as follows:

(i) The modeling, development, and validation of the detection system are done using a customized dataset and three other well-known ones, called CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018, where the system receives online random samples of network traffic and classifies them as DoS attacks or normal.

(ii) The proposed detection system differs from other approaches by the early identification of a variety of volumetric attacks, such as TCP flood, UDP flood, and HTTP flood, as well as stealth attacks, such as HTTP slow headers, HTTP slow body, and HTTP slow read, even with a low traffic sampling rate. In addition, Smart Detection is compatible with the current Internet infrastructure and does not require any software or hardware upgrades on ISPs. At the same time, the proposed system uses advanced technologies such as ML and NFV.

(iii) Unlike existing security service providers, the proposed system does not require traffic redirection or connection intermediation. Data privacy is guaranteed at all stages. First, the system randomly processes only a small part of the network traffic. Second, it does not perform deep packet inspection; instead, Smart Detection parses only network layer header data.

(iv) The pattern recognition of normal network traffic and several DoS attack types is addressed. As a result, a novel signature database is created, which is used by Smart Detection and can be applied to other systems.

(v) An approach to automatic feature selection has been developed, using the cross-validation technique for model searches that meet specific classification quality criteria. This approach was used to define the signatures adopted by Smart Detection.


2. Related Works and Background

The research on intrusion detection in computer networks is widely discussed in the literature, and several detection techniques and protection strategies have been proposed in recent years. Studies in the literature classify IDSs as signature-based, anomaly-based, and hybrid systems. The first type identifies potential attacks by comparing currently observed events with its stored signatures. The second one detects anomalies by identifying significant deviations between the preestablished normal profile and the current events. In all cases, an alert will be generated if any signature is matched or if a deviation occurs above a set threshold. The main advantage of the signature-based approach is the low false alarm rate. However, the challenge is to write signatures that cover all possible attack variations. In contrast, the anomaly-based approach has the ability to detect unknown attacks, but it requires more computational resources and often produces more false alarms. Hybrid solutions try to exploit the benefits of both techniques [11, 30]. DoS attacks are a specific type of network intrusion that has drawn the attention of academia, as highlighted by recent surveys on network applications, wireless networks, cloud computing, and big data [8, 13, 14, 31].

Several classification strategies for DDoS attacks have been proposed in the literature in the last decade. However, DDoS flooding attacks have been further studied and are classified into two categories, based on the protocol level that is targeted [3]:

(i) Network/transport-level DDoS flooding attacks: such attacks are mostly launched using Transmission Control Protocol (TCP), User Datagram Protocol (UDP), Internet Control Message Protocol (ICMP), and Domain Name System (DNS) protocol packets.

(ii) Application-level DDoS flooding attacks: such attacks are focused on disrupting legitimate user services by exhausting the server resources, e.g., sockets, central processing unit (CPU), memory, disk/database bandwidth, and input/output (I/O) bandwidth. Application-level DDoS attacks generally consume less bandwidth and are stealthier in nature than volumetric attacks, since they are very similar to benign traffic.

The biggest challenge in combating DDoS attacks lies in the early detection and mitigation of attacks as close as possible to their origin; however, the implementation of a comprehensive solution that addresses these features has not yet been achieved [3, 32].

Some recent works have inspired the development of the Smart Detection system. These approaches are listed in Table 1 for comparative purposes.

A Hypertext Transfer Protocol- (HTTP-) based technique [16] was proposed to detect flood attacks on web servers using data sampling. The authors used the CUSUM algorithm to determine whether the analyzed traffic is normal or a DoS attack, focusing on two features: the number of application layer requests and the number of packets with payload size equal to zero. The results showed a detection rate between 80% and 88% using a sampling rate of 20%. Although it made important advances, the proposed method does not seem applicable in automatic mitigation systems, especially in production environments that do not support high sampling rates.

D-FACE is a collaborative defense system [34] that uses generalized entropy (GE) and generalized information distance (GID) metrics to detect different types of DDoS attacks and flash events (FEs). In this context, an FE is similar to a volumetric DDoS, wherein thousands of legitimate users try to access a particular computing resource, such as a website, simultaneously. The results show that D-FACE can detect DDoS attacks and FEs. Although the work presents relevant contributions, the validation used obsolete datasets. In addition, the proposed collaboration approach requires a high degree of ISP engagement, which restricts the industrial use of the solution.

The Antidose system [33] presents a means of interaction between a vulnerable peripheral service and an indirectly related Autonomous System (AS), which allows the AS to confidently deploy local filtering rules under the control of the remote service. The system was evaluated using Mininet, but no benchmark dataset was used. The approach proposed by the authors faces strong resistance from ISPs for two reasons: the first is the software and hardware update requirement, and the second is having no control over local traffic control policies.

The SkyShield system [35] has been proposed to detect and mitigate DDoS attacks on the application layer. In the detection phase, SkyShield exploits the divergence between two hash tables (sketches) to detect anomalies caused by attacker hosts. The mitigation phase uses filtering, whitelisting, blacklisting, and CAPTCHAs as protection mechanisms. The system was evaluated using customized datasets. SkyShield focuses on the application layer, more specifically on the HTTP protocol, so the proposed system is vulnerable to flooding at the network and transport layers.

Umbrella [36] develops a multilayered defense architecture to defend against a wide spectrum of DDoS attacks. The authors proposed an approach based on detection and protection exclusively on the victim's side. The system was evaluated using a customized testbed in terms of traffic control. The authors claim that the system is capable of dealing with massive attacks; however, this approach is widely used in industry and has proved to be inefficient against truly massive DDoS attacks.

Table 1: Recent related works.
Reference               Dataset
[16]                    CIC-DoS
[33]                    None
[34]                    MIT Lincoln, FIFA98, DDoSTB, CAIDA
[35]                    Customized (developed by the authors)
[36]                    Customized (developed by the authors)
[37]                    CICIDS2017
The proposed approach   CIC-DoS, CICIDS2017, CSE-CIC-IDS2018, Customized

Recently, a semi-supervised machine learning system addressed the classification of DDoS attacks. In this approach, the CICIDS2017 dataset was used to evaluate system performance metrics [37]. Although the work addresses recent DoS vectors, the online performance of the method has not been evaluated. Finally, Table 1 summarizes these recent works, whose approach is related to the proposal of this article.

In Table 1, Online indicates that the proposed system has been tested in online experiments, Dataset informs the dataset used for validation, LH DoS indicates whether it detects slow and high DDoS attacks, and Sampling indicates whether some network traffic sampling method is used.

Based on the open questions in the literature and recent specialized reports, DoS attacks may remain on the Internet for some time. The solution to this problem includes the adoption of detection and mitigation strategies that are practical and economically viable. Besides, these approaches should leverage existing provider infrastructure and be implemented in light of new scientific and technological trends.

3. Smart Detection

Smart Detection is designed to combat DDoS attacks on the Internet in a modern, collaborative way. In this approach, the system collects network traffic samples and classifies them. Attack notification messages are shared using a cloud platform for convenient use by traffic control protection systems. The whole process is illustrated in Figure 1.

The core of the detection system consists of a Signature Dataset (SDS) and a machine learning algorithm (MLA). Figure 2 shows the crucial steps from model building to system operation.

First, normal traffic and DDoS signatures were extracted, labeled, and stored in a database. The SDS was then created using feature selection techniques. Finally, the most accurate MLA was selected, trained, and loaded into the traffic classification system.

The architecture of the detection system was designed to work with samples of network traffic provided by industry-standard traffic sampling protocols and collected from network devices. The unlabeled samples are received and grouped into flow tables in the receiver buffer. Thus, when the table length is greater than or equal to the reference value, they are presented to the classifier responsible for labeling them, as shown in Figure 3. If a flow table expires, it may be processed one more time. The occurrence of small flow tables is higher at lower sampling rates or under some types of DoS attacks, e.g., SYN flood attacks. Table 2 details the parameters for fine-tuning the system.

The complete algorithm of the detection system is summarized in Figure 4. During each cycle of the detection process, traffic samples are received and stored in a flow table. For each new flow, a unique identifier (FlowID) is calculated based on the 5-tuple (src_IP, dst_IP, src_port, dst_port, and transport_protocol) in steps 1 and 2. If this is a new flow, i.e., there is no other flow table stored with the same FlowID, the flow table is registered in a shared memory buffer. Otherwise, if there is a flow table registered with the same FlowID as the one previously calculated, the data of the new flow are merged with the data in the existing flow table in steps 3 and 4. After the merging operation, if the table length is greater than or equal to the reference value (Tl ≥ Tmax), the flow table is classified and, if it is found to be an attack, a notification is emitted; otherwise, it is inserted back into the shared memory buffer. Meanwhile, in step 7, the cleanup task looks for expired flow tables in the shared buffer, i.e., flow tables that exceed the expiration time of the system (E > ET). For each expired flow table, the system checks the table length. If the flow table length is less than or equal to the minimum reference value (Tl ≤ Tmin), the flow table is processed by step 8: a new FlowID is calculated using the 3-tuple (src_IP, dst_IP, and transport_protocol), and the flow table is routed back to steps 3 and 4.
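For illustration, the cycle above can be summarized in a short Python sketch. The snippet below is a simplified reading of steps 1–8, assuming hypothetical classify_flow and notify helpers and example values for Tmin, Tmax, and ET; it is not the authors' implementation, and expired tables larger than Tmin are simply discarded here.

import hashlib
import time

T_MIN, T_MAX, ET = 5, 50, 2.0            # assumed example values (see Table 2)
buffer = {}                              # FlowID -> {"packets": [...], "t": last update}

def flow_id(*fields):
    # Stable identifier for a tuple of header fields (5-tuple or 3-tuple).
    return hashlib.md5("|".join(map(str, fields)).encode()).hexdigest()

def classify_flow(packets):
    return "normal"                      # placeholder for the trained classifier

def notify(fid):
    print(f"DoS alarm for flow {fid}")

def handle_sample(pkt):
    # Steps 1-6: group the sample by 5-tuple, merge, and classify full tables.
    fid = flow_id(pkt["src_ip"], pkt["dst_ip"], pkt["src_port"],
                  pkt["dst_port"], pkt["proto"])
    entry = buffer.setdefault(fid, {"packets": [], "t": time.time()})
    entry["packets"].append(pkt)
    entry["t"] = time.time()
    if len(entry["packets"]) >= T_MAX:
        if classify_flow(entry["packets"]) == "attack":
            notify(fid)
        del buffer[fid]

def cleanup():
    # Steps 7-8: re-key small expired tables on the 3-tuple so they get another chance.
    now = time.time()
    for fid in [f for f, e in buffer.items() if now - e["t"] > ET]:
        entry = buffer.pop(fid)
        if len(entry["packets"]) <= T_MIN:
            first = entry["packets"][0]
            new_id = flow_id(first["src_ip"], first["dst_ip"], first["proto"])
            merged = buffer.setdefault(new_id, {"packets": [], "t": now})
            merged["packets"].extend(entry["packets"])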

3.1. Traffic Sampling. Smart Detection uses a network traffic sampling technique because processing all the packets in the network can be a computationally expensive task, even if only the packet headers are parsed. In many cases, performing a deep inspection and analyzing the data area of the application layer is unfeasible for detection systems. Among the protocols adopted by the industry for sampling network traffic, the sFlow protocol is widely used in current devices. The technique used by sFlow is called n-out-of-N sampling. In this technique, n samples are selected out of N packets. One way to achieve a simple random sample is to randomly generate n different numbers in the range of 1 to N and then choose all packets with a packet position equal to one of the n values. This procedure is repeated for every N packets. Besides, the sample size is fixed in this approach [38].

The sFlow monitoring system consists of an agent (embedded in a switch, a router, or an independent probe) and a collector. The architecture used in the monitoring system is designed to provide continuous network monitoring of high-speed switched and routed devices. The agent uses the sampling technology to capture traffic statistics from the monitored device and forward them to a collector system [39].
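As an illustration of the n-out-of-N scheme described above, the following minimal Python sketch draws n random positions from a window of N packets; the real sampling is performed by the sFlow agent in the network device, so this is illustrative only.

import random

def n_out_of_n_sample(window, n):
    # Randomly pick n distinct packet positions from a window of N packets
    # and return those packets in their original order.
    positions = random.sample(range(len(window)), k=min(n, len(window)))
    return [window[i] for i in sorted(positions)]

# Example: a 10% sampling rate over a window of 1,000 packets keeps 100 of them.
window = [{"seq": i} for i in range(1000)]
sampled = n_out_of_n_sample(window, n=100)
print(len(sampled))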

3.2. Feature Extraction. In supervised classification strategies, a set of examples is required for training the classifier model. This set is commonly defined as the signature database. Each instance of the database has a set of characteristics, or variables, associated with a label or a class. In this work, the goal is to identify characteristics in network traffic that are able to distinguish normal network behavior from DoS attacks. The study focuses on the analysis of the header variables of the network and transport layer packets of the TCP/IP architecture, because this saves computational resources and simplifies deployment in ISP networks.


In IPv4-compliant networks, the network and transport layer protocols are IP, TCP, and UDP, which are specified in RFC 791 [40], RFC 793 [41], and RFC 768 [42], respectively. Together, such protocols have a total of 25 header variables. However, widely used network traffic sampling protocols, such as NetFlow [43] and sFlow [39], use only a portion of these variables in the sampling process. Commonly, the seven used variables are the source and destination IP addresses, source and destination ports, transport layer protocol, IP packet size, and TCP flags.

The source and destination IP addresses are not very useful for identifying network traffic behavior in the Internet environment, which reduces the number of variables available for analysis to five in the most common cases. Based on the five variables most used by the flow monitoring protocols, 33 variables were derived, as described in Table 3, which use statistical measures that express data variability. In the context of calculating the database variables, references to the mean, median, variance (var), and standard deviation (std) should be interpreted as sample measures.

Figure 1: Operation scenario overview of the Smart Detection system.

Figure 2: Detection system overview.

Figure 3: Overview of the traffic classification scheme.

Table 2: Detection system parameters.
Parameter   Description
Tmin        Minimum flow table length
Tmax        Maximum flow table length
ET          Flow table expiration time

The variable named protocol is a simple normalization of the protocol field extracted from the transport layer packet headers, in the form

ip_proto = N_proto / K,   (1)

where N_proto is the code of the protocol and K is a normalization constant set to the value 1000. For instance, N_proto = 6 and N_proto = 17 for the TCP and UDP protocols, respectively.

With the four main variables most used in flow monitoring, it is possible to calculate the following associated statistical measurements:

(i) Entropy: the entropy of the variable is calculated by

    Entropy(X) = − Σ_i p(X_i) log2 p(X_i),   (2)

    where X is the variable of interest, e.g., the source port.

(ii) Coefficient of variation: the coefficient of variation is calculated by

    cv(X) = std(X) / mean(X),   (3)

    where std(X) is the estimated standard deviation and mean(X) is the estimated average of the variable.

(iii) Quantile coefficient: this parameter is defined herein by

    cvq(X) = (Q̂_X(1 − p) − Q̂_X(p)) / (Q̂_X(1 − p) + Q̂_X(p)),   (4)

    where Q̂_X(p) is the sample p-quantile, p ∈ (0, 0.5), expressed by [44]

    Q̂_X(p) = X_(k) + (X_(k+1) − X_(k)) · f,   (5)

    with X_(1), ..., X_(n) being the order statistics of the independent observations X_1, ..., X_n, k = ⌊p · n⌋, and f the fractional part of the index surrounded by X_(k) and X_(k+1).

(iv) Rate of change: this metric is given by

    rte(X) = U_X / S_X,   (6)

    where U_X is the number of unique values and S_X is the overall number of X values.

Figure 4: Detection system algorithm.

Table 3: Extracted variables.
Variable              Detail
01 ip_proto           Normalized protocol number
02 ip_len_mean        Mean of IP length
03 ip_len_median      Median of IP length
04 ip_len_var         Variance of IP length
05 ip_len_std         Standard deviation of IP length
06 ip_len_entropy     Entropy of IP length
07 ip_len_cv          Coeff. of variation of IP length
08 ip_len_cvq         Quantile coeff. of IP length
09 ip_len_rte         Rate of change of IP length
10 sport_mean         Mean of src port
11 sport_median       Median of src port
12 sport_var          Variance of src port
13 sport_std          Standard deviation of src port
14 sport_entropy      Entropy of src port
15 sport_cv           Coeff. of variation of src port
16 sport_cvq          Quantile coeff. of src port
17 sport_rte          Rate of change of src port
18 dport_mean         Mean of dest port
19 dport_median       Median of dest port
20 dport_var          Variance of dest port
21 dport_std          Standard deviation of dest port
22 dport_entropy      Entropy of dest port
23 dport_cv           Coeff. of variation of dest port
24 dport_cvq          Quantile coeff. of dest port
25 dport_rte          Rate of change of dest port
26 tcp_flags_mean     Mean of TCP flags
27 tcp_flags_median   Median of TCP flags
28 tcp_flags_var      Variance of TCP flags
29 tcp_flags_std      Standard deviation of TCP flags
30 tcp_flags_entropy  Entropy of TCP flags
31 tcp_flags_cv       Coeff. of variation of TCP flags
32 tcp_flags_cvq      Quantile coeff. of TCP flags
33 tcp_flags_rte      Rate of change of TCP flags
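To illustrate how the derived variables of Table 3 can be obtained from a sampled flow, the sketch below evaluates equations (1)–(6) for one header field using NumPy; the field values are assumed for the example, and this is a simplified reading of the definitions above rather than the authors' extraction code.

import numpy as np
from collections import Counter

K = 1000.0  # normalization constant from equation (1)

def entropy(x):
    # Equation (2): Shannon entropy of the observed values.
    counts = np.array(list(Counter(x).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def coef_variation(x):
    # Equation (3): sample standard deviation over sample mean.
    x = np.asarray(x, dtype=float)
    return float(np.std(x, ddof=1) / np.mean(x))

def quantile_coef(x, p=0.25):
    # Equation (4): quantile coefficient using linearly interpolated
    # sample p-quantiles, matching equation (5).
    lo, hi = np.quantile(x, [p, 1.0 - p])
    return float((hi - lo) / (hi + lo))

def rate_of_change(x):
    # Equation (6): unique values over total observations.
    return len(set(x)) / len(x)

# Example for the source-port field of a sampled flow (assumed values):
sport = [443, 443, 51514, 51514, 51515, 51620, 52001, 443]
features = {
    "ip_proto": 6 / K,                    # TCP, equation (1)
    "sport_entropy": entropy(sport),
    "sport_cv": coef_variation(sport),
    "sport_cvq": quantile_coef(sport),
    "sport_rte": rate_of_change(sport),
}
print(features)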

The data traffic with normal activity behavior was extracted from the ISCXIDS2012 dataset [45]. The data traffic with DoS behavior was obtained in a controlled laboratory environment using tools such as hping3 [46], hulk [47], GoldenEye [48], and slowhttptest [49].

Processes such as extracting, transforming, and labeling the database instances are summarized in Figure 5. The raw network traffic was extracted from the capture files, and the packets were then grouped into sessions. For each session, one instance of the descriptor database containing all the variables listed in Table 3 was calculated. In this study, only sessions with five hundred packets or more were considered, to better represent each network traffic type.

The final database contains examples of normal traffic (23,088 instances), TCP flood attacks (14,988 instances), UDP flood (6,894 instances), HTTP flood (347 instances), and HTTP slow (183 instances).

3.3. Feature and MLA Selection. Feature selection is an important step in the pattern recognition process and consists of defining the smallest possible set of variables capable of efficiently describing a set of classes [50]. Several techniques for variable selection are available in the literature and implemented in software libraries such as scikit-learn [51]. In this work, the selection of variables was performed in two stages. First, Recursive Feature Elimination with Cross-Validation (RFECV) was used with several machine learning algorithms widely used in the scientific literature, i.e., random forest (RF), logistic regression (LR), AdaBoost, stochastic gradient descent (SGD), decision tree (DTree), and perceptron. RF obtained higher precision using 28 variables, while AdaBoost selected seven variables but obtained lower accuracy, as shown in Table 4. In the second stage, a new feature selection test was performed with RF using the proposed Algorithm 1.
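The first selection stage can be reproduced with scikit-learn's RFECV class. The sketch below shows the general pattern on synthetic data standing in for the 33-variable signature database; the estimator settings are illustrative assumptions, not the exact configuration reported in Table 4.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the 33-variable signature database.
X, y = make_classification(n_samples=2000, n_features=33, n_informative=12,
                           random_state=42)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=42),
    step=1,
    cv=StratifiedKFold(n_splits=10),     # 10-fold, as in Table 4
    scoring="accuracy",
)
selector.fit(X, y)
print("selected features:", selector.n_features_)
print("feature mask:", selector.support_)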

In the proposed feature selection approach using RF, the number of variables was reduced from 28 to 20, with a small increase in accuracy, as shown in Table 5. The proposed algorithm was executed using the following input parameters: 1000 rounds, 99% of variable importance, 95% of global accuracy, and 85% of per-class accuracy. Figure 6 shows that most tested models used 20 variables. However, each model used specific sets of variables. In order to choose the most relevant variables from the selected models, the RF variable importance criterion was used, as described in line 25 of Algorithm 1. The final result of the feature selection is shown in Figure 7.

The results show that RF obtained higher accuracy than the other algorithms. Although it uses more variables than SGD and AdaBoost, a low false alarm rate is a prime requirement in DDoS detection systems. In this case, RF proved to be the best algorithm option for the Smart Detection system. Random forest is a supervised learning algorithm that builds a large number of random decision trees and merges them together to make predictions. Each tree is trained with a random subset of the total set of labeled samples. In the classification process, the most voted class among all the trees in the model indicates the result of the classifier [52]. In the proposed detection system algorithm shown in Figure 4, RF is used to classify online network traffic, a task that requires computational efficiency and high hit rates.

4. Results

The network traffic was classified by the detection system in a controlled network environment using different sampling rates. In the experiments, raw network traffic from the CIC-DoS [16], CICIDS2017 [25], and CSE-CIC-IDS2018 [25] datasets, as well as the raw network traffic captured in the customized testbed experiments, was employed. The Smart Detection system reached high accuracy and a low false-positive rate.

Figure 5: Process of network traffic extraction, transformation, and labeling.

Table 4: 10-fold RFECV results.
    MLA         No. of features   Accuracy
1   RF          28                0.996010
2   DTree       25                0.994182
3   LR          26                0.972327
4   SGD         16                0.969474
5   Perceptron  28                0.937256
6   AdaBoost    7                 0.931131


Input: database descriptors, variable importance threshold, accuracy threshold, and number of rounds
Output: selected variables

(1) begin
(2)   Create empty optimized model set
(3)   for i ← 1 to Number of rounds do
(4)     Define all the descriptor database variables as the current variables
(5)     while True do
(6)       Split dataset in training and test partitions
(7)       Create and train the model using training data partition
(8)       Select the most important variables from the trained model
(9)       Calculate the cumulative importance of variables from the trained model
(10)      if max(cumulative importance of variables) < Variable importance threshold then
(11)        Exit loop
(12)      end
(13)      Train the model using only the most important variables
(14)      Test the trained model and calculate the accuracy
(15)      if Calculated accuracy < Accuracy threshold then
(16)        Exit loop
(17)      end
(18)      Add current model to optimized model set
(19)      Define the most important variables from the trained model as the current variables
(20)    end
(21)  end
(22)  Group the models by number of variables
(23)  Remove outliers from the grouped model set
(24)  Select the group of models with the highest frequency and their number of variables "N"
(25)  Rank the variables by the mean of the importance calculated in step 7
(26)  Return the "N" most important variables
(27) end

Algorithm 1: Feature selection.
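A compact Python reading of Algorithm 1 is sketched below. It uses a random forest and cumulative feature importance as described, but the stopping conditions and the outlier-removal step are simplified, so it should be taken as an illustration of the idea rather than the reference implementation.

import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def select_features(X, y, names, rounds=50, imp_thr=0.99, acc_thr=0.95):
    # Repeatedly shrink the variable set to the smallest group whose cumulative
    # RF importance reaches imp_thr, as long as the reduced model keeps its
    # test accuracy above acc_thr.
    models = []                                        # (n_vars, {name: importance})
    for _ in range(rounds):
        current = list(names)
        while True:
            cols = [names.index(n) for n in current]
            X_tr, X_te, y_tr, y_te = train_test_split(X[:, cols], y,
                                                      test_size=0.3, stratify=y)
            rf = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
            order = np.argsort(rf.feature_importances_)[::-1]
            cum = np.cumsum(rf.feature_importances_[order])
            keep = [current[i] for i in order[: int(np.searchsorted(cum, imp_thr)) + 1]]
            if len(keep) == len(current):              # importance no longer concentrates
                break
            sub = [current.index(n) for n in keep]
            acc = RandomForestClassifier(n_estimators=100).fit(
                X_tr[:, sub], y_tr).score(X_te[:, sub], y_te)
            if acc < acc_thr:                          # reduction hurt accuracy too much
                break
            models.append((len(keep), dict(zip(current, rf.feature_importances_))))
            current = keep
    if not models:
        return list(names)
    # Keep the most frequent model size and rank variables by mean importance.
    best_n = Counter(n for n, _ in models).most_common(1)[0][0]
    mean_imp = {n: np.mean([m.get(n, 0.0) for _, m in models]) for n in names}
    return sorted(names, key=mean_imp.get, reverse=True)[:best_n]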

Figure 6: Number of variables versus number of models.

Experiments were conducted using two virtual Linux boxes, each one using 8 virtual CPUs (vCPUs) and 8 GB of RAM.

4.1. Description of the Benchmark Datasets. Many different datasets, such as DARPA (Lincoln Laboratory 1998-99), KDD'99 (University of California, Irvine 1998-99), and LBNL (Lawrence Berkeley National Laboratory and ICSI 2004-2005), have been used by researchers to evaluate the performance of their proposed intrusion detection and prevention approaches. However, many such datasets are out of date and unreliable to use [25]. In this study, the CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018 datasets and a customized dataset were used, since they include modern threats and DoS techniques.

4.1.1. The ISCXIDS2012 Dataset. The Information Security Center of Excellence (ISCX) IDS 2012 dataset (ISCXIDS2012) was built at the University of New Brunswick to provide a contemporary benchmark. The dataset traced real packets for seven days of network activity, including the HTTP, SMTP, SSH, IMAP, POP3, and FTP protocols, covering various scenarios of normal and malicious activities. ISCXIDS2012 consists of labeled network traces, including full packet payloads in pcap format, and is publicly available (https://www.unb.ca/cic/datasets/ids.html) [45]. This work focuses on the normal activities of the ISCXIDS2012 pcap files for the extraction of signatures, more specifically the data file of Friday, 11/6/2010.

4.1.2. The CIC-DoS Dataset. The CIC-DoS dataset focuses on application layer DoS attacks mixed with the attack-free traces of the ISCXIDS2012 dataset. Four kinds of attacks were produced with different tools, yielding 8 different application layer DoS attack traces [16]. The resulting set contains 24 hours of network traffic with a total size of 4.6 GB and is publicly available (https://www.unb.ca/cic/datasets/dos-dataset.html). A summary of the attack events and tools used in the CIC-DoS is presented in Table 6.

In the execution of the low-volume attacks using the slowhttptest tool [49], the default value of 50 connections per attack was adopted, thus making the attacks stealthier, according to [16].

4.1.3. The CICIDS2017 Dataset. The CICIDS2017 dataset was recently developed by ISCX and contains benign traffic and the most up-to-date common attacks. This new IDS dataset includes seven common, updated families of attacks that meet real-world criteria and is publicly available (http://www.unb.ca/cic/datasets/ids-2017.html) [25].

This work focuses on the malicious DoS activities of the Wednesday, July 5, 2017, capture file, which consists of five DoS/DDoS attacks and a wide variety of normal network traffic. The resulting set contains 8 h of network traffic with a total size of 13 GB. The attack tools used include slowloris, Slowhttptest, Hulk, GoldenEye, and Heartbleed.

4.1.4. The CSE-CIC-IDS2018 Dataset. This dataset is a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). The final dataset includes seven different attack scenarios: brute force, Heartbleed, Botnet, DoS, DDoS, web attacks, and infiltration of the network from inside. The attacking infrastructure includes 50 machines, and the victim organization has 5 departments and includes 420 machines and 30 servers. A research document outlining the dataset analysis details and related principles has been published in [25]. All data are publicly available (https://www.unb.ca/cic/datasets/ids-2018.html).

This work focuses on the malicious DoS/DDoS activities of the Friday, February 16, 2018, and Tuesday, February 20, 2018, capture files. The attack tools used include SlowHTTPTest, Slowhttptest, Hulk, LOIC, and HOIC.

Figure 7: Selected variables by the proposed feature selection algorithm.

4.1.5. The Customized Dataset. The customized dataset was developed in a controlled network environment, as shown in Figure 8. VLANs 10, 20, 30, and 40 are used for the victim hosts, VLAN 165 is dedicated to the users of an academic unit, VLAN 60 is used by the attacking host, while monitoring occurs in VLAN 1. All networks have regular access to the Internet.

The attack plan was configured so that one attack is generated every 30 minutes, in a total of 48 attack events in 24 hours, starting at 00h00m00s and ending at 23h59m00s. All attacks were executed by the attacker host 172.16.60.100, during which it did not transmit legitimate traffic to the victims. The attack tools were parameterized to produce sneaky low-volume, medium-volume (or light mode), and massive high-volume attacks. Ten variations of protocol-based and application-based attacks were adopted, using four attack tools, as shown in Table 7. The duration of the protocol-based and high-volume application-based attacks was 30 seconds, while the low-volume application-based attacks ranged from 30 to 240 seconds.

In the execution of the low-volume attacks using the slowhttptest tool [49], the number of connections parameter was set to 1500 instead of the default option, corresponding to 50 connections.

4.2. Online Experiments. Online experiments were performed in a controlled laboratory environment according to the following validation methodology:

(1) Raw data of the network traffic are obtained for analysis in pcap file format.
(2) The attack plan, indicating the origin, destination, attack type, and respective duration for the traffic indicated in step 1, is drawn up.
(3) The environment for reprocessing and classifying the traffic is configured.
(4) The traffic is processed and classified.
(5) The system performance is evaluated by comparing the output from step 4 with the attack plan described in step 2.

Following this validation methodology, the traffic capture sources used were CIC-DoS, CICIDS2017, CSE-CIC-IDS2018, and the customized dataset, thus fulfilling steps 1 and 2.

The environment for traffic reprocessing and classification described in step 3 was configured using two virtual Linux boxes running Open vSwitch (OVS) [28], the TcpReplay software [53], and the Smart Detection system, as shown in Figure 9.

In the reprocessing, classification, and evaluation of the traffic during steps 4 and 5, the raw data traffic was replayed by the TcpReplay software on a specific OVS port and sampled by the sFlow agent for OVS. The sampled traffic was sent to the Smart Detection system, and the classification result was compared with the attack plan. Figure 9 summarizes the procedures carried out by the proposed validation methodology: the raw network traffic file is reprocessed on VM-01, and the sFlow agent collects traffic samples and sends them to Smart Detection on VM-02.

4.2.1. System Setup. The Smart Detection system has three main parameters that directly influence its performance. These parameters, shown in Table 2, allow the user to calibrate the detection system according to the operating environment. In scenarios where the SR is too low and Tmax is too large, for example, traffic samples are discarded before processing by the classifier. On the other hand, if Tmax is too small, the FAR increases because the classifier has little data to analyze. In the case of slow DDoS, a low SR and a large Tmax also reduce the attack detection rate due to the in-memory flow table expiration time (ET = 2).

Therefore, several experiments to calibrate the system were performed in the testing environment using (i) an SR of 1%, 5%, 10%, and 20%; (ii) a Tmax parameter of 25, 50, and 100; and (iii) an ET of 2, 5, and 10 seconds. The most balanced result was obtained with SR = 10%, Tmax = 50, and ET = 2.
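The calibration described above amounts to a small grid search over SR, Tmax, and ET. A minimal sketch follows, assuming a hypothetical run_experiment helper that replays the test traffic with a given configuration and returns DR and FAR; the balance criterion used here is an illustrative choice, not the authors'.

from itertools import product

SR_VALUES = [0.01, 0.05, 0.10, 0.20]   # sampling rates tested
TMAX_VALUES = [25, 50, 100]            # maximum flow table length
ET_VALUES = [2, 5, 10]                 # expiration time in seconds

def run_experiment(sr, tmax, et):
    # Hypothetical helper: replay the test traffic with these settings and
    # return (detection_rate, false_alarm_rate). Placeholder values only.
    return 0.9, 0.01

results = []
for sr, tmax, et in product(SR_VALUES, TMAX_VALUES, ET_VALUES):
    dr, far = run_experiment(sr, tmax, et)
    results.append(((sr, tmax, et), dr - far))   # one possible balance criterion

best_cfg, _ = max(results, key=lambda r: r[1])
print("most balanced configuration:", best_cfg)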

4.2.2. Evaluation Metrics. System performance was evaluated using the precision (PREC), recall (REC), and F-measure (F1) metrics present in the literature [54, 55]. PREC measures the ability to avoid false positives, while REC measures system sensitivity. F1 is a harmonic average between PREC and REC. In this context, (i) true positive (TP) is attack traffic predicted correctly, (ii) true negative (TN) is normal traffic also predicted correctly, (iii) false positive (FP) is normal traffic predicted incorrectly, and (iv) false negative (FN) is attack traffic predicted incorrectly. These metrics were computed by the following expressions:

PREC = TP / (TP + FP),
REC = TP / (TP + FN),
F1 = 2 × (PREC × REC) / (PREC + REC).   (7)

Table 5: 10-fold cross-validation results using the variables selected by the proposed algorithm.
    MLA         No. of features   Accuracy
1   RF          20                0.999363
2   DTree       20                0.999011
3   Perceptron  20                0.996000
4   SGD         20                0.986989
5   LR          20                0.982704
6   AdaBoost    20                0.956512

Besides, the detection rate (DR) and false alarm rate (FAR) metrics were used. The DR is the ratio between the number of attacks detected by the system and the actual number of attacks performed. The FAR is the ratio between FP and the sum of FP and TN. These metrics were computed by the following expressions:

DR = AD / TA,   (8)

where AD is the number of detected attacks and TA is the total number of performed attacks, and

FAR = FP / (FP + TN),   (9)

where FP corresponds to the false-positive classifications and TN to the true-negative classifications.

The DR and FAR calculations assume that only malicious traffic was sent from the attacker to the victim at the time of the attack.
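The metrics in equations (7)–(9) can be computed directly from a confusion matrix; the sketch below is a straightforward transcription of the definitions, with illustrative label vectors.

from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_pred, attacks_detected, attacks_total):
    # Confusion matrix with the "attack" class (label 1) treated as positive.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    prec = tp / (tp + fp)                      # equation (7)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    dr = attacks_detected / attacks_total      # equation (8)
    far = fp / (fp + tn)                       # equation (9)
    return {"PREC": prec, "REC": rec, "F1": f1, "DR": dr, "FAR": far}

# Example with labels 1 = attack, 0 = normal (illustrative values only):
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]
print(evaluate(y_true, y_pred, attacks_detected=9, attacks_total=10))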

4.3. Results and Discussion. The proposed approach was evaluated using the aforementioned datasets, system setup, and metrics. Table 8 summarizes the system performance for each dataset.

As can be observed, the best performance was obtained on the CSE-CIC-IDS2018 dataset, with a DR of 100%, an FAR of 0.000, and a PREC of 100%. During the analysis, there was a low occurrence of normal network traffic and well-defined bursts of malicious traffic. This type of behavior facilitates detection by the system and justifies the high hit rates achieved. However, a slightly lower performance was obtained on the customized dataset and the CIC-DoS dataset, with a DR of 96.5% and 93.6%, an FAR of 0.2% and 0.04%, and a PREC of 99.5% and 99.9%, respectively. In those datasets, there is a higher volume of normal traffic and various types of attacks, including stealthy application layer attacks. In this more realistic scenario, the proposed system presented some detection failures but still obtained a competitive performance.

Figure 8: Customized testbed topology. VLANs 10, 20, 30, and 40 (172.16.10.0/24, 172.16.20.0/24, 172.16.30.0/24, and 172.16.40.0/24) host the targets, VLAN 60 (172.16.60.0/24) the attacker, VLAN 165 the production hosts, and VLAN 1 the monitoring station, all connected to the Internet through the core router.

Table 6: CIC-DoS dataset events and tools.
Attack and tool                      Events
ddossim (ddossim)                    2
DoS improved GET (Goldeneye [48])    3
DoS GET (hulk [47])                  4
Slow body (slowbody2)                4
Slow read (slowread)                 2
Slow headers (slowheaders)           5
Rudy (rudy)                          4
Slowloris (slowloris)                2

Figure 9: Smart Detection online validation scheme (VM-01: dataset pcap, TcpReplay, and Open vSwitch; VM-02: Smart Detection collector, processor, classifier, and notifier).

Table 7: Customized dataset events and tools.
Attack and tool                             Events
TCP SYN flood (hping3 [46])                 4
TCP SYN flood, light mode (hping3 [46])     4
TCP ACK flood (hping3 [46])                 4
UDP flood (hping3 [46])                     4
UDP flood, random dst port (hping3 [46])    4
DoS improved GET (Goldeneye [48])           5
DoS GET (hulk [47])                         4
Slow headers (slowhttptest [49])            5
Slow body (slowhttptest [49])               5
Range attack (slowhttptest [49])            4
Slow read (slowhttptest [49])               5

On the other hand, the worst result was obtained on the CICIDS2017 dataset, with a DR of 80%, an FAR of 0.2%, and a PREC of 99.2%. This dataset expresses a more realistic network scenario, which includes normal traffic mixed with high-volume and low-volume malicious traffic with sneaky behavior, such as slow application layer attacks. Even so, the proposed system detected 4 out of 5 attacks, with a PREC greater than 90% and an FAR less than 1%, showing that the method is feasible.

To discuss online detection and the consumption of computing resources during the experimentation, the CICIDS2017 dataset was chosen because it is quite realistic, recent, and summarizes the major DoS attack vectors. Even in the most adverse scenario, the experiment was completed normally, as shown in Figures 10 and 11. The overall network traffic is shown in Figure 10(a), while Figure 10(b) highlights the sampled traffic sent to the detection system. As can be seen, for a network traffic of 81.3 Mbps, the detection system receives only 1.74 Mbps, making this approach scalable. The overall traffic rating is shown in Figure 11(a), while Figure 11(b) exclusively highlights the malicious traffic rating. It can be said that the system was efficient in distinguishing normal traffic from the DoS attacks because, of all the attacks performed, only the Heartbleed attack was not detected, highlighted in Figure 10(b) between 15 h and 16 h. This kind of attack is primarily intended to collect data by exploiting OpenSSL software vulnerabilities, as described in CVE-2014-0160, although it can also assume the behavior of a DDoS attack, as in any application. However, in this case, the system raised a false negative. The most obvious reasons for this FN are (i) the execution of the Heartbleed attack without DoS exploitation or (ii) a statistical coincidence in the traffic sampling. In the first case, the attack is performed using legitimate and regular connections, while in the second case, the collected samples coincide with legitimate traffic signatures.

In terms of resource use, the system remained stable during the experiment, as shown in Figure 11(c), with small swings in CPU usage.

Finally, the Smart Detection system was tested using online network traffic in four distinct scenarios. The results presented in Table 8 show that the system can distinguish legitimate traffic from various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow, with significant rates of accuracy.

Figure 10: Network traffic in the online experiment using the CICIDS2017 dataset. (a) Overall network traffic. (b) Sampled network traffic (intervals of normal activity, normal + slow DoS, normal + high DoS, and normal + Heartbleed DoS).

The experiments also highlighted the importance of adjusting the Tmax and ET parameters. These variables correlate with the network traffic sampling rate (SR) and directly influence the detection rate and system accuracy.

4.3.1. Additional Comparison. Compared with some recent similar works available in the literature, the approach introduced in this work is quite competitive in terms of the evaluated performance metrics, as shown in Table 9.

The comparison is not completely fair, because the experimental scenarios and data were slightly different, but it is sufficient to allow an evaluation of the obtained results. For instance, in the offline experiments performed with the CIC-DoS dataset in [16], the DR was 76.92% using an SR of 20%, while the proposed system obtained an online DR of 90% for the attacks, with an FAR of 1.8%, using the same sampling technique. In [37], a PREC of 82.1% was obtained using the CICIDS2017 dataset in an offline and unsampled analysis. In this work, the proposed method obtained a PREC of 99.9%, which can be considered competitive for an online detection system based on samples of network traffic. Besides that, in the CICIDS2017 dataset experiments, where the legitimate traffic rate is similar to that of the attack traffic, according to Figures 10(b), 11(a), and 11(b), the system was also able to distinguish malicious traffic from normal traffic, as studied in the literature [34].

5. Conclusion

This article has presented the Smart Detection system, an online approach to DoS/DDoS attack detection. The software uses the random forest tree algorithm to classify network traffic based on samples taken by the sFlow protocol directly from network devices. Several experiments were performed to calibrate and evaluate system performance. The results showed that the proposed method is feasible and presents improved performance when compared with some recent and relevant approaches available in the literature.

The proposed system was evaluated on three intrusion detection benchmark datasets, namely CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018, and was able to classify various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow. Furthermore, the performance of the proposed method was compared against recent and related approaches. Based on the experimental results, the Smart Detection approach delivers improved DR, FAR, and PREC. For example, on the CIC-DoS and CSE-CIC-IDS2018 datasets, the proposed system achieved DR and PREC higher than 93%, with an FAR of less than 1%. Although the system has achieved significant results within its scope, it needs some improvements, such as a better hit rate among attack classes and an automatic parameter calibration mechanism that maximizes the detection rate of attacks.

Figure 11: Traffic classification and CPU usage in the online experiment using the CICIDS2017 dataset. (a) Overall traffic rating. (b) Malicious traffic rating. (c) Detection system CPU utilization (CPU idle time and CPU system time, in %).

Table 8: Evaluation of system performance for all datasets.
Dataset           DR      FAR      PREC    F1
CIC-DoS           0.936   0.0004   0.999   0.999
CICIDS2017        0.800   0.002    0.992   0.992
CSE-CIC-IDS2018   1.000   0.000    1.000   1.000
Customized        0.965   0.002    0.995   0.995

Table 9: Comparison with research approaches of related works.
Work                    Dataset      DR       FAR      PREC
[16]                    CIC-DoS      0.7690   NA       NA
[37]                    CICIDS2017   NA       NA       0.8210
The proposed approach   CIC-DoS      0.9360   0.0004   0.9990
The proposed approach   CICIDS2017   0.8000   0.0020   0.9920


Future works include analysis of DDoS attacks based on the vulnerabilities of services, such as Heartbleed and web brute force attacks; enhancement in the multiple-class classification; self-configuration of the system; developing methods for correlating triggered alarms; and formulating protective measures.

Data Availability

We produced a customized dataset and a variable selection algorithm and used four additional datasets to support the findings of this study. The customized dataset used to support the findings of this study has been deposited in the IEEE DataPort repository (https://doi.org/10.24433/CO.0280398.v2). The feature selection algorithm used to support the findings of this study has been deposited in the Code Ocean repository (https://doi.org/10.24433/CO.0280398.v2). The benchmark datasets used to support the findings of this study have been deposited in the Canadian Institute for Cybersecurity repository, publicly available as follows: (1) the ISCXIDS2012 dataset (https://www.unb.ca/cic/datasets/ids.html), (2) the CIC-DoS dataset (https://www.unb.ca/cic/datasets/dos-dataset.html), (3) the CICIDS2017 dataset (http://www.unb.ca/cic/datasets/IDS2017.html), and (4) the CSE-CIC-IDS2018 dataset (https://www.unb.ca/cic/datasets/ids-2018.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), Finance Code 001.

Acknowledgments

The authors would like to acknowledge the Digital Metropolis Institute (IMD/UFRN) and the High-Performance Computing Center of UFRN (NPAD/UFRN) for the overall support given to this work and the Canadian Institute for Cybersecurity (CIC/UNB) for publicly sharing the datasets.

References

[1] D. Anstee, C. F. Chui, P. Bowen, and G. Sockrider, Worldwide Infrastructure Security Report, Arbor Networks Inc., Westford, MA, USA, 2017.
[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, "Internet of things: a survey on enabling technologies, protocols, and applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.
[3] S. T. Zargar, J. Joshi, and D. Tipper, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," IEEE Communications Surveys and Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.
[4] A. Marzano, D. Alexander, O. Fonseca et al., "The evolution of bashlite and mirai IoT botnets," in Proceedings of the 2018 IEEE Symposium on Computers and Communications (ISCC), 2018.
[5] S. Kottler, "February 28th DDoS incident report," 2018, https://github.blog/2018-03-01-ddos-incident-report.
[6] Y. Cao, Y. Gao, R. Tan, Q. Han, and Z. Liu, "Understanding internet DDoS mitigation from academic and industrial perspectives," IEEE Access, vol. 6, pp. 66641–66648, 2018.
[7] R. Roman, J. Zhou, and J. Lopez, "On the features and challenges of security and privacy in distributed internet of things," Computer Networks, vol. 57, no. 10, pp. 2266–2279, 2013.
[8] K. Singh, P. Singh, and K. Kumar, "Application layer HTTP-GET flood DDoS attacks: research landscape and challenges," Computers & Security, vol. 65, pp. 344–372, 2017.
[9] N. Vlajic and D. Zhou, "IoT as a land of opportunity for DDoS hackers," Computer, vol. 51, no. 7, pp. 26–34, 2018.
[10] S. Newman, "Under the radar: the danger of stealthy DDoS attacks," Network Security, vol. 2019, no. 2, pp. 18-19, 2019.
[11] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[12] Y. Wang, L. Liu, B. Sun, and Y. Li, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," in Proceedings of the 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS 2015), pp. 1034–1037, Kochi, India, 2015.
[13] M. Masdari and M. Jalali, A Survey and Taxonomy of DoS Attacks in Cloud Computing, 2016, http://onlinelibrary.wiley.com/doi/10.1002/sec.1539/epdf.
[14] R. Singh, A. Prasad, R. Moven, and H. Samra, "Denial of service attack in wireless data network: a survey," in Proceedings of the 2017 Devices for Integrated Circuit (DevIC), pp. 23-24, Kalyani, India, March 2017.
[15] A. Praseed and P. Santhi Thilagam, "DDoS attacks at the application layer: challenges and research perspectives for safeguarding web applications," IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 661–685, 2019.
[16] H. H. Jazi, H. Gonzalez, N. Stakhanova, and A. A. Ghorbani, "Detecting HTTP-based application layer DoS attacks on web servers in the presence of sampling," Computer Networks, vol. 121, pp. 25–36, 2017.
[17] H. Kim, T. Benson, A. Akella, and N. Feamster, "The evolution of network configuration: a tale of two campuses," in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC '11), p. 499, ACM, New York, NY, USA, 2011.
[18] T. V. Phan and M. Park, "Efficient distributed denial-of-service attack defense in SDN-based cloud," IEEE Access, vol. 7, pp. 18701–18714, 2019.
[19] Y. Gu, K. Li, Z. Guo, and Y. Wang, "Semi-supervised K-means DDoS detection method using hybrid feature selection algorithm," IEEE Access, vol. 7, pp. 64351–64365, 2019.
[20] M. Hasan, M. Milon Islam, I. Islam, and M. M. A. Hashem, "Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches," Internet of Things, vol. 7, Article ID 100059, 2019, https://linkinghub.elsevier.com/retrieve/pii/S2542660519300241.
[21] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, "Network intrusion detection for IoT security based on learning techniques," IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2671–2701, 2019, https://ieeexplore.ieee.org/document/8629941.
[22] X. Tan, S. Su, Z. Huang et al., "Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm," Sensors (Switzerland), vol. 19, no. 1, 2019.
[23] J. Cheng, M. Li, X. Tang, V. S. Sheng, Y. Liu, and W. Guo, "Flow correlation degree optimization driven random forest for detecting DDoS attacks in cloud computing," Security and Communication Networks, vol. 2018, Article ID 6459326, 14 pages, 2018.
[24] R. Panigrahi and S. Borah, "A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems," International Journal of Engineering & Technology, vol. 7, pp. 479–482, December 2018, https://www.researchgate.net/publication/329045441.
[25] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the ICISSP 2018—4th International Conference on Information Systems Security and Privacy, pp. 108–116, Madeira, Portugal, 2018.
[26] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (II), pp. 1137–1143, 1995, https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf.
[27] S. Behal and K. Kumar, "Trends in validation of DDoS research," Procedia Computer Science, vol. 85, pp. 7–15, 2016.
[28] B. Pfaff, J. Pettit, T. Koponen et al., "The design and implementation of Open vSwitch," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp. 117–130, USENIX Association, Oakland, CA, USA, 2015, https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/pfaff.
[29] J. Le, R. Giller, Y. Tatsumi, H. Huang, J. Ma, and N. Mori, Case Study: Developing DPDK-Accelerated Virtual Switch Solutions for Cloud Service Providers, Intel Corporation, Santa Clara, CA, USA, 2019, https://builders.intel.com/docs/cloudbuilders/developing-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providers.pdf.
[30] W. Meng, W. Li, C. Su, J. Zhou, and R. Lu, "Enhancing trust management for wireless intrusion detection via traffic sampling in the era of big data," IEEE Access, vol. 6, pp. 7234–7243, 2017.
[31] R. Zuech, T. M. Khoshgoftaar, and R. Wald, "Intrusion detection and big heterogeneous data: a survey," Journal of Big Data, vol. 2, no. 1, 2015.
[32] R. K. C. Chang, "Defending against flooding-based distributed denial-of-service attacks: a tutorial," IEEE Communications Magazine, vol. 40, no. 10, pp. 42–51, 2002.
[33] S. Simpson, S. N. Shirazi, A. Marnerides, S. Jouet, D. Pezaros, and D. Hutchison, "An inter-domain collaboration scheme to remedy DDoS attacks in computer networks," IEEE Transactions on Network and Service Management, vol. 15, no. 3, pp. 879–893, 2018.
[34] S. Behal, K. Kumar, and M. Sachdeva, "D-FACE: an anomaly based distributed approach for early detection of DDoS attacks and flash events," Journal of Network and Computer Applications, vol. 111, pp. 49–63, 2018.
[35] C. Wang, T. T. N. Miu, X. Luo, and J. Wang, "SkyShield: a sketch-based defense system against application layer DDoS attacks," IEEE Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 559–573, 2018.
[36] Z. Liu, Y. Cao, M. Zhu, and W. Ge, "Umbrella: enabling ISPs to offer readily deployable and privacy-preserving DDoS prevention services," IEEE Transactions on Information Forensics and Security, vol. 14, no. 4, pp. 1098–1108, 2019.
[37] M. Aamir and S. M. A. Zaidi, "Clustering based semi-supervised machine learning for DDoS attack classification," Journal of King Saud University—Computer and Information Sciences, 2019.
[38] R. E. Jurga and M. M. Hulb, Technical Report: Packet Sampling for Network Monitoring, HP ProCurve, 2007.
[39] P. Phaal and M. Lavine, sFlow Version 5, InMon Corp., San Francisco, CA, USA, 2004, https://sflow.org/sflow_version_5.txt.
[40] J. Postel, Internet Protocol, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0791.
[41] J. Postel, Transmission Control Protocol, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0793.
[42] J. Postel, User Datagram Protocol, RFC Editor, 1980, https://www.rfc-editor.org/info/rfc768.
[43] B. Claise, Ed., Cisco Systems NetFlow Services Export Version 9, RFC Editor, 2004, https://www.rfc-editor.org/info/rfc3954.
[44] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," Statistical Computing, vol. 50, no. 4, pp. 361–365, 1996.
[45] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers and Security, vol. 31, no. 3, pp. 357–374, 2012.
[46] S. Sanfilippo, N. Jombart, D. Ducamp, Y. Berthier, and S. Aubert, "Hping," September 2019, http://www.hping.org.
[47] A. Grafov, "Hulk DoS tool," 2015, https://github.com/grafov/hulk.
[48] J. Seidl, "GoldenEye HTTP DoS test tool," 2012, https://github.com/jseidl/GoldenEye.
[49] S. Shekyan, "SlowHTTPTest," 2011, https://github.com/shekyan/slowhttptest.
[50] S. Ganapathy, K. Kulothungan, S. Muthurajkumar, M. Vijayalakshmi, P. Yogesh, and A. Kannan, "Intelligent feature selection and classification techniques for intrusion detection in networks: a survey," EURASIP Journal on Wireless Communications and Networking, vol. 2013, no. 1, p. 271, 2013, http://jwcn.eurasipjournals.com/content/2013/1/271.
[51] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011, http://dl.acm.org/citation.cfm?id=2078195.
[52] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[53] A. Turner, "Tcpreplay—Pcap editing and replaying utilities," 2013, https://tcpreplay.appneta.com.
[54] S. H. Park, J. M. Goo, and C. H. Jo, "Receiver operating characteristic (ROC) curve: practical review for radiologists," Korean Journal of Radiology, vol. 5, no. 1, p. 11, 2004.
[55] M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.




[26] R Kohavi ldquoA study of crossndashvalidation and bootstrap foraccuracy estimation and model selectionrdquo in Proceedings ofthe Fourteenth International Joint Conference on ArtificialIntelligence (II) pp 1137ndash1143 1995 httpswwwijcaiorgProceedings95-2Papers016pdf

[27] S Behal and K Kumar ldquoTrends in validation of DDoS re-searchrdquo Procedia Computer Science vol 85 pp 7ndash15 2016

[28] B Pfaff J Pettit T Koponen et al ldquo(e design andimplementation of Open vSwitchrdquo in 12th USENIX Sym-posium on Networked Systems Design and Implementation(NSDI 15) pp 117ndash130 USENIX Association OaklandCA USA 2015 httpswwwusenixorgconferencensdi15technical-sessionspresentationpfaff

[29] J Le R Giller Y Tatsumi H Huang J Ma and N MoriCase Study Developing DPDK-Accelerated Virtual SwitchSolutions for Cloud Service Providers Intel CorporationSanta Clara CA USA 2019 httpsbuildersintelcomdocscloudbuildersdeveloping-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providerspdf

[30] W Meng W Li C Su J Zhou and R Lu ldquoEnhancing trustmanagement for wireless intrusion detection via trafficsampling in the era of big datardquo IEEE Access vol 6pp 7234ndash7243 2017

[31] R Zuech T M Khoshgoftaar and R Wald ldquoIntrusion de-tection and big heterogeneous data a surveyrdquo Journal of BigData vol 2 no 1 2015

[32] R K C Chang ldquoDefending against flooding-based distributeddenial-of-service attacks a tutorialrdquo IEEE CommunicationsMagazine vol 40 no 10 pp 42ndash51 2002

[33] S Simpson S N Shirazi A Marnerides S Jouet D Pezarosand D Hutchison ldquoAn inter-domain collaboration scheme toremedy DDoS attacks in computer networksrdquo IEEE Trans-actions on Network and Service Management vol 15 no 3pp 879ndash893 2018

[34] S Behal K Kumar and M Sachdeva ldquoD-face an anomalybased distributed approach for early detection of DDoS at-tacks and flash eventsrdquo Journal of Network and ComputerApplications vol 111 pp 49ndash63 2018

[35] C Wang T T N Miu X Luo and J Wang ldquoSkyShield asketch-based defense system against application layer DDoSattacksrdquo IEEE Transactions on Information Forensics andSecurity vol 13 no 3 pp 559ndash573 2018

[36] Z Liu Y Cao M Zhu and W Ge ldquoUmbrella enabling ISPsto offer readily deployable and privacy-preserving DDoS

prevention servicesrdquo IEEE Transactions on Information Fo-rensics and Security vol 14 no 4 pp 1098ndash1108 2019

[37] M Aamir and SM A Zaidi ldquoClustering based semi-supervisedmachine learning for DDoS attack classificationrdquo Journal ofKing Saud UniversitymdashComputer and Information Sciences2019

[38] R E Jurga R E Jurga M M Hulb M M HulbH P Procurve and H P Procurve Technical Report PacketSampling for Network Monitoring 2007

[39] P Phaal and M Lavine sFlow version 5 InMon Corp SanFrancisco CA USA 1981 httpssfloworgsflow_version_5txt

[40] J Postel Internet Protocol RFC Editor 1981 httpswwwrfc-editororginforfc0791

[41] J Postel ldquoTransmission Control Protocolrdquo RFC Editor 1981httpswwwrfc-editororginforfc0793

[42] J Postel ldquoUser datagram protocolrdquo RFC Editor 1980 httpswwwrfc-editororginforfc768

[43] B Claise Ed Cisco systems netflow services export version 9RFC Editor 2004 httpswwwrfc-editororginforfc3954

[44] R J Hyndman and Y Fan ldquoSample quantiles in statisticalpackagesrdquo Statistical Computing vol 50 no 4 pp 361ndash3651996

[45] A Shiravi H Shiravi M Tavallaee and A A GhorbanildquoToward developing a systematic approach to generatebenchmark datasets for intrusion detectionrdquo Computers andSecurity vol 31 no 3 pp 357ndash374 2012

[46] S Sanfilippo N Jombart D Ducamp Y Berthier andS Aubert ldquoHpingrdquo september 2019 httpwwwhpingorg

[47] A Grafov ldquoHulk DoS toolrdquo 2015 httpsgithubcomgrafovhulk

[48] J Seidl ldquoGoldenEye HTTP DoS test toolrdquo 2012 httpsgithubcomjseidlGoldenEye

[49] S Shekyan ldquoSlowHTTPTestrdquo 2011 httpsgithubcomshekyanslowhttptest

[50] S Ganapathy K Kulothungan S MuthurajkumarM Vijayalakshmi P Yogesh and A Kannan ldquoIntelligent featureselection and classification techniques for intrusion detection innetworks a surveyrdquo EURASIP Journal on Wireless Communi-cations and Networking vol 2013 no 1 p 271 2013 httpjwcneurasipjournalscomcontent20131271

[51] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2012 httpdlacmorgcitationcfmid2078195

[52] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45no 1 pp 5ndash32 2001

[53] A Turner ldquoTcpreplayndashPcap editing and replaying utilitiesrdquo2013 httpstcpreplayappnetacom

[54] S H Park J M Goo and C H Jo ldquoReceiver operatingcharacteristic (ROC) curve practical review for radiologistsrdquoKorean Journal of Radiology vol 5 no 1 p 11 2004

[55] M W Powers ldquoEvaluation from precision recall andF-measure to ROC informedness markedness and correla-tionrdquo Journal of Machine Learning Technologies vol 2 no 1pp 37ndash63 2011

Security and Communication Networks 15

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom


evaluated using the customized testbed in terms of traffic control. The authors claim that the system is capable of dealing with massive attacks. However, this approach is widely used in industry and has proved to be inefficient against truly massive DDoS attacks.

Recently, a semi-supervised machine learning system addressed the classification of DDoS attacks. In this approach, the CICIDS2017 dataset was used to evaluate system performance metrics [37]. Although the work addresses recent DoS vectors, the online performance of the method has not been evaluated. Finally, Table 1 summarizes these recent works whose approach is related to the proposal of this article.

In Table 1, Online indicates that the proposed system has been tested in online experiments, Dataset informs the dataset used for validation, while L/H DoS indicates whether it detects slow and high-volume DDoS attacks. Sampling indicates whether some network traffic sampling method is used.

Based on the open questions in the literature and recent specialized reports, DoS attacks may remain on the Internet for some time. The solution to this problem includes the adoption of detection and mitigation strategies that are practical and economically viable. Besides, these approaches should leverage existing provider infrastructure and be implemented in light of new scientific and technological trends.

3. Smart Detection

Smart Detection is designed to combat DDoS attacks on the Internet in a modern, collaborative way. In this approach, the system collects network traffic samples and classifies them. Attack notification messages are shared using a cloud platform for convenient use by traffic control protection systems. The whole process is illustrated in Figure 1.

The core of the detection system consists of a Signature Dataset (SDS) and a machine learning algorithm (MLA). Figure 2 shows the crucial steps from model building to system operation.

First, normal traffic and DDoS signatures were extracted, labeled, and stored in a database. The SDS was then created using feature selection techniques. Finally, the most accurate MLA was selected, trained, and loaded into the traffic classification system.

The architecture of the detection system was designed to work with samples of network traffic provided by industry-standard traffic sampling protocols collected from network devices. The unlabeled samples are received and grouped into flow tables in the receiver buffer. Thus, when the table length is greater than or equal to the reference value, they are presented to the classifier responsible for labeling them, as shown in Figure 3. If the flow table expires, it may be processed one more time. The occurrence of small flow tables is higher at lower sampling rates or under some types of DoS attacks, e.g., SYN flood attacks. Table 2 details the parameters for fine-tuning the system.

The complete algorithm of the detection system is summarized in Figure 4. During each cycle of the detection process, traffic samples are received and stored in a flow table. For each new flow, a unique identifier (FlowID) is calculated based on the 5-tuple (src_IP, dst_IP, src_port, dst_port, and transport_protocol) in steps 1 and 2. If this is a new flow, i.e., there is not any other flow table stored with the same FlowID, the flow table is registered in a shared memory buffer. Otherwise, if there is a flow table registered with the same FlowID as the previously calculated one, the data of the new flow will be merged with the data in the existing flow table in steps 3 and 4. After the merging operation, if the table length is greater than or equal to the reference value (Tl ≥ Tmax), the flow table is classified and, if it is found to be an attack, a notification is emitted. Otherwise, it is inserted back into the shared memory buffer. Meanwhile, in step 7, the cleanup task looks for expired flow tables in the shared buffer, i.e., flow tables that exceed the expiration time of the system (E > ET). For each expired flow table, the system checks the table length. If the flow table length is less than or equal to the minimum reference value (Tl ≤ Tmin), this flow table is processed by step 8: a new FlowID is calculated using the 3-tuple (src_IP, dst_IP, and transport_protocol), and the flow table is routed back to steps 3 and 4.
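As an illustration of steps 1–8, the following Python sketch shows one way the flow-table buffer could be implemented. The parameter names follow Table 2, but the values, the classify callable (standing in for the trained classifier of Section 3.3), and the notify callable are assumptions; expired tables larger than Tmin are simply dropped here, which is a simplification of the original design.

```python
import time

TMIN, TMAX, ET = 5, 50, 2.0   # flow table length limits and expiration time (s), cf. Table 2

class FlowBuffer:
    """Groups sampled packet headers into flow tables and hands full tables to a classifier."""

    def __init__(self, classify, notify):
        self.tables = {}          # FlowID -> list of sampled packet headers
        self.created = {}         # FlowID -> creation timestamp
        self.classify = classify  # callable(table) -> "attack" or "normal"
        self.notify = notify      # callable(flow_id, table) invoked on detection

    def add_sample(self, pkt, use_3tuple=False):
        # Steps 1-2: FlowID from the 5-tuple (or the 3-tuple when a small table is reprocessed).
        key = ((pkt["src_ip"], pkt["dst_ip"], pkt["proto"]) if use_3tuple else
               (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"]))
        # Steps 3-4: register a new flow table or merge into the existing one.
        table = self.tables.setdefault(key, [])
        if not table:
            self.created[key] = time.time()
        table.append(pkt)
        # Steps 5-6: classify when the table reaches Tmax; notify if it is an attack.
        if len(table) >= TMAX:
            if self.classify(table) == "attack":
                self.notify(key, table)
            del self.tables[key], self.created[key]

    def cleanup(self):
        # Step 7: look for expired tables (E > ET); small ones (<= Tmin) are re-keyed by
        # the 3-tuple and fed back into the buffer (step 8); larger ones are dropped here.
        now = time.time()
        for key in [k for k, t in self.created.items() if now - t > ET]:
            table = self.tables.pop(key)
            del self.created[key]
            if len(table) <= TMIN:
                for pkt in table:
                    self.add_sample(pkt, use_3tuple=True)
```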

3.1. Traffic Sampling. Smart Detection uses a network traffic sampling technique because processing all the packets in the network can be a computationally expensive task, even if only the packet headers are parsed. In many cases, performing deep inspection and analyzing the data area of the application layer is unfeasible for detection systems. Among the protocols adopted by the industry for sampling network traffic, the sFlow protocol is widely used in current devices. The technique used by sFlow is called n-out-of-N sampling. In this technique, n samples are selected out of N packets. One way to achieve a simple random sample is to randomly generate n different numbers in the range of 1 to N and then choose all packets with a packet position equal to one of the n values. This procedure is repeated for every N packets. Besides, the sample size is fixed in this approach [38].
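As a concrete illustration of n-out-of-N sampling, the short Python sketch below selects n random packet positions in every window of N packets; it is a simplified model of the scheme described in [38], not the sFlow agent's actual implementation.

```python
import random

def n_out_of_N_sample(packets, n=2, N=100):
    """Yield a simple random sample of n packets from every window of N packets."""
    window = []
    for pkt in packets:
        window.append(pkt)
        if len(window) == N:
            # Draw n distinct positions in the current window and emit those packets.
            for idx in sorted(random.sample(range(N), n)):
                yield window[idx]
            window = []

# Example: sample 2 out of every 100 "packets" (here just integers), i.e., a 2% sampling rate.
sampled = list(n_out_of_N_sample(range(1000), n=2, N=100))
print(len(sampled))  # 20
```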

The sFlow monitoring system consists of an agent (embedded in a switch, a router, or an independent probe) and a collector. The architecture used in the monitoring system is designed to provide continuous network monitoring of high-speed switched and routed devices. The agent uses the sampling technology to capture traffic statistics from the monitored device and forward them to a collector system [39].
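On the collector side, receiving the samples can be as simple as a UDP service that accepts sFlow datagrams and passes the decoded samples to the flow buffer. The sketch below only receives datagrams on the standard sFlow port (6343); decode_sflow_datagram is a hypothetical placeholder for a real sFlow v5 decoder, which is outside the scope of this illustration.

```python
import socket

SFLOW_PORT = 6343  # standard sFlow collector port

def decode_sflow_datagram(data):
    """Hypothetical placeholder: a real implementation would parse the sFlow v5
    binary format [39] and return the sampled packet headers."""
    return []

def run_collector(handle_sample, host="0.0.0.0", port=SFLOW_PORT):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, _addr = sock.recvfrom(65535)          # one sFlow datagram
        for sample in decode_sflow_datagram(data):  # zero or more flow samples
            handle_sample(sample)

# Usage (blocking): run_collector(flow_buffer.add_sample)
```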

3.2. Feature Extraction. In supervised classification strategies, a set of examples is required for training the classifier model. This set is commonly defined as the signature database. Each instance of the database has a set of characteristics, or variables, associated with a label or a class. In this work, the goal is to identify characteristics in network traffic that are able to distinguish normal network behavior from DoS attacks. The study is focused on the analysis of the header variables of the network- and transport-layer packets of the TCP/IP architecture because it allows saving computational resources and simplifies deployment in ISP networks.


In IPv4-compliant networks, the network- and transport-layer protocols are IP, TCP, and UDP, which are specified in RFC 791 [40], RFC 793 [41], and RFC 768 [42], respectively. Together, such protocols have a total of 25 header variables. However, widely used network traffic sampling protocols, such as NetFlow [43] and sFlow [39], use only a portion of these variables in the sampling process. Commonly, the seven used variables are the source and destination IP addresses, source and destination ports, transport-layer protocol, IP packet size, and TCP flags.

The source and destination IP addresses are not very useful for identifying network traffic behavior in the Internet environment, which reduces the number of variables available for analysis to five in the most common cases. Based on the five variables most used by the flow monitoring protocols, 33 variables were derived, as described in Table 3, which use statistical measures that express data variability. In the context of calculating the database variables, the references to the mean, median, variance (var), and standard deviation (std) should be interpreted as sample measures.

Figure 1: Operation scenario overview of the Smart Detection system (1: detection, 2: sharing, 3: protection).

Figure 2: Detection system overview.

Figure 3: Overview of the traffic classification scheme.

Table 2: Detection system parameters.

Parameter   Description
Tmin        Minimum flow table length
Tmax        Maximum flow table length
ET          Flow table expiration time


The variable named protocol is a simple normalization of the protocol field extracted from the transport-layer packet headers, in the form

ip_proto = N_proto / K,  (1)

where N_proto is the code of the protocol and K is a normalization constant set to the value 1000. For instance, N_proto = 6 and N_proto = 17 for the TCP and UDP protocols, respectively.

With the four main variables most used in flow monitoring, it is possible to calculate the following associated statistical measurements:

(i) Entropy: the entropy of the variable is calculated by

Entropy(X) = −∑_i p(X_i) log2 p(X_i),  (2)

where X is the variable of interest, e.g., the source port.

(ii) Coefficient of variation: the coefficient of variation is calculated by

cv(X) = std(X) / mean(X),  (3)

where std(X) is the estimated standard deviation and mean(X) is the estimated average of the variable.

(iii) Quantile coefficient: this parameter is defined herein by

cvq(X) = (Q̂_X(1 − p) − Q̂_X(p)) / (Q̂_X(1 − p) + Q̂_X(p)),  (4)

where Q̂_X(p) is the sample p-quantile, p ∈ (0, 0.5), expressed by [44]

Q̂_X(p) = X_(k) + (X_(k+1) − X_(k)) · f,  (5)

with X_(1), …, X_(n) being the order statistics of the independent observations {X_1, …, X_n}, k = ⌊p · n⌋, and f the fractional part of the index surrounded by X_(k) and X_(k+1).

Figure 4: Detection system algorithm.

Table 3: Extracted variables.

    Variable              Detail
01  ip_proto              Normalized protocol number
02  ip_len_mean           Mean of IP length
03  ip_len_median         Median of IP length
04  ip_len_var            Variance of IP length
05  ip_len_std            Standard deviation of IP length
06  ip_len_entropy        Entropy of IP length
07  ip_len_cv             Coeff. of variation of IP length
08  ip_len_cvq            Quantile coeff. of IP length
09  ip_len_rte            Rate of change of IP length
10  sport_mean            Mean of src port
11  sport_median          Median of src port
12  sport_var             Variance of src port
13  sport_std             Standard deviation of src port
14  sport_entropy         Entropy of src port
15  sport_cv              Coeff. of variation of src port
16  sport_cvq             Quantile coeff. of src port
17  sport_rte             Rate of change of src port
18  dport_mean            Mean of dest port
19  dport_median          Median of dest port
20  dport_var             Variance of dest port
21  dport_std             Standard deviation of dest port
22  dport_entropy         Entropy of dest port
23  dport_cv              Coeff. of variation of dest port
24  dport_cvq             Quantile coeff. of dest port
25  dport_rte             Rate of change of dest port
26  tcp_flags_mean        Mean of TCP flags
27  tcp_flags_median      Median of TCP flags
28  tcp_flags_var         Variance of TCP flags
29  tcp_flags_std         Standard deviation of TCP flags
30  tcp_flags_entropy     Entropy of TCP flags
31  tcp_flags_cv          Coeff. of variation of TCP flags
32  tcp_flags_cvq         Quantile coeff. of TCP flags
33  tcp_flags_rte         Rate of change of TCP flags


(iv) Rate of change: this metric is given by

rte(X) = U_X / S_X,  (6)

where U_X is the number of unique values and S_X is the overall number of X values.
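To make these definitions concrete, a minimal Python sketch of equations (2)–(6) applied to one variable of a flow table (here, sampled source ports) is given below; the quantile step delegates to the standard library rather than reproducing equation (5) literally, and the choice p = 0.25 is an assumption.

```python
import math
from collections import Counter
from statistics import mean, stdev, quantiles

def entropy(values):                       # equation (2)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def coeff_of_variation(values):            # equation (3)
    return stdev(values) / mean(values)

def quantile_coeff(values, p=0.25):        # equations (4) and (5), with p in (0, 0.5)
    q = quantiles(values, n=100, method="inclusive")      # sample percentiles
    q_low, q_high = q[int(p * 100) - 1], q[int((1 - p) * 100) - 1]
    return (q_high - q_low) / (q_high + q_low)

def rate_of_change(values):                # equation (6)
    return len(set(values)) / len(values)

sport = [53, 443, 443, 80, 80, 80, 8080, 443]   # e.g., sampled source ports of one flow table
print(entropy(sport), coeff_of_variation(sport), quantile_coeff(sport), rate_of_change(sport))
```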

The data traffic with normal activity behavior was extracted from the ISCXIDS2012 dataset [45]. The data traffic with DoS behavior was obtained in a laboratory-controlled environment using tools such as hping3 [46], hulk [47], GoldenEye [48], and slowhttptest [49].

Processes such as extracting, transforming, and labeling the database instances are summarized in Figure 5. The raw network traffic was extracted from the capture files, and the packets were then grouped into sessions. For each session, one instance of the descriptor database, containing all variables listed in Table 3, was calculated. In this study, only sessions with five hundred packets or more were considered, to better represent each network traffic type.

The final database contains examples of normal traffic (23,088 instances), TCP flood attacks (14,988 instances), UDP flood (6,894 instances), HTTP flood (347 instances), and HTTP slow (183 instances).

3.3. Feature and MLA Selection. Feature selection is an important step in the pattern recognition process and consists of defining the smallest possible set of variables capable of efficiently describing a set of classes [50]. Several techniques for variable selection are available in the literature and implemented in software libraries such as scikit-learn [51]. In this work, the selection of variables was performed in two stages. First, Recursive Feature Elimination with Cross-Validation (RFECV) was used with some machine learning algorithms widely used in the scientific literature, i.e., random forest (RF), logistic regression (LR), AdaBoost, stochastic gradient descent (SGD), decision tree (DTree), and perceptron. RF obtained higher precision using 28 variables, while AdaBoost selected seven variables but obtained lower accuracy, as shown in Table 4. In the second stage, a new feature selection test was performed with RF using the proposed Algorithm 1.
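The first stage can be reproduced with scikit-learn's RFECV; the sketch below pairs it with a random forest as one candidate MLA. The synthetic data and the hyperparameters are placeholder assumptions, since the article does not list the exact settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the signature database: 33 features, as in Table 3.
X, y = make_classification(n_samples=2000, n_features=33, n_informative=10, random_state=0)

rfecv = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    step=1,                    # eliminate one variable per round
    cv=StratifiedKFold(10),    # 10-fold cross-validation, as reported in Table 4
    scoring="accuracy",
)
rfecv.fit(X, y)
print("Selected features:", rfecv.n_features_)
```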

In the proposed feature selection approach using RF, the number of variables was reduced from 28 to 20, with a small increase in accuracy, as shown in Table 5. The proposed algorithm was executed using the following input parameters: 1000 rounds, 99% of variable importance, 95% of global accuracy, and 85% of per-class accuracy. Figure 6 shows that most models tested used 20 variables. However, each model used specific sets of variables. In order to choose the most relevant variables from the selected models, the RF variable importance criterion was used, as described in line 25 of Algorithm 1. The final result of the feature selection is shown in Figure 7.

The results show that RF obtained higher accuracy than the other algorithms. Although it uses more variables than SGD and AdaBoost, a low false alarm rate is a prime requirement in DDoS detection systems, and in this case RF proved to be the best algorithm option for the Smart Detection system. Random forest is a supervised learning algorithm that builds a large number of random decision trees and merges them together to make predictions. Each tree is trained with a random subset of the total set of labeled samples. In the classification process, the most voted class among all the trees in the model indicates the result of the classifier [52]. In the proposed detection system algorithm shown in Figure 4, RF is used to classify online network traffic, a task that requires computational efficiency and high hit rates.
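A minimal sketch of how the trained RF model plugs into step 5 of Figure 4 is shown below, assuming the 20 selected variables have already been computed for a flow table; the synthetic training data and the hyperparameters are illustrative assumptions only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Offline: train on labeled flow-table feature vectors (20 selected variables).
rng = np.random.default_rng(0)
X_train = rng.random((5000, 20))        # stand-in for SDS feature vectors
y_train = rng.integers(0, 2, 5000)      # 0 = normal, 1 = attack
model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
model.fit(X_train, y_train)

# Online: step 5 of Figure 4 reduces to a single predict() call on the flow-table features.
def classify_flow_table(features_20):
    return "attack" if model.predict([features_20])[0] == 1 else "normal"

print(classify_flow_table(rng.random(20)))
```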

4. Results

The network traffic was classified by the detection system in a controlled network environment using different sampling rates. In the experiments, the raw network traffic of the CIC-DoS [16], CICIDS2017 [25], and CSE-CIC-IDS2018 [25] datasets and the raw network traffic captured in the customized testbed experiments were employed. The Smart Detection system has reached high accuracy and a low false-positive rate. Experiments were conducted using two virtual Linux boxes, each one of them using 8 virtual CPUs (vCPUs) with 8 GB of RAM.

Figure 5: Process of network traffic extraction, transformation, and labeling.

Table 4: 10-fold RFECV results.

    MLA          No. of features   Accuracy
1   RF           28                0.996010
2   DTree        25                0.994182
3   LR           26                0.972327
4   SGD          16                0.969474
5   Perceptron   28                0.937256
6   AdaBoost     7                 0.931131


Input: database descriptors, variable importance threshold, accuracy threshold, and number of rounds
Output: selected variables

(1) begin
(2)   Create empty optimized model set
(3)   for i ⟵ 1 to Number of rounds do
(4)     Define all the descriptor database variables as the current variables
(5)     while True do
(6)       Split dataset in training and test partitions
(7)       Create and train the model using training data partition
(8)       Select the most important variables from the trained model
(9)       Calculate the cumulative importance of variables from the trained model
(10)      if max(cumulative importance of variables) < Variable importance threshold then
(11)        Exit loop
(12)      end
(13)      Train the model using only the most important variables
(14)      Test the trained model and calculate the accuracy
(15)      if Calculated accuracy < Accuracy threshold then
(16)        Exit loop
(17)      end
(18)      Add current model to optimized model set
(19)      Define the most important variables from the trained model as the current variables
(20)    end
(21)  end
(22)  Group the models by number of variables
(23)  Remove outliers from the grouped model set
(24)  Select the group of models with the highest frequency and their number of variables "N"
(25)  Rank the variables by the mean of the importance calculated in step 7
(26)  Return the "N" most important variables
(27) end

ALGORITHM 1: Feature selection.
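A condensed Python sketch of Algorithm 1 is given below, with scikit-learn's random forest supplying the importance ranking of steps 7–9. The thresholds, the synthetic data, the vote-based ranking in steps 22–26, and the omission of outlier removal (step 23) are simplifying assumptions.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def select_features(X, y, rounds=20, imp_thresh=0.99, acc_thresh=0.95):
    models = []                                         # step 2: optimized model set
    for _ in range(rounds):                             # step 3
        current = np.arange(X.shape[1])                 # step 4: start from all variables
        while True:                                     # step 5
            Xtr, Xte, ytr, yte = train_test_split(X[:, current], y, test_size=0.3)   # step 6
            rf = RandomForestClassifier(n_estimators=50).fit(Xtr, ytr)               # step 7
            order = np.argsort(rf.feature_importances_)[::-1]                        # step 8
            cum = np.cumsum(rf.feature_importances_[order])                          # step 9
            n_keep = int(np.searchsorted(cum, imp_thresh)) + 1                       # step 10
            if n_keep >= len(current):                  # no further reduction possible
                break
            rf2 = RandomForestClassifier(n_estimators=50).fit(Xtr[:, order[:n_keep]], ytr)  # step 13
            if rf2.score(Xte[:, order[:n_keep]], yte) < acc_thresh:                  # steps 14-15
                break
            current = current[order[:n_keep]]           # steps 18-19
            models.append(frozenset(int(i) for i in current))
    if not models:
        return list(range(X.shape[1]))
    # Steps 22-26 (simplified): most frequent subset size, then the most voted variables.
    n_best = Counter(len(m) for m in models).most_common(1)[0][0]
    votes = Counter(f for m in models if len(m) == n_best for f in m)
    return sorted(f for f, _ in votes.most_common(n_best))

X, y = make_classification(n_samples=1000, n_features=33, n_informative=10, random_state=0)
print(select_features(X, y, rounds=5))
```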

Figure 6: Number of variables versus number of models.


4.1. Description of the Benchmark Datasets. Many different datasets, such as DARPA (Lincoln Laboratory 1998-99), KDD′99 (University of California, Irvine 1998-99), and LBNL (Lawrence Berkeley National Laboratory and ICSI 2004-2005), have been used by researchers to evaluate the performance of their proposed intrusion detection and prevention approaches. However, many such datasets are out of date and unreliable to use [25]. In this study, the CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018 datasets and a customized dataset were used, since they include modern threats and DoS techniques.

4.1.1. The ISCXIDS2012 Dataset. The Information Security Centre of Excellence (ISCX) IDS 2012 dataset (ISCXIDS2012) was built at the University of New Brunswick to provide a contemporary benchmark. The dataset traced real packets for seven days of network activity, including the HTTP, SMTP, SSH, IMAP, POP3, and FTP protocols, covering various scenarios of normal and malicious activities. ISCXIDS2012 consists of labeled network traces, including full packet payloads in pcap format, and is publicly available (https://www.unb.ca/cic/datasets/ids.html) [45]. This work focuses on the normal activities of the ISCXIDS2012 pcap file for the extraction of signatures, more specifically the data file of Friday, 11/6/2010.

4.1.2. The CIC-DoS Dataset. The CIC-DoS dataset focuses on application layer DoS attacks mixed with the attack-free traces of the ISCXIDS2012 dataset. Four kinds of attacks were produced with different tools, yielding 8 different DoS attack strokes from the application layer [16]. The resulting set contains 24 hours of network traffic with a total size of 4.6 GB and is publicly available (https://www.unb.ca/cic/datasets/dos-dataset.html). A summary of the attack events and tools used in the CIC-DoS is presented in Table 6.

In the execution of the low-volume attacks using the slowhttptest tool [49], the default value of 50 connections per attack was adopted, thus making the attacks sneakier, according to [16].

4.1.3. The CICIDS2017 Dataset. The CICIDS2017 dataset was recently developed by ISCX and contains benign traffic and the most up-to-date common attacks. This new IDS dataset includes seven common updated families of attacks that meet real-world criteria and is publicly available (http://www.unb.ca/cic/datasets/IDS2017.html) [25].

This work focuses on the malicious DoS activities of the Wednesday, July 5, 2017, capture file, which consists of five DoS/DDoS attacks and a wide variety of normal network traffic. The resulting set contains 8 h of network traffic with a total size of 13 GB. The attack tools used include slowloris, Slowhttptest, Hulk, GoldenEye, and Heartbleed.

4.1.4. The CSE-CIC-IDS2018 Dataset. This dataset is a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). The final dataset includes seven different attack scenarios: brute force, Heartbleed, Botnet, DoS, DDoS, web attacks, and infiltration of the network from inside. The attacking infrastructure includes 50 machines, and the victim organization has 5 departments and includes 420 machines and 30 servers. A research document outlining the dataset analysis details and similar related principles has been published by [25]. All data are publicly available (https://www.unb.ca/cic/datasets/ids-2018.html).

This work focuses on the malicious DoS/DDoS activities of the Friday, February 16, 2018, and Tuesday, February 20, 2018, capture files. The attack tools used include SlowHTTPTest, Slowhttptest, Hulk, LOIC, and HOIC.

Figure 7: Selected variables by the proposed feature selection algorithm.


4.1.5. The Customized Dataset. The customized dataset was developed in a controlled network environment, as shown in Figure 8. VLANs 10, 20, 30, and 40 are used as victim hosts, VLAN 165 is dedicated to the users of an academic unit, and VLAN 60 is used as an attacking host, while monitoring occurs in VLAN 1. All networks have regular access to the Internet.

The attack plan was configured so that one attack is generated every 30 minutes, for a total of 48 attack events in 24 hours, starting at 00h00m00s and ending at 23h59m00s. All attacks were executed by the attacker host 172.16.60.100, which did not transmit legitimate traffic to the victims during this period. The attack tools were parameterized to produce sneaky low-volume, medium-volume (or light mode), and massive high-volume attacks. Ten variations of protocol-based and application-based attacks were adopted, using four attack tools, as shown in Table 7. The duration of protocol-based and high-volume application-based attacks was 30 seconds, while low-volume application-based attacks ranged from 30 to 240 seconds.

In the accomplishment of the low-volume attacks using the slowhttptest tool [49], the number-of-connections parameter was set to 1500, instead of the default option corresponding to 50 connections.

4.2. Online Experiments. Online experiments were performed in a controlled laboratory environment according to the following validation methodology:

(1) Raw data of the network traffic are obtained for analysis in pcap file format.
(2) The attack plan, indicating the origin, destination, attack type, and respective duration for the traffic indicated in step 1, is drawn.
(3) The environment for reprocessing and classifying the traffic is configured.
(4) The traffic is processed and classified.
(5) The system performance is properly evaluated by comparing the output from step 4 with the attack plan described in step 2.

Following this validation methodology, the following sources of traffic capture were used: CIC-DoS, CICIDS2017, CSE-CIC-IDS2018, and the customized dataset, thus fulfilling steps 1 and 2. The environment for traffic reprocessing and classification, as described in step 3, was configured using two virtual Linux boxes running Open vSwitch (OVS) [28], the TcpReplay software [53], and the Smart Detection system, as shown in Figure 9.

In the reprocessing, classification, and evaluation of the traffic during steps 4 and 5, the raw data traffic was replayed by the TcpReplay software on a specific OVS port and sampled by the sFlow agent for OVS. The sampled traffic was sent to the Smart Detection system, and the classification result was compared with the attack plan. Figure 9 summarizes the procedures carried out by the proposed validation methodology: the raw network traffic file is reprocessed on VM-01, and the sFlow agent collects traffic samples and sends them to Smart Detection on VM-02.

4.2.1. System Setup. The Smart Detection system has three main parameters that directly influence its performance. These parameters, shown in Table 2, allow the user to calibrate the detection system according to the operating environment. In scenarios where the SR is too low and Tmax is too large, for example, traffic samples are discarded before processing by the classifier. On the other hand, if Tmax is too small, the FAR increases because the classifier has few data to analyze. In the case of slow DDoS, a low SR and a large Tmax also reduce the attack detection rate, due to the in-memory flow table expiration time (ET = 2).

Thus, several experiments to calibrate the system were performed in the testing environment using (i) SR values of 1%, 5%, 10%, and 20%, (ii) Tmax values of 25, 50, and 100, and (iii) ET values of 2, 5, and 10 seconds. The most balanced result was obtained with SR = 10%, Tmax = 50, and ET = 2.
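This calibration is essentially a small grid search over (SR, Tmax, ET). The sketch below shows how such a sweep could be scripted; run_experiment is only a stand-in returning dummy numbers, since the actual measurement depends on the testbed of Figure 9.

```python
from itertools import product

SR_VALUES   = [0.01, 0.05, 0.10, 0.20]   # sampling rates tested
TMAX_VALUES = [25, 50, 100]              # maximum flow table lengths tested
ET_VALUES   = [2, 5, 10]                 # expiration times (seconds) tested

def run_experiment(sr, tmax, et):
    """Stand-in: a real run would replay the dataset with these settings on the
    testbed and return the measured (DR, FAR) pair."""
    return 0.9, 0.01

results = {cfg: run_experiment(*cfg) for cfg in product(SR_VALUES, TMAX_VALUES, ET_VALUES)}
# One possible notion of the "most balanced" configuration: maximize DR minus FAR.
best = max(results, key=lambda cfg: results[cfg][0] - results[cfg][1])
print(best, results[best])
```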

4.2.2. Evaluation Metrics. System performance was evaluated using the Precision (PREC), Recall (REC), and F-Measure (F1) metrics present in the literature [54, 55]. PREC measures the ability to avoid false positives, while REC measures system sensitivity; F1 is a harmonic average between PREC and REC. In this context, (i) true positive (TP) is the attack traffic predicted correctly, (ii) true negative (TN) is normal traffic also predicted correctly, (iii) false positive (FP) is the normal traffic predicted incorrectly, and (iv) false negative (FN) is the attack traffic predicted incorrectly. These metrics were computed by the following expressions:

PREC = TP / (TP + FP),
REC = TP / (TP + FN),
F1 = 2 × (PREC × REC) / (PREC + REC).  (7)

Besides, the detection rate (DR) and false alarm rate (FAR) metrics were used. The DR is the ratio between the number of attacks detected by the system and the actual number of attacks performed, while the FAR is the ratio between FP and the sum of FP and TN.

Table 5: 10-fold cross-validation results using the variables selected by the proposed algorithm.

    MLA          No. of features   Accuracy
1   RF           20                0.999363
2   DTree        20                0.999011
3   Perceptron   20                0.996000
4   SGD          20                0.986989
5   LR           20                0.982704
6   AdaBoost     20                0.956512


These metrics were computed by the following expressions:

DR = AD / TA,  (8)

where AD is the number of detected attacks and TA is the total number of performed attacks, and

FAR = FP / (FP + TN),  (9)

where FP corresponds to the false-positive classifications and TN to the true-negative classifications.

The DR and FAR calculations assume that only malicious traffic was sent from the attacker to the victim at the time of the attack.
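For completeness, the sketch below transcribes equations (7)–(9) into code; the example counts are purely illustrative and not taken from the article's experiments.

```python
def precision(tp, fp):            # equation (7)
    return tp / (tp + fp)

def recall(tp, fn):               # equation (7)
    return tp / (tp + fn)

def f1_score(tp, fp, fn):         # equation (7)
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def detection_rate(detected_attacks, total_attacks):   # equation (8)
    return detected_attacks / total_attacks

def false_alarm_rate(fp, tn):     # equation (9)
    return fp / (fp + tn)

# Illustrative counts only.
print(f1_score(tp=450, fp=5, fn=30), detection_rate(45, 48), false_alarm_rate(5, 2000))
```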

4.3. Results and Discussion. The proposed approach has been evaluated using the aforementioned datasets, system setup, and metrics. Table 8 summarizes the system performance for each dataset.

As can be observed, the best performance was obtained in the CSE-CIC-IDS2018 dataset, with a DR of 100%, an FAR of 0.000%, and a PREC of 100%. During the analysis, there was a low occurrence of normal network traffic and well-defined bursts of malicious traffic. This type of behavior facilitates detection by the system and justifies the high hit rates achieved. However, a slightly lower performance was obtained in the customized dataset and the CIC-DoS dataset, with a DR of 96.5% and 93.6%, an FAR of 0.2% and 0.04%, and a PREC of 99.5% and 99.9%, respectively. In those datasets, there is a higher volume of normal traffic and various types of attacks, including stealthy application layer attacks. In this more realistic scenario, the proposed system presented some detection failures but still obtained a competitive performance. On the other hand, the worst result was obtained with the CICIDS2017 dataset, with 80% DR, 0.2% FAR, and 99.2% PREC.

Figure 8: Customized testbed topology.

Table 6: CIC-DoS dataset events and tools.

Attack and tool                           Events
ddossim (ddossim)                         2
DoS improved GET (GoldenEye [48])         3
DoS GET (hulk [47])                       4
Slow body (slowbody2)                     4
Slow read (slowread)                      2
Slow headers (slowheaders)                5
Rudy (rudy)                               4
Slowloris (slowloris)                     2

Figure 9: Smart Detection online validation scheme.

Table 7: Customized dataset events and tools.

Attack and tool                              Events
TCP SYN flood (hping3 [46])                  4
TCP SYN flood, light mode (hping3 [46])      4
TCP ACK flood (hping3 [46])                  4
UDP flood (hping3 [46])                      4
UDP flood, random dst port (hping3 [46])     4
DoS improved GET (GoldenEye [48])            5
DoS GET (hulk [47])                          4
Slow headers (slowhttptest [49])             5
Slow body (slowhttptest [49])                5
Range attack (slowhttptest [49])             4
Slow read (slowhttptest [49])                5


This dataset expresses a more realistic network scenario, which includes normal traffic mixed with high-volume and low-volume malicious traffic with sneaky behavior, such as slow application layer attacks. Even so, the proposed system detected 4 out of 5 attacks, with a PREC greater than 90% and an FAR of less than 1%, showing that the method is feasible.

To discuss online detection and the consumption of computing resources during experimentation, the CICIDS2017 dataset was chosen because it is quite realistic, recent, and summarizes the major vectors of DoS attacks. Even in the most adverse scenario, the experiment was completed normally, as shown in Figures 10 and 11. The overall network traffic is demonstrated in Figure 10(a), while Figure 10(b) highlights the sampled traffic sent to the detection system. As can be seen, for network traffic of 81.3 Mbps, the detection system receives only 1.74 Mbps, making this approach scalable. The overall traffic rating is shown in Figure 11(a), while Figure 11(b) exclusively highlights the malicious traffic rating. It can be said that the system was efficient in distinguishing normal traffic from DoS attacks because, of all the attacks performed, only the Heartbleed attack was not detected, highlighted in Figure 10(b) between 15 h and 16 h. This kind of attack is primarily intended to collect data by exploiting OpenSSL software vulnerabilities, as described in CVE-2014-0160, although it can also assume the behavior of a DDoS attack, as in any application. However, in this case, the system raised a false negative. The most obvious reasons for this FN are (i) the execution of the Heartbleed attack without DoS exploitation or (ii) statistical coincidence in traffic sampling. In the first case, the attack is performed using legitimate and regular connections, while in the second case, the collected samples coincide with legitimate traffic signatures.

In terms of resource use, the system remained stable during the experiment, as shown in Figure 11(c), with small swings in CPU usage.

Finally, the Smart Detection system was tested using online network traffic in four distinct scenarios. The results presented in Table 8 show that the system can distinguish legitimate traffic from various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow, with significant rates of accuracy.

Figure 10: Network traffic in the online experiment using the CICIDS2017 dataset. (a) Overall network traffic. (b) Sampled network traffic, annotated with the normal activity, slow DoS, high-volume DoS, and Heartbleed DoS periods.


The experiments also highlighted the importance of adjusting the Tmax and ET parameters. These variables correlate with the network traffic sampling rate (SR) and directly influence the detection rate and system accuracy.

4.3.1. Additional Comparison. Compared with some recent similar works available in the literature, the approach introduced in this work is quite competitive in terms of the evaluated performance metrics, as shown in Table 9.

The comparison is not completely fair, because the experimental scenarios and data were slightly different, but it is sufficient to allow an evaluation of the obtained results. For instance, in the offline experiments performed with the CIC-DoS dataset in [16], the DR was 76.92% using an SR of 20%, while the proposed system obtained an online DR of 90% for the attacks, with an FAR of 1.8%, using the same sampling technique. In [37], a PREC of 82.1% was obtained using the CICIDS2017 dataset in an offline and unsampled analysis. In this work, the proposed method obtained a PREC of 99.9%, which can be considered competitive for an online detection system based on samples of network traffic. Besides that, in the CICIDS2017 dataset experiments, where the legitimate traffic rate is similar to that of attack traffic, according to Figures 10(b), 11(a), and 11(b), the system was also able to distinguish malicious traffic from normal traffic, as studied in the literature [34].

5. Conclusion

This article has presented the Smart Detection system, an online approach to DoS/DDoS attack detection. The software uses the random forest algorithm to classify network traffic based on samples taken by the sFlow protocol directly from network devices. Several experiments were performed to calibrate and evaluate system performance. Results showed that the proposed method is feasible and presents improved performance when compared with some recent and relevant approaches available in the literature.

The proposed system was evaluated based on three intrusion detection benchmark datasets, namely, CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018, and was able to classify various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow. Furthermore, the performance of the proposed method was compared against recent and related approaches. Based on the experimental results, the Smart Detection approach delivers improved DR, FAR, and PREC. For example, in the CIC-DoS and CSE-CIC-IDS2018 datasets, the proposed system achieved DR and PREC higher than 93%, with an FAR of less than 1%. Although the system has achieved significant results in its scope, it needs some improvements, such as a better hit rate among attack classes and an automatic parameter calibration mechanism that maximizes the detection rate of attacks.

Figure 11: Traffic classification and CPU usage in the online experiment using the CICIDS2017 dataset. (a) Overall traffic rating. (b) Malicious traffic rating. (c) Detection system CPU utilization.

Table 8: Evaluation of system performance for all datasets.

Dataset            DR      FAR      PREC    F1
CIC-DoS            0.936   0.0004   0.999   0.999
CICIDS2017         0.800   0.002    0.992   0.992
CSE-CIC-IDS2018    1.000   0.000    1.000   1.000
Customized         0.965   0.002    0.995   0.995

Table 9: Comparison with research approaches of related works.

Work                     Dataset       DR       FAR      PREC
[16]                     CIC-DoS       0.7690   NA       NA
[37]                     CICIDS2017    NA       NA       0.8210
The proposed approach    CIC-DoS       0.9360   0.0004   0.9990
The proposed approach    CICIDS2017    0.8000   0.0020   0.9920


Future works include the analysis of DDoS attacks based on the vulnerabilities of services, such as Heartbleed and web brute-force attacks; enhancement of the multiple-class classification; self-configuration of the system; and the development of methods for correlating triggered alarms and formulating protective measures.

Data Availability

We produced a customized dataset and a variable selection algorithm and used four additional datasets to support the findings of this study. The customized dataset used to support the findings of this study has been deposited in the IEEE DataPort repository (https://doi.org/10.24433/CO.0280398.v2). The feature selection algorithm used to support the findings of this study has been deposited in the Code Ocean repository (https://doi.org/10.24433/CO.0280398.v2). The benchmark datasets used to support the findings of this study have been deposited in the Canadian Institute for Cybersecurity repository and are publicly available as follows: (1) the ISCXIDS2012 dataset (https://www.unb.ca/cic/datasets/ids.html), (2) the CIC-DoS dataset (https://www.unb.ca/cic/datasets/dos-dataset.html), (3) the CICIDS2017 dataset (http://www.unb.ca/cic/datasets/IDS2017.html), and (4) the CSE-CIC-IDS2018 dataset (https://www.unb.ca/cic/datasets/ids-2018.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), Finance Code 001.

Acknowledgments

The authors would like to acknowledge the Digital Metropolis Institute (IMD/UFRN) and the High-Performance Computing Center of UFRN (NPAD/UFRN) for the overall support given to this work, and the Canadian Institute for Cybersecurity (CIC/UNB) for publicly sharing the datasets.

References

[1] D. Anstee, C. F. Chui, P. Bowen, and G. Sockrider, Worldwide Infrastructure Security Report, Arbor Networks Inc., Westford, MA, USA, 2017.
[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, "Internet of things: a survey on enabling technologies, protocols, and applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.
[3] S. T. Zargar, J. Joshi, and D. Tipper, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," IEEE Communications Surveys and Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.
[4] A. Marzano, D. Alexander, O. Fonseca et al., "The evolution of bashlite and mirai IoT botnets," in Proceedings of the 2018 IEEE Symposium on Computers and Communications (ISCC), 2018.
[5] S. Kottler, "February 28th DDoS incident report," 2018, https://github.blog/2018-03-01-ddos-incident-report.
[6] Y. Cao, Y. Gao, R. Tan, Q. Han, and Z. Liu, "Understanding internet DDoS mitigation from academic and industrial perspectives," IEEE Access, vol. 6, pp. 66641–66648, 2018.
[7] R. Roman, J. Zhou, and J. Lopez, "On the features and challenges of security and privacy in distributed internet of things," Computer Networks, vol. 57, no. 10, pp. 2266–2279, 2013.
[8] K. Singh, P. Singh, and K. Kumar, "Application layer HTTP-GET flood DDoS attacks: research landscape and challenges," Computers & Security, vol. 65, pp. 344–372, 2017.
[9] N. Vlajic and D. Zhou, "IoT as a land of opportunity for DDoS hackers," Computer, vol. 51, no. 7, pp. 26–34, 2018.
[10] S. Newman, "Under the radar: the danger of stealthy DDoS attacks," Network Security, vol. 2019, no. 2, pp. 18-19, 2019.
[11] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[12] Y. Wang, L. Liu, B. Sun, and Y. Li, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," in Proceedings of the 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS 2015), pp. 1034–1037, Kochi, India, 2015.
[13] M. Masdari and M. Jalali, A Survey and Taxonomy of DoS Attacks in Cloud Computing, 2016, http://onlinelibrary.wiley.com/doi/10.1002/sec.1539/epdf.
[14] R. Singh, A. Prasad, R. Moven, and H. Samra, "Denial of service attack in wireless data network: a survey," in Proceedings of the 2017 Devices for Integrated Circuit (DevIC), pp. 23-24, Kalyani, India, March 2017.
[15] A. Praseed and P. Santhi Thilagam, "DDoS attacks at the application layer: challenges and research perspectives for safeguarding web applications," IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 661–685, 2019.
[16] H. H. Jazi, H. Gonzalez, N. Stakhanova, and A. A. Ghorbani, "Detecting HTTP-based application layer DoS attacks on web servers in the presence of sampling," Computer Networks, vol. 121, pp. 25–36, 2017.
[17] H. Kim, T. Benson, A. Akella, and N. Feamster, "The evolution of network configuration: a tale of two campuses," in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC '11), p. 499, ACM, New York, NY, USA, 2011.
[18] T. V. Phan and M. Park, "Efficient distributed denial-of-service attack defense in SDN-based cloud," IEEE Access, vol. 7, pp. 18701–18714, 2019.
[19] Y. Gu, K. Li, Z. Guo, and Y. Wang, "Semi-supervised K-means DDoS detection method using hybrid feature selection algorithm," IEEE Access, vol. 7, pp. 64351–64365, 2019.
[20] M. Hasan, M. Milon Islam, I. Islam, and M. M. A. Hashem, "Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches," Internet of Things, vol. 7, Article ID 100059, 2019, https://linkinghub.elsevier.com/retrieve/pii/S2542660519300241.
[21] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, "Network intrusion detection for IoT security based on learning techniques," IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2671–2701, 2019, https://ieeexplore.ieee.org/document/8629941.


[22] X. Tan, S. Su, Z. Huang et al., "Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm," Sensors (Switzerland), vol. 19, no. 1, 2019.
[23] J. Cheng, M. Li, X. Tang, V. S. Sheng, Y. Liu, and W. Guo, "Flow correlation degree optimization driven random forest for detecting DDoS attacks in cloud computing," Security and Communication Networks, vol. 2018, Article ID 6459326, 14 pages, 2018.
[24] R. Panigrahi and S. Borah, "A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems," International Journal of Engineering & Technology, vol. 7, pp. 479–482, December 2018, https://www.researchgate.net/publication/329045441.
[25] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the ICISSP 2018—4th International Conference on Information Systems Security and Privacy, pp. 108–116, Madeira, Portugal, 2018.
[26] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (II), pp. 1137–1143, 1995, https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf.
[27] S. Behal and K. Kumar, "Trends in validation of DDoS research," Procedia Computer Science, vol. 85, pp. 7–15, 2016.
[28] B. Pfaff, J. Pettit, T. Koponen et al., "The design and implementation of Open vSwitch," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp. 117–130, USENIX Association, Oakland, CA, USA, 2015, https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/pfaff.
[29] J. Le, R. Giller, Y. Tatsumi, H. Huang, J. Ma, and N. Mori, Case Study: Developing DPDK-Accelerated Virtual Switch Solutions for Cloud Service Providers, Intel Corporation, Santa Clara, CA, USA, 2019, https://builders.intel.com/docs/cloudbuilders/developing-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providers.pdf.
[30] W. Meng, W. Li, C. Su, J. Zhou, and R. Lu, "Enhancing trust management for wireless intrusion detection via traffic sampling in the era of big data," IEEE Access, vol. 6, pp. 7234–7243, 2017.
[31] R. Zuech, T. M. Khoshgoftaar, and R. Wald, "Intrusion detection and big heterogeneous data: a survey," Journal of Big Data, vol. 2, no. 1, 2015.
[32] R. K. C. Chang, "Defending against flooding-based distributed denial-of-service attacks: a tutorial," IEEE Communications Magazine, vol. 40, no. 10, pp. 42–51, 2002.
[33] S. Simpson, S. N. Shirazi, A. Marnerides, S. Jouet, D. Pezaros, and D. Hutchison, "An inter-domain collaboration scheme to remedy DDoS attacks in computer networks," IEEE Transactions on Network and Service Management, vol. 15, no. 3, pp. 879–893, 2018.
[34] S. Behal, K. Kumar, and M. Sachdeva, "D-FACE: an anomaly based distributed approach for early detection of DDoS attacks and flash events," Journal of Network and Computer Applications, vol. 111, pp. 49–63, 2018.
[35] C. Wang, T. T. N. Miu, X. Luo, and J. Wang, "SkyShield: a sketch-based defense system against application layer DDoS attacks," IEEE Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 559–573, 2018.
[36] Z. Liu, Y. Cao, M. Zhu, and W. Ge, "Umbrella: enabling ISPs to offer readily deployable and privacy-preserving DDoS prevention services," IEEE Transactions on Information Forensics and Security, vol. 14, no. 4, pp. 1098–1108, 2019.
[37] M. Aamir and S. M. A. Zaidi, "Clustering based semi-supervised machine learning for DDoS attack classification," Journal of King Saud University—Computer and Information Sciences, 2019.
[38] R. E. Jurga and M. M. Hulbój, Technical Report: Packet Sampling for Network Monitoring, HP ProCurve, 2007.
[39] P. Phaal and M. Lavine, sFlow Version 5, InMon Corp., San Francisco, CA, USA, 2004, https://sflow.org/sflow_version_5.txt.
[40] J. Postel, Internet Protocol, RFC 791, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0791.
[41] J. Postel, "Transmission Control Protocol," RFC 793, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0793.
[42] J. Postel, "User Datagram Protocol," RFC 768, RFC Editor, 1980, https://www.rfc-editor.org/info/rfc768.
[43] B. Claise, Ed., "Cisco Systems NetFlow Services Export Version 9," RFC 3954, RFC Editor, 2004, https://www.rfc-editor.org/info/rfc3954.
[44] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," Statistical Computing, vol. 50, no. 4, pp. 361–365, 1996.
[45] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers and Security, vol. 31, no. 3, pp. 357–374, 2012.
[46] S. Sanfilippo, N. Jombart, D. Ducamp, Y. Berthier, and S. Aubert, "Hping," September 2019, http://www.hping.org.
[47] A. Grafov, "Hulk DoS tool," 2015, https://github.com/grafov/hulk.
[48] J. Seidl, "GoldenEye HTTP DoS test tool," 2012, https://github.com/jseidl/GoldenEye.
[49] S. Shekyan, "SlowHTTPTest," 2011, https://github.com/shekyan/slowhttptest.
[50] S. Ganapathy, K. Kulothungan, S. Muthurajkumar, M. Vijayalakshmi, P. Yogesh, and A. Kannan, "Intelligent feature selection and classification techniques for intrusion detection in networks: a survey," EURASIP Journal on Wireless Communications and Networking, vol. 2013, no. 1, p. 271, 2013, http://jwcn.eurasipjournals.com/content/2013/1/271.
[51] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2012, http://dl.acm.org/citation.cfm?id=2078195.
[52] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[53] A. Turner, "Tcpreplay—Pcap editing and replaying utilities," 2013, https://tcpreplay.appneta.com.
[54] S. H. Park, J. M. Goo, and C. H. Jo, "Receiver operating characteristic (ROC) curve: practical review for radiologists," Korean Journal of Radiology, vol. 5, no. 1, p. 11, 2004.
[55] M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.



Page 5: Research Articledownloads.hindawi.com/journals/scn/2019/1574749.pdf · Research Article Smart Detection: An Online Approach for DoS/DDoS Attack Detection Using Machine Learning Francisco

In IPv4-compliant networks, the network and transport layer protocols are IP, TCP, and UDP, which are specified in RFC 791 [40], RFC 793 [41], and RFC 768 [42], respectively. Together, such protocols have a total of 25 header variables. However, widely used network traffic sampling protocols, such as NetFlow [43] and sFlow [39], use only a portion of these variables in the sampling process. Commonly, the seven used variables are the source and destination IP addresses, source and destination ports, transport layer protocol, IP packet size, and TCP flags.

The source and destination IP addresses are not very useful for identifying the network traffic behavior in the Internet environment, which reduces the number of variables available for analysis to five in the most common cases. Based on the five variables mostly used by the flow monitoring protocols, 33 variables were derived, as described in Table 3, which use statistical measures that express data variability. In the calculation context of the database variables, the references to the mean, median, variance (var), and standard deviation (std) should be interpreted as sample measures.

[Figure 1: Operation scenario overview of the Smart Detection system (1-detection, 2-sharing, 3-protection).]

[Figure 2: Detection system overview.]

[Figure 3: Overview of the traffic classification scheme.]

Table 2: Detection system parameters
Parameter | Description
Tmin      | Minimum flow table length
Tmax      | Maximum flow table length
ET        | Flow table expiration time
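For reference, the 33 derived variables in Table 3 follow a regular naming pattern: one normalized protocol value plus eight statistics for each of the four remaining base variables. A minimal Python sketch (names only, matching Table 3; not part of the authors' code) enumerates them as follows.

# Enumerate the 33 derived feature names of Table 3:
# ip_proto plus 8 statistics for each of 4 base variables (1 + 4 x 8 = 33).
BASE_VARS = ["ip_len", "sport", "dport", "tcp_flags"]
STATS = ["mean", "median", "var", "std", "entropy", "cv", "cvq", "rte"]

FEATURES = ["ip_proto"] + [f"{v}_{s}" for v in BASE_VARS for s in STATS]
assert len(FEATURES) == 33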

The variable named protocol is a simple normalization of the protocol field extracted from the transport layer packet headers, in the form

$$ \mathrm{ip\_proto} = \frac{N_{\mathrm{proto}}}{K}, \quad (1) $$

where N_proto is the code of the protocol and K is a normalization constant set to the value 1000. For instance, N_proto = 6 and N_proto = 17 in the TCP and UDP protocols, respectively.
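As a minimal illustration of equation (1) (a sketch, not the authors' implementation), the normalization is a single division by the constant K given in the text:

# Normalize the transport-layer protocol number as in equation (1).
K = 1000.0  # normalization constant from the text

def ip_proto(n_proto: int) -> float:
    return n_proto / K

assert ip_proto(6) == 0.006   # TCP
assert ip_proto(17) == 0.017  # UDP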

With the main four variables mostly used in flow monitoring, it is possible to calculate the following associated statistical measurements:

(i) Entropy: the entropy of the variable is calculated by

$$ \mathrm{Entropy}(X) = -\sum_{i} p(X_i)\,\log_2 p(X_i), \quad (2) $$

where X is the variable of interest, e.g., the source port.

(ii) Coefficient of variation: the coefficient of variation is calculated by

$$ \mathrm{cv}(X) = \frac{\mathrm{std}(X)}{\mathrm{mean}(X)}, \quad (3) $$

where std(X) is the estimated standard deviation and mean(X) is the estimated average of the variable.

(iii) Quantile coefficient: this parameter is defined herein by

$$ \mathrm{cvq}(X) = \frac{\hat{Q}_X(1-p) - \hat{Q}_X(p)}{\hat{Q}_X(1-p) + \hat{Q}_X(p)}, \quad (4) $$

where \hat{Q}_X(p) is the sample p-quantile, p \in (0, 0.5), expressed by [44]

$$ \hat{Q}_X(p) = X_{(k)} + \left( X_{(k+1)} - X_{(k)} \right) \cdot f, \quad (5) $$

with \{X_{(1)}, \ldots, X_{(n)}\} being the order statistics of the independent observations \{X_1, \ldots, X_n\}, k = \lfloor p \cdot n \rfloor, and f the fractional part of the index surrounded by X_{(k)} and X_{(k+1)}.

(iv) Rate of change: this metric is given by

$$ \mathrm{rte}(X) = \frac{U_X}{S_X}, \quad (6) $$

where U_X is the number of unique values and S_X is the overall number of X values.

[Figure 4: Detection system algorithm (flowchart: (1) receive samples, (2) create 5-tuple flow table, (3) store, (4) merge, (5) classify, (6) notify, (7) cleanup, (8) create 3-tuple flow table).]

Table 3: Extracted variables
No. | Variable          | Detail
01  | ip_proto          | Normalized protocol number
02  | ip_len_mean       | Mean of IP length
03  | ip_len_median     | Median of IP length
04  | ip_len_var        | Variance of IP length
05  | ip_len_std        | Standard deviation of IP length
06  | ip_len_entropy    | Entropy of IP length
07  | ip_len_cv         | Coeff. of variation of IP length
08  | ip_len_cvq        | Quantile coeff. of IP length
09  | ip_len_rte        | Rate of change of IP length
10  | sport_mean        | Mean of src port
11  | sport_median      | Median of src port
12  | sport_var         | Variance of src port
13  | sport_std         | Standard deviation of src port
14  | sport_entropy     | Entropy of src port
15  | sport_cv          | Coeff. of variation of src port
16  | sport_cvq         | Quantile coeff. of src port
17  | sport_rte         | Rate of change of src port
18  | dport_mean        | Mean of dest port
19  | dport_median      | Median of dest port
20  | dport_var         | Variance of dest port
21  | dport_std         | Standard deviation of dest port
22  | dport_entropy     | Entropy of dest port
23  | dport_cv          | Coeff. of variation of dest port
24  | dport_cvq         | Quantile coeff. of dest port
25  | dport_rte         | Rate of change of dest port
26  | tcp_flags_mean    | Mean of TCP flags
27  | tcp_flags_median  | Median of TCP flags
28  | tcp_flags_var     | Variance of TCP flags
29  | tcp_flags_std     | Standard deviation of TCP flags
30  | tcp_flags_entropy | Entropy of TCP flags
31  | tcp_flags_cv      | Coeff. of variation of TCP flags
32  | tcp_flags_cvq     | Quantile coeff. of TCP flags
33  | tcp_flags_rte     | Rate of change of TCP flags
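As a concrete illustration of equations (2) through (6), the four statistics can be computed per flow with NumPy as in the sketch below (an assumption for illustration, not the authors' implementation; the choice p = 0.25 is illustrative, since the text only constrains p to (0, 0.5), and np.quantile uses linear interpolation in the spirit of equation (5)).

import numpy as np

def entropy(x):
    # Equation (2): Shannon entropy of the empirical distribution of x.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def cv(x):
    # Equation (3): coefficient of variation, sample std over sample mean.
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()

def cvq(x, p=0.25):
    # Equation (4): quantile coefficient from the sample (1-p)- and p-quantiles.
    hi, lo = np.quantile(x, [1 - p, p])
    return (hi - lo) / (hi + lo)

def rte(x):
    # Equation (6): number of unique values over the total number of values.
    x = np.asarray(x)
    return np.unique(x).size / x.size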

The data traffic with normal activity behavior was extracted from the ISCXIDS2012 dataset [45]. The data traffic with DoS behavior was obtained in a laboratory-controlled environment using tools such as hping3 [46], hulk [47], GoldenEye [48], and slowhttptest [49].

Processes such as extracting, transforming, and labeling the database instances are summarized in Figure 5. The raw network traffic was extracted from the capture files, and the packets were then grouped into sessions. For each session, one instance of the descriptor database, containing all variables listed in Table 3, was calculated. In this study, only the sessions with five hundred packets or more were considered, to better represent each network traffic type.
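A simplified sketch of this extraction step is given below (an assumption for illustration only: it presumes pandas and a packet table with one row per packet; the column names and the 5-tuple session key are placeholders, and only a subset of the Table 3 statistics is shown).

import pandas as pd

MIN_PACKETS = 500  # only sessions with at least 500 packets are kept

def extract_session_features(packets: pd.DataFrame) -> pd.DataFrame:
    """packets: one row per packet with columns
    src_ip, dst_ip, sport, dport, proto, ip_len, tcp_flags."""
    key = ["src_ip", "dst_ip", "sport", "dport", "proto"]  # 5-tuple session key
    long_enough = packets.groupby(key).filter(lambda g: len(g) >= MIN_PACKETS)
    sessions = long_enough.groupby(key)
    feats = sessions.agg(
        ip_len_mean=("ip_len", "mean"),
        ip_len_std=("ip_len", "std"),
        sport_var=("sport", "var"),
        dport_median=("dport", "median"),
        tcp_flags_mean=("tcp_flags", "mean"),
    ).reset_index()
    feats["ip_proto"] = feats["proto"] / 1000.0  # equation (1)
    return feats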

The final database contains examples of normal traffic (23,088 instances), TCP flood attacks (14,988 instances), UDP flood (6,894 instances), HTTP flood (347 instances), and HTTP slow (183 instances).

3.3. Feature and MLA Selection. Feature selection is an important step in the pattern recognition process and consists of defining the smallest possible set of variables capable of efficiently describing a set of classes [50]. Several techniques for variable selection are available in the literature and implemented in software libraries such as scikit-learn [51]. In this work, the selection of variables was performed in two stages. First, Recursive Feature Elimination with Cross-Validation (RFECV) was used with some machine learning algorithms widely used in the scientific literature, i.e., random forest (RF), logistic regression (LR), AdaBoost, stochastic gradient descent (SGD), decision tree (DTree), and perceptron. RF obtained higher precision using 28 variables, while AdaBoost selected seven variables but obtained lower accuracy, as shown in Table 4. In the second stage, a new feature selection test was performed with RF using the proposed Algorithm 1.
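For the first stage, a minimal RFECV sketch with scikit-learn could look as follows (an illustration of the technique, not the authors' exact configuration; the estimator hyperparameters are assumptions, and the 10-fold setting follows Table 4).

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

def rfecv_selection(X, y):
    # X: (n_samples, 33) feature matrix built from Table 3; y: class labels.
    estimator = RandomForestClassifier(n_estimators=100, random_state=42)
    selector = RFECV(estimator, step=1, cv=StratifiedKFold(10), scoring="accuracy")
    selector.fit(X, y)
    return selector.support_, selector.n_features_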

In the proposed feature selection approach using RF, the number of variables was reduced from 28 to 20 with a small increase in accuracy, as shown in Table 5. The proposed algorithm was executed using the following input parameters: 1000 rounds, 99% of variable importance, 95% of global accuracy, and 85% of per-class accuracy. Figure 6 shows that most models tested used 20 variables. However, each model used specific sets of variables. In order to choose the most relevant variables from the selected models, the RF variable importance criterion was used, as described in line 25 of Algorithm 1. The final result of feature selection is shown in Figure 7.

The results show that RF obtained higher accuracy than the other algorithms. Although it uses more variables than SGD and AdaBoost, a low false alarm rate is a prime requirement in DDoS detection systems; in this case, RF proved to be the best algorithm option for the Smart Detection system. Random forest is a supervised learning algorithm that builds a large number of random decision trees and merges them to make predictions. Each tree is trained with a random subset of the total set of labeled samples. In the classification process, the most voted class among all the trees in the model indicates the result of the classifier [52]. In the proposed detection system algorithm shown in Figure 4, RF is used to classify online network traffic, a task that requires computational efficiency and high hit rates.
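A minimal training-and-evaluation sketch with scikit-learn's RandomForestClassifier is shown below (the hyperparameters and the 70/30 split are illustrative assumptions; the paper does not list them in this excerpt).

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def train_and_evaluate(X, y):
    # Hold out a test partition, train the forest, and report per-class metrics.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
    clf.fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))
    return clf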

4. Results

The network traffic was classified by the detection system in a controlled network environment using different sampling rates. In the experiments, raw network traffic of the CIC-DoS [16], CICIDS2017 [25], and CSE-CIC-IDS2018 [25] datasets and the raw network traffic captured in the customized testbed experiments were employed. The Smart Detection system has reached high accuracy and a low false-positive rate. Experiments were conducted using two virtual Linux boxes, each one of them using 8 virtual CPUs (vCPUs) with 8 GB RAM.

[Figure 5: Process of network traffic extraction, transformation, and labeling (normal and attack traffic captures are processed by extracting packets and grouping them by session, then calculating variables and labeling instances into the signature database).]

Table 4: 10-fold RFECV results
# | MLA        | No. of features | Accuracy
1 | RF         | 28              | 0.996010
2 | DTree      | 25              | 0.994182
3 | LR         | 26              | 0.972327
4 | SGD        | 16              | 0.969474
5 | Perceptron | 28              | 0.937256
6 | AdaBoost   | 7               | 0.931131


Input: database descriptors, variable importance threshold, accuracy threshold, and number of rounds
Output: selected variables

(1) begin
(2)   Create empty optimized model set
(3)   for i <- 1 to Number of rounds do
(4)     Define all the descriptor database variables as the current variables
(5)     while True do
(6)       Split dataset in training and test partitions
(7)       Create and train the model using training data partition
(8)       Select the most important variables from the trained model
(9)       Calculate the cumulative importance of variables from the trained model
(10)      if max(cumulative importance of variables) < Variable importance threshold then
(11)        Exit loop
(12)      end
(13)      Train the model using only the most important variables
(14)      Test the trained model and calculate the accuracy
(15)      if Calculated accuracy < Accuracy threshold then
(16)        Exit loop
(17)      end
(18)      Add current model to optimized model set
(19)      Define the most important variables from the trained model as the current variables
(20)    end
(21)  end
(22)  Group the models by number of variables
(23)  Remove outliers from the grouped model set
(24)  Select the group of models with the highest frequency and their number of variables "N"
(25)  Rank the variables by the mean of the importance calculated in step 7
(26)  Return the "N" most important variables
(27) end

Algorithm 1: Feature selection
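A compact Python sketch of Algorithm 1 is given below. This is a simplified, hedged reading of the listing above, not the authors' code: the RF estimator, the 70/30 split, and the stopping test "no further reduction is possible" are interpretations, and the outlier-removal step (23) and the per-class accuracy check are omitted for brevity.

import pandas as pd
from collections import Counter
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def algorithm1(X: pd.DataFrame, y, rounds=1000, imp_thr=0.99, acc_thr=0.95):
    # rounds=1000 matches the paper but is computationally heavy; reduce for testing.
    accepted = []  # importance Series (indexed by feature name) of each accepted model
    for _ in range(rounds):
        current = list(X.columns)
        while True:
            X_tr, X_te, y_tr, y_te = train_test_split(
                X[current], y, test_size=0.3, stratify=y)
            rf = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
            imp = pd.Series(rf.feature_importances_, index=current)
            imp = imp.sort_values(ascending=False)
            # Steps 8-9: keep the most important variables up to the cumulative threshold.
            keep = list(imp.index[(imp.cumsum() <= imp_thr).to_numpy()]) or [imp.index[0]]
            if len(keep) == len(current):        # no further reduction is possible
                break
            acc = RandomForestClassifier(n_estimators=100).fit(
                X_tr[keep], y_tr).score(X_te[keep], y_te)
            if acc < acc_thr:                    # step 15: accuracy fell below threshold
                break
            accepted.append(imp[keep])
            current = keep                       # step 19: iterate on the reduced set
    n = Counter(len(s) for s in accepted).most_common(1)[0][0]    # steps 22-24
    mean_imp = pd.concat(accepted, axis=1).mean(axis=1)           # step 25
    return list(mean_imp.sort_values(ascending=False).index[:n])  # step 26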

[Figure 6: Number of variables versus number of models; the most frequent group of models used 20 variables.]


4.1. Description of the Benchmark Datasets. Many different datasets, such as DARPA (Lincoln Laboratory, 1998-99), KDD'99 (University of California, Irvine, 1998-99), and LBNL (Lawrence Berkeley National Laboratory and ICSI, 2004-2005), have been used by researchers to evaluate the performance of their proposed intrusion detection and prevention approaches. However, many such datasets are out of date and unreliable to use [25]. In this study, the CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018 datasets and a customized dataset were used, since they include modern threats and DoS techniques.

4.1.1. The ISCXIDS2012 Dataset. The Information Security Center of Excellence (ISCX) IDS 2012 dataset (ISCXIDS2012) was built at the University of New Brunswick to provide a contemporary benchmark. The dataset traced real packets for seven days of network activity, including the HTTP, SMTP, SSH, IMAP, POP3, and FTP protocols, covering various scenarios of normal and malicious activities. ISCXIDS2012 consists of labeled network traces, including full packet payloads in pcap format, and is publicly available (https://www.unb.ca/cic/datasets/ids.html) [45]. This work focuses on the normal activities of the ISCXIDS2012 pcap files for the extraction of signatures, more specifically the data file of Friday, 11/6/2010.

4.1.2. The CIC-DoS Dataset. The CIC-DoS dataset focuses on application layer DoS attacks mixed with the attack-free traces of the ISCXIDS2012 dataset. Four kinds of attacks were produced with different tools, yielding 8 different application layer DoS attack strokes [16]. The resulting set contains 24 hours of network traffic with a total size of 4.6 GB and is publicly available (https://www.unb.ca/cic/datasets/dos-dataset.html). A summary of the attack events and tools used in the CIC-DoS is presented in Table 6.

In the execution of the low-volume attacks using the tool slowhttptest [49], the default value of 50 connections per attack was adopted, thus making the attacks more sneaky, according to [16].

4.1.3. The CICIDS2017 Dataset. The CICIDS2017 dataset was recently developed by ISCX and contains benign traffic and the most up-to-date common attacks. This new IDS dataset includes seven common, updated families of attacks that meet real-world criteria and is publicly available (http://www.unb.ca/cic/datasets/IDS2017.html) [25].

This work focuses on the malicious DoS activities of the Wednesday, July 5, 2017 capture file, which consists of five DoS/DDoS attacks and a wide variety of normal network traffic. The resulting set contains 8 h of network traffic with a total size of 13 GB. The attack tools used include Slowloris, Slowhttptest, Hulk, GoldenEye, and Heartbleed.

4.1.4. The CSE-CIC-IDS2018 Dataset. This dataset is a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). The final dataset includes seven different attack scenarios: brute force, Heartbleed, botnet, DoS, DDoS, web attacks, and infiltration of the network from inside. The attacking infrastructure includes 50 machines, and the victim organization has 5 departments and includes 420 machines and 30 servers. A research document outlining the dataset analysis details and similar related principles has been published by [25]. All data are publicly available (https://www.unb.ca/cic/datasets/ids-2018.html).

This work focuses on the malicious DoS/DDoS activities of the Friday, February 16, 2018 and Tuesday, February 20, 2018 capture files. The attack tools used include SlowHTTPTest, Slowhttptest, Hulk, LOIC, and HOIC.

[Figure 7: Selected variables by the proposed feature selection algorithm, with their RF importance values.]

4.1.5. The Customized Dataset. The customized dataset was developed in a controlled network environment, as shown in Figure 8. VLANs 10, 20, 30, and 40 are used as victim hosts, VLAN 165 is dedicated to the users of an academic unit, and VLAN 60 is used as the attacking host, while monitoring occurs in VLAN 1. All networks have regular access to the Internet.

The attack plan was configured so that one attack is generated every 30 minutes, in a total of 48 attack events in 24 hours, starting at 00h00m00s and ending at 23h59m00s. All attacks were executed by the attacker host 172.16.60.100, which did not transmit legitimate traffic to the victims during that period. The attack tools were parameterized to produce sneaky low-volume, medium-volume or light-mode, and massive high-volume attacks. Ten variations of protocol-based and application-based attacks were adopted, using four attack tools, as shown in Table 7. The duration of protocol-based and high-volume application-based attacks was 30 seconds, while low-volume application-based attacks ranged from 30 to 240 seconds.

In the accomplishment of the low-volume attacks using the tool slowhttptest [49], the number of connections parameter was set to 1500 instead of the default option, corresponding to 50 connections.

4.2. Online Experiments. Online experiments were performed in a controlled laboratory environment according to the following validation methodology:

(1) Raw data of the network traffic are obtained for analysis in pcap file format.
(2) The attack plan, indicating the origin, destination, attack type, and respective duration for the traffic indicated in step 1, is drawn.
(3) The environment for reprocessing and classifying the traffic is configured.
(4) The traffic is processed and classified.
(5) The system performance is properly evaluated by comparing the output from step 4 with the attack plan described in step 2.

Following this validation methodology, the CIC-DoS, CICIDS2017, CSE-CIC-IDS2018, and customized datasets were used as sources of traffic capture, thus fulfilling steps 1 and 2. The environment for traffic reprocessing and classification described in step 3 was configured using two virtual Linux boxes running Open Virtual Switch (OVS) [28], the TcpReplay software [53], and the Smart Detection system, as shown in Figure 9.

In the reprocessing, classification, and evaluation of the traffic during steps 4 and 5, the raw data traffic was replayed by the TcpReplay software on a specific OVS port and sampled by the sFlow agent for OVS. The sampled traffic was sent to the Smart Detection system, and the classification result was compared with the attack plan. Figure 9 summarizes the procedures carried out by the proposed validation methodology: the raw network traffic file is reprocessed on VM-01, and the sFlow agent collects traffic samples and sends them to Smart Detection on VM-02.
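For step 4, the replay can be driven from a small script such as the sketch below (an assumption about how the replay might be invoked, not the authors' setup; the interface name and the pcap file name are placeholders, and only tcpreplay's standard -i interface option is used).

import subprocess

def replay_dataset(pcap_path: str, interface: str = "eth1") -> None:
    # Replay the raw capture onto the OVS port monitored by the sFlow agent.
    subprocess.run(["tcpreplay", "-i", interface, pcap_path], check=True)

replay_dataset("cicids2017_wednesday.pcap")  # illustrative file name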

4.2.1. System Setup. The Smart Detection system has three main parameters that directly influence its performance. These parameters, shown in Table 1, allow the user to calibrate the detection system according to the operating environment. In scenarios where the SR is too low and Tmax is too large, for example, traffic samples are discarded before processing by the classifier. On the other hand, if Tmax is too small, the FAR increases because the classifier has few data to analyze. In the case of slow DDoS, a low SR and a large Tmax also reduce the attack detection rate, due to the in-memory flow table expiration time (ET).

Thus, several experiments to calibrate the system were performed in the testing environment, using (i) an SR of 1%, 5%, 10%, and 20%, (ii) a Tmax parameter of 25, 50, and 100, and (iii) an ET of 2, 5, and 10 seconds. The most balanced result was obtained with SR = 10%, Tmax = 50, and ET = 2.
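A hedged sketch of this calibration sweep (illustrative only; the evaluation function is a placeholder for replaying a dataset with the given settings and scoring DR/FAR) simply enumerates the parameter grid reported above.

from itertools import product

SR_VALUES   = [0.01, 0.05, 0.10, 0.20]  # sampling rates
TMAX_VALUES = [25, 50, 100]             # maximum flow table length
ET_VALUES   = [2, 5, 10]                # flow table expiration time (s)

def evaluate(sr, tmax, et):
    # Placeholder: replay the dataset with these settings and return (DR, FAR).
    raise NotImplementedError

for sr, tmax, et in product(SR_VALUES, TMAX_VALUES, ET_VALUES):
    print(f"calibration run: SR={sr:.0%}, Tmax={tmax}, ET={et}s")
    # dr, far = evaluate(sr, tmax, et)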

4.2.2. Evaluation Metrics. System performance was evaluated using the Precision (PREC), Recall (REC), and F-Measure (F1) metrics present in the literature [54, 55]. PREC measures the ability to avoid false positives, while REC measures system sensitivity; F1 is a harmonic average between PREC and REC. In this context, (i) true positive (TP) is the attack traffic predicted correctly, (ii) true negative (TN) is normal traffic also predicted correctly, (iii) false positive (FP) is the normal traffic predicted incorrectly, and (iv) false negative (FN) is the attack traffic predicted incorrectly. These metrics were computed by the following expressions:

$$ \mathrm{PREC} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \quad \mathrm{REC} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \quad \mathrm{F1} = \frac{2 \times \mathrm{PREC} \times \mathrm{REC}}{\mathrm{PREC} + \mathrm{REC}}. \quad (7) $$

Table 5: 10-fold cross-validation results using the variables selected by the proposed algorithm
# | MLA        | No. of features | Accuracy
1 | RF         | 20              | 0.999363
2 | DTree      | 20              | 0.999011
3 | Perceptron | 20              | 0.996000
4 | SGD        | 20              | 0.986989
5 | LR         | 20              | 0.982704
6 | AdaBoost   | 20              | 0.956512

Besides, the detection rate (DR) and false alarm rate (FAR) metrics were used. The DR is the ratio between the number of attacks detected by the system and the actual number of attacks performed, while FAR is the ratio between FP and the sum of FP and TN. These metrics were computed by the following expressions:

$$ \mathrm{DR} = \frac{\mathrm{AD}}{\mathrm{TA}}, \quad (8) $$

where AD is the number of detected attacks and TA is the total number of performed attacks, and

$$ \mathrm{FAR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}, \quad (9) $$

where FP corresponds to the false-positive classifications and TN to the true-negative classifications.

The DR and FAR calculations assume that only malicious traffic was sent from the attacker to the victim at the time of the attack.
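The metrics in equations (7) through (9) reduce to simple counting; a short sketch is given below (the counts passed in the example call are illustrative placeholders, not results from the paper).

def evaluation_metrics(tp, tn, fp, fn, attacks_detected, attacks_total):
    prec = tp / (tp + fp)                      # equation (7)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    dr = attacks_detected / attacks_total      # equation (8)
    far = fp / (fp + tn)                       # equation (9)
    return {"PREC": prec, "REC": rec, "F1": f1, "DR": dr, "FAR": far}

# Illustrative values only:
print(evaluation_metrics(tp=950, tn=9950, fp=50, fn=50,
                         attacks_detected=45, attacks_total=48))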

4.3. Results and Discussion. The proposed approach has been evaluated using the aforementioned datasets, system setup, and metrics. Table 8 summarizes the system performance for each dataset.

As can be observed, the best performance was obtained in the CSE-CIC-IDS2018 dataset, with a DR of 100%, an FAR of 0.000, and a PREC of 100%. During the analysis, there was a low occurrence of normal network traffic and well-defined bursts of malicious traffic. This type of behavior facilitates detection by the system and justifies the high hit rates achieved. However, a slightly lower performance was obtained in the customized and CIC-DoS datasets, with a DR of 96.5% and 93.6%, an FAR of 0.2% and 0.04%, and a PREC of 99.5% and 99.9%, respectively. In those datasets, there is a higher volume of normal traffic and various types of attacks, including stealthy application layer attacks. In this more realistic scenario, the proposed system presented some detection failures but still obtained a competitive performance. On the other hand, the worst result was obtained with the CICIDS2017 dataset, with 80% DR, 0.2% FAR, and 99.2% PREC.

[Figure 8: Customized testbed topology: a core router connects the Internet, monitoring on VLAN 1, target hosts on VLANs 10 (172.16.10.0/24), 20 (172.16.20.0/24), 30 (172.16.30.0/24), and 40 (172.16.40.0/24), production hosts on VLAN 165, and the attacker on VLAN 60 (172.16.60.0/24).]

Table 6: CIC-DoS dataset events and tools
Attack (tool)                      | Events
ddossim (ddossim)                  | 2
DoS improved GET (Goldeneye [48])  | 3
DoS GET (hulk [47])                | 4
Slow body (slowbody2)              | 4
Slow read (slowread)               | 2
Slow headers (slowheaders)         | 5
Rudy (rudy)                        | 4
Slowloris (slowloris)              | 2

[Figure 9: Smart Detection online validation scheme: the dataset (pcap) is replayed by TcpReplay through Open Virtual Switch on VM-01, and sFlow samples are sent over the network to the collector, processor, classifier, and notifier on VM-02.]

Table 7: Customized dataset events and tools
Attack (tool)                              | Events
TCP SYN flood (hping3 [46])                | 4
TCP SYN flood, light mode (hping3 [46])    | 4
TCP ACK flood (hping3 [46])                | 4
UDP flood (hping3 [46])                    | 4
UDP flood, random dst port (hping3 [46])   | 4
DoS improved GET (Goldeneye [48])          | 5
DoS GET (hulk [47])                        | 4
Slow headers (slowhttptest [49])           | 5
Slow body (slowhttptest [49])              | 5
Range attack (slowhttptest [49])           | 4
Slow read (slowhttptest [49])              | 5

This dataset expresses a more realistic network scenario, which includes normal traffic mixed with high-volume and low-volume malicious traffic with sneaky behavior, such as slow application layer attacks. Even so, the proposed system detected 4 out of 5 attacks, with a PREC greater than 90% and an FAR of less than 1%, showing that the method is feasible.

To discuss online detection and the consumption of computing resources during experimentation, the CICIDS2017 dataset was chosen because it is quite realistic, recent, and summarizes the major vectors of DoS attacks. Even in the most adverse scenario, the experiment was completed normally, as shown in Figures 10 and 11. Overall network traffic is demonstrated in Figure 10(a), while Figure 10(b) highlights the sampled traffic sent to the detection system. As can be seen, for network traffic of 81.3 Mbps, the detection system receives only 1.74 Mbps, making this approach scalable. The overall traffic rating is shown in Figure 11(a), while Figure 11(b) exclusively highlights the malicious traffic rating. It can be said that the system was efficient in distinguishing normal traffic from the DoS attacks because, of all the attacks performed, only the Heartbleed attack was not detected, highlighted in Figure 10(b) between 15 h and 16 h. This kind of attack is primarily intended to collect data by exploiting OpenSSL software vulnerabilities, as described in CVE-2014-0160, although it can also assume the behavior of a DDoS attack, as in any application. However, in this case, the system raised a false negative. The most obvious reasons for this FN are (i) the execution of the Heartbleed attack without DoS exploitation or (ii) a statistical coincidence in traffic sampling. In the first case, the attack is performed using legitimate and regular connections, while in the second case, the collected samples coincide with legitimate traffic signatures.

In terms of resource use, the system remained stable during the experiment, as shown in Figure 11(c), with small swings in CPU usage.

Finally, the Smart Detection system was tested using online network traffic in four distinct scenarios. The results presented in Table 8 show that the system can distinguish legitimate traffic from various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow, with significant rates of accuracy.

[Figure 10: Network traffic in the online experiment using the CICIDS2017 dataset: (a) overall network traffic; (b) sampled network traffic (normal activity, normal + slow DoS, normal + high DoS, normal + Heartbleed DoS).]

The experiments also highlighted the importance of adjusting the Tmax and ET parameters. These variables correlate with the network traffic sampling rate (SR) and directly influence the detection rate and system accuracy.

4.3.1. Additional Comparison. Compared with some recent similar works available in the literature, the approach introduced in this work is quite competitive in terms of the evaluated performance metrics, as shown in Table 9.

The comparison is not completely fair, because the experimental scenarios and data were slightly different, but it is sufficient to allow an evaluation of the obtained results. For instance, in the offline experiments performed with the CIC-DoS dataset in [16], the DR was 76.92% using an SR of 20%, while the proposed system obtained an online DR of 90% for the attacks, with an FAR of 1.8%, using the same sampling technique. In [37], a PREC of 82.1% was obtained using the CICIDS2017 dataset in an offline and unsampled analysis. In this work, the proposed method obtained a PREC of 99.9%, which can be considered competitive for an online detection system based on samples of network traffic. Besides that, in the CICIDS2017 dataset experiments, where the legitimate traffic rate is similar to that of the attack traffic, according to Figures 10(b), 11(a), and 11(b), the system was also able to distinguish malicious traffic from normal traffic, as studied in the literature [34].

5. Conclusion

This article has presented the Smart Detection system, an online approach to DoS/DDoS attack detection. The software uses the random forest algorithm to classify network traffic based on samples taken by the sFlow protocol directly from network devices. Several experiments were performed to calibrate and evaluate system performance. Results showed that the proposed method is feasible and presents improved performance when compared with some recent and relevant approaches available in the literature.

The proposed system was evaluated based on three intrusion detection benchmark datasets, namely, CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018, and was able to classify various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow. Furthermore, the performance of the proposed method was compared against recent and related approaches. Based on the experimental results, the Smart Detection approach delivers improved DR, FAR, and PREC. For example, in the CIC-DoS and CSE-CIC-IDS2018 datasets, the proposed system achieved DR and PREC higher than 93%, with an FAR of less than 1%. Although the system has achieved significant results in its scope, it needs some improvements, such as a better hit rate among attack classes and an automatic parameter calibration mechanism that maximizes the detection rate of attacks.

[Figure 11: Traffic classification and CPU usage in the online experiment using the CICIDS2017 dataset: (a) overall traffic rating; (b) malicious traffic rating; (c) detection system CPU utilization.]

Table 8: Evaluation of system performance for all datasets
Dataset         | DR    | FAR    | PREC  | F1
CIC-DoS         | 0.936 | 0.0004 | 0.999 | 0.999
CICIDS2017      | 0.800 | 0.002  | 0.992 | 0.992
CSE-CIC-IDS2018 | 1.000 | 0.000  | 1.000 | 1.000
Customized      | 0.965 | 0.002  | 0.995 | 0.995

Table 9: Comparison with research approaches of related works
Work                  | Dataset    | DR     | FAR    | PREC
[16]                  | CIC-DoS    | 0.7690 | NA     | NA
[37]                  | CICIDS2017 | NA     | NA     | 0.8210
The proposed approach | CIC-DoS    | 0.9360 | 0.0004 | 0.9990
The proposed approach | CICIDS2017 | 0.8000 | 0.0020 | 0.9920


Future work includes the analysis of DDoS attacks based on the vulnerabilities of services, such as Heartbleed and web brute force attacks, enhancement of the multiple-class classification, self-configuration of the system, and the development of methods for correlating triggered alarms and formulating protective measures.

Data Availability

We produced a customized dataset and a variable selection algorithm and used four additional datasets to support the findings of this study. The customized dataset used to support the findings of this study has been deposited in the IEEE DataPort repository (https://doi.org/10.24433/CO.0280398.v2). The feature selection algorithm used to support the findings of this study has been deposited in the Code Ocean repository (https://doi.org/10.24433/CO.0280398.v2). The benchmark datasets used to support the findings of this study have been deposited in the Canadian Institute for Cybersecurity repository and are publicly available as follows: (1) the ISCXIDS2012 dataset (https://www.unb.ca/cic/datasets/ids.html), (2) the CIC-DoS dataset (https://www.unb.ca/cic/datasets/dos-dataset.html), (3) the CICIDS2017 dataset (http://www.unb.ca/cic/datasets/IDS2017.html), and (4) the CSE-CIC-IDS2018 dataset (https://www.unb.ca/cic/datasets/ids-2018.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), Finance Code 001.

Acknowledgments

The authors would like to acknowledge the Digital Metropolis Institute (IMD/UFRN) and the High-Performance Computing Center of UFRN (NPAD/UFRN) for the overall support given to this work, and the Canadian Institute for Cybersecurity (CIC/UNB) for publicly sharing the datasets.

References

[1] D. Anstee, C. F. Chui, P. Bowen, and G. Sockrider, Worldwide Infrastructure Security Report, Arbor Networks Inc., Westford, MA, USA, 2017.
[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, "Internet of things: a survey on enabling technologies, protocols, and applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.
[3] S. T. Zargar, J. Joshi, and D. Tipper, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," IEEE Communications Surveys and Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.
[4] A. Marzano, D. Alexander, O. Fonseca et al., "The evolution of bashlite and mirai IoT botnets," in Proceedings of the 2018 IEEE Symposium on Computers and Communications (ISCC), 2018.
[5] S. Kottler, "February 28th DDoS incident report," 2018, https://github.blog/2018-03-01-ddos-incident-report.
[6] Y. Cao, Y. Gao, R. Tan, Q. Han, and Z. Liu, "Understanding internet DDoS mitigation from academic and industrial perspectives," IEEE Access, vol. 6, pp. 66641–66648, 2018.
[7] R. Roman, J. Zhou, and J. Lopez, "On the features and challenges of security and privacy in distributed internet of things," Computer Networks, vol. 57, no. 10, pp. 2266–2279, 2013.
[8] K. Singh, P. Singh, and K. Kumar, "Application layer HTTP-GET flood DDoS attacks: research landscape and challenges," Computers & Security, vol. 65, pp. 344–372, 2017.
[9] N. Vlajic and D. Zhou, "IoT as a land of opportunity for DDoS hackers," Computer, vol. 51, no. 7, pp. 26–34, 2018.
[10] S. Newman, "Under the radar: the danger of stealthy DDoS attacks," Network Security, vol. 2019, no. 2, pp. 18-19, 2019.
[11] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[12] Y. Wang, L. Liu, B. Sun, and Y. Li, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," in Proceedings of the 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS 2015), pp. 1034–1037, Kochi, India, 2015.
[13] M. Masdari and M. Jalali, A Survey and Taxonomy of DoS Attacks in Cloud Computing, 2016, http://onlinelibrary.wiley.com/doi/10.1002/sec.1539/epdf.
[14] R. Singh, A. Prasad, R. Moven, and H. Samra, "Denial of service attack in wireless data network: a survey," in Proceedings of the 2017 Devices for Integrated Circuit (DevIC), pp. 23-24, Kalyani, India, March 2017.
[15] A. Praseed and P. Santhi Thilagam, "DDoS attacks at the application layer: challenges and research perspectives for safeguarding web applications," IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 661–685, 2019.
[16] H. H. Jazi, H. Gonzalez, N. Stakhanova, and A. A. Ghorbani, "Detecting HTTP-based application layer DoS attacks on web servers in the presence of sampling," Computer Networks, vol. 121, pp. 25–36, 2017.
[17] H. Kim, T. Benson, A. Akella, and N. Feamster, "The evolution of network configuration: a tale of two campuses," in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC '11), p. 499, ACM, New York, NY, USA, 2011.
[18] T. V. Phan and M. Park, "Efficient distributed denial-of-service attack defense in SDN-based cloud," IEEE Access, vol. 7, pp. 18701–18714, 2019.
[19] Y. Gu, K. Li, Z. Guo, and Y. Wang, "Semi-supervised K-means DDoS detection method using hybrid feature selection algorithm," IEEE Access, vol. 7, pp. 64351–64365, 2019.
[20] M. Hasan, M. Milon Islam, I. Islam, and M. M. A. Hashem, "Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches," Internet of Things, vol. 7, Article ID 100059, 2019, https://linkinghub.elsevier.com/retrieve/pii/S2542660519300241.
[21] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, "Network intrusion detection for IoT security based on learning techniques," IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2671–2701, 2019, https://ieeexplore.ieee.org/document/8629941.
[22] X. Tan, S. Su, Z. Huang et al., "Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm," Sensors (Switzerland), vol. 19, no. 1, 2019.
[23] J. Cheng, M. Li, X. Tang, V. S. Sheng, Y. Liu, and W. Guo, "Flow correlation degree optimization driven random forest for detecting DDoS attacks in cloud computing," Security and Communication Networks, vol. 2018, Article ID 6459326, 14 pages, 2018.
[24] R. Panigrahi and S. Borah, "A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems," International Journal of Engineering & Technology, vol. 7, pp. 479–482, December 2018, https://www.researchgate.net/publication/329045441.
[25] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the ICISSP 2018 - 4th International Conference on Information Systems Security and Privacy, pp. 108–116, Madeira, Portugal, 2018.
[26] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (II), pp. 1137–1143, 1995, https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf.
[27] S. Behal and K. Kumar, "Trends in validation of DDoS research," Procedia Computer Science, vol. 85, pp. 7–15, 2016.
[28] B. Pfaff, J. Pettit, T. Koponen et al., "The design and implementation of Open vSwitch," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp. 117–130, USENIX Association, Oakland, CA, USA, 2015, https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/pfaff.
[29] J. Le, R. Giller, Y. Tatsumi, H. Huang, J. Ma, and N. Mori, Case Study: Developing DPDK-Accelerated Virtual Switch Solutions for Cloud Service Providers, Intel Corporation, Santa Clara, CA, USA, 2019, https://builders.intel.com/docs/cloudbuilders/developing-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providers.pdf.
[30] W. Meng, W. Li, C. Su, J. Zhou, and R. Lu, "Enhancing trust management for wireless intrusion detection via traffic sampling in the era of big data," IEEE Access, vol. 6, pp. 7234–7243, 2017.
[31] R. Zuech, T. M. Khoshgoftaar, and R. Wald, "Intrusion detection and big heterogeneous data: a survey," Journal of Big Data, vol. 2, no. 1, 2015.
[32] R. K. C. Chang, "Defending against flooding-based distributed denial-of-service attacks: a tutorial," IEEE Communications Magazine, vol. 40, no. 10, pp. 42–51, 2002.
[33] S. Simpson, S. N. Shirazi, A. Marnerides, S. Jouet, D. Pezaros, and D. Hutchison, "An inter-domain collaboration scheme to remedy DDoS attacks in computer networks," IEEE Transactions on Network and Service Management, vol. 15, no. 3, pp. 879–893, 2018.
[34] S. Behal, K. Kumar, and M. Sachdeva, "D-FACE: an anomaly based distributed approach for early detection of DDoS attacks and flash events," Journal of Network and Computer Applications, vol. 111, pp. 49–63, 2018.
[35] C. Wang, T. T. N. Miu, X. Luo, and J. Wang, "SkyShield: a sketch-based defense system against application layer DDoS attacks," IEEE Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 559–573, 2018.
[36] Z. Liu, Y. Cao, M. Zhu, and W. Ge, "Umbrella: enabling ISPs to offer readily deployable and privacy-preserving DDoS prevention services," IEEE Transactions on Information Forensics and Security, vol. 14, no. 4, pp. 1098–1108, 2019.
[37] M. Aamir and S. M. A. Zaidi, "Clustering based semi-supervised machine learning for DDoS attack classification," Journal of King Saud University - Computer and Information Sciences, 2019.
[38] R. E. Jurga and M. M. Hulbój, Packet Sampling for Network Monitoring, HP Procurve Technical Report, 2007.
[39] P. Phaal and M. Lavine, sFlow Version 5, InMon Corp., San Francisco, CA, USA, 2004, https://sflow.org/sflow_version_5.txt.
[40] J. Postel, Internet Protocol, RFC 791, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0791.
[41] J. Postel, Transmission Control Protocol, RFC 793, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0793.
[42] J. Postel, User Datagram Protocol, RFC 768, RFC Editor, 1980, https://www.rfc-editor.org/info/rfc768.
[43] B. Claise, Ed., Cisco Systems NetFlow Services Export Version 9, RFC 3954, RFC Editor, 2004, https://www.rfc-editor.org/info/rfc3954.
[44] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," Statistical Computing, vol. 50, no. 4, pp. 361–365, 1996.
[45] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers and Security, vol. 31, no. 3, pp. 357–374, 2012.
[46] S. Sanfilippo, N. Jombart, D. Ducamp, Y. Berthier, and S. Aubert, "Hping," September 2019, http://www.hping.org.
[47] A. Grafov, "Hulk DoS tool," 2015, https://github.com/grafov/hulk.
[48] J. Seidl, "GoldenEye HTTP DoS test tool," 2012, https://github.com/jseidl/GoldenEye.
[49] S. Shekyan, "SlowHTTPTest," 2011, https://github.com/shekyan/slowhttptest.
[50] S. Ganapathy, K. Kulothungan, S. Muthurajkumar, M. Vijayalakshmi, P. Yogesh, and A. Kannan, "Intelligent feature selection and classification techniques for intrusion detection in networks: a survey," EURASIP Journal on Wireless Communications and Networking, vol. 2013, no. 1, p. 271, 2013, http://jwcn.eurasipjournals.com/content/2013/1/271.
[51] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2012, http://dl.acm.org/citation.cfm?id=2078195.
[52] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[53] A. Turner, "Tcpreplay: pcap editing and replaying utilities," 2013, https://tcpreplay.appneta.com.
[54] S. H. Park, J. M. Goo, and C. H. Jo, "Receiver operating characteristic (ROC) curve: practical review for radiologists," Korean Journal of Radiology, vol. 5, no. 1, p. 11, 2004.
[55] M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.

Security and Communication Networks 15

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 6: Research Articledownloads.hindawi.com/journals/scn/2019/1574749.pdf · Research Article Smart Detection: An Online Approach for DoS/DDoS Attack Detection Using Machine Learning Francisco

standard deviation (std) should be interpreted as samplemeasures

(e variable named protocol is a simple normalization ofthe protocol field extracted from the transport layer packetheaders in the form

ip proto Nproto

K (1)

where Nproto is the code of the protocol and K is an nor-malization constant set to the value 1000 For instanceNproto 6 and Nproto 17 in TCP and UDP protocolsrespectively

With the main four variables mostly used in flowmonitoring it is possible to calculate the following associ-ated statistic measurements

(i) Entropy the entropy of the variable is calculated by

Entropy(X) minus 1113944i

p Xi( 1113857log2 p Xi( 1113857 (2)

where X is the variable of interest eg the sourceport

(ii) Coefficient of variation the coefficient of variation iscalculated by

cv(X) std(X)

mean(X) (3)

where std(X) is the estimated standard deviationand mean(X) is the estimated average of thevariable

(iii) Quantile coefficient this parameter is definedherein by

cvq(X) 1113954QX(1 minus p) minus 1113954QX(p)

1113954QX(1 minus p) + 1113954QX(p) (4)

where 1113954QX(p) is the sample p-quantile p isin (0 05)expressed by [44]

1113954QX(p) X(k) + X(k+1) minus X(k)1113872 1113873lowastf (5)

Start

(1) Receive samples

(2) Create5-tuple flow table

(4) Merge New flow (3) Store

Tl ge Tmax Buffer

(5) Classify (7) Cleanup

E gt ETand

Tl ge Tmin

(8) Create3-tuple flow table

Is it attack

(6) Notify

Finish

NoYes

No

Yes

Yes

No

Yes

No

Figure 4 Detection system algorithm

Table 3 Extracted variables

Variable Detail01 ip_proto Normalized protocol number02 ip_len_mean Mean of IP length03 ip_len_median Median of IP length04 ip_len_var Variance of IP length05 ip_len_std Stand deviation of IP length06 ip_len_entropy Entropy of IP length07 ip_len_cv Coeff of variation of IP length08 ip_len_cvq Quantile coeff of IP length09 ip_len_rte Rate change of IP length10 sport_mean Mean of src port11 sport_median Median of src port12 sport_var Variance of src port13 sport_std Stand deviation of src port14 sport_entropy Entropy of src port15 sport_cv Coeff of variation of src port16 sport_cvq Quantile coeff of src port17 sport_rte Rate change of src port18 dport_mean Mean of dest port19 dport_median Median of dest port20 dport_var Variance of dest port21 dport_std Stand deviation of dest port22 dport_entropy Entropy of dest port23 dport_cv Coeff of variation of dest port24 dport_cvq Quantile coeff of dest port25 dport_rte Rate change of dest port26 tcp_flags_mean Mean of TCP flags27 tcp_flags_median Median of TCP flags28 tcp_flags_var Variance of TCP flags29 tcp_flags_std Stand deviation of TCP flags30 tcp_flags_entropy Entropy of TCP flags31 tcp_flags_cv Coeff of variation of TCP flags32 tcp_flags_cvq Quantile coeff of TCP flags33 tcp_flags_rte Rate change of TCP flags

6 Security and Communication Networks

with being X(1) X(n)1113966 1113967 the order statistics ofindependent observations X1 Xn1113864 1113865k lfloorplowast nrfloor and f is the fractional part of the indexsurrounded by X(k) and X(k+1)

(iv) Rate of change this metric is given by

rte(X) UX

SX

(6)

(v) where UX is the amount of unique values and SX isthe overall number of X values

(e data traffic with normal activity behavior wasextracted from the ISCXIDS2012 dataset [45] (e datatraffic with DoS behavior was obtained in a laboratorycontrolled environment using tools such as hping3 [46]hulk [47] Goldeneye [48] and slowhttptest [49]

Processes such as extracting transforming and labelingthe database instances are summarized in Figure 5 (e rawnetwork traffic was extracted from the capture files as thepackets were then grouped into sessions For each sessionone instance of the descriptor database containing all var-iables listed in Table 3 was calculated In this study only thesessions with five hundred packets or higher were consid-ered to better represent each network traffic type

(e final database contains examples of normal traffic(23088 instances) TCP flood attacks (14988 instances)UDP flood (6894 instances) HTTP flood (347 instances)and HTTP slow (183 instances)

33 Feature and MLA Selection Feature selection is animportant step in the pattern recognition process andconsists of defining the smallest possible set of variablescapable of efficiently describing a set of classes [50] Severaltechniques for variable selection are available in the litera-ture and implemented in software libraries as scikit-learn[51] In this work the selection of variables was performed intwo stages First Recursive Feature Elimination with Cross-Validation (RFECV) was used with some machine learningalgorithms widely used in the scientific literature ierandom Forest (RF) logistic regression (LR) AdaBooststochastic gradient descent (SGD) decision tree (DTree)and perceptron RF obtained higher precision using 28variables while AdaBoost selected seven variables but ob-tained lower accuracy as shown in Table 4 In the secondstage a new feature selection test was performed with RFusing proposed Algorithm 1

In the proposed feature selection approach using RF thenumber of variables was reduced from 28 to 20 with a smallincrease in accuracy as shown in Table 5 (e proposedalgorithm was executed using the following input param-eters 1000 rounds 99 of variable importance 95 ofglobal accuracy and 85 of per-class accuracy Figure 6shows that most models tested used 20 variables Howevereach model used specific sets of variables In order to choosethe most relevant variables from the selected models the RFvariable importance criterion was used as described in line

25 of Algorithm 1 (e final result of feature selection isshown in Figure 7

(e results show that RF obtained higher accuracy thanthe other algorithms Although it uses more variables thanSGD and AdaBoost a low false alarm rate is a prime re-quirement in DDoS detection systems In this case RFproved to be the best algorithm option for the Smart De-tection system Random forest is a supervised learning al-gorithm that builds a large number of random decision treesand merges them together to make predictions Each tree istrained with a random subset of the total set of labeledsamples In the classification process the most voted classamong all the trees in the model indicates the result of theclassifier [52] In the proposed detection system algorithmshown in Figure 4 RF is used to classify online networktraffic a task that requires computational efficiency and highhit rates

4 Results

(enetwork traffic was classified by the detection system in acontrolled network environment using different samplingrates In the experiments raw network traffic of the CIC-DoS[16] CICIDS2017 [25] and CSE-CIC-IDS2018 [25] datasetsand the raw network traffic captured in the customizedtestbed experiments were employed (e Smart Detectionsystem has reached high accuracy and low false-positive rateExperiments were conducted using two Virtual Linux boxes

Normal trafficcapture Attack traffic capture

Extract packets andgroup by session

Calculate variablesand label instances

Signaturedatabase

Figure 5 Process of network traffic extraction transformation andlabeling

Table 4 10-fold RFECV results

MLA No of features Accuracy1 RF 28 09960102 DTree 25 09941823 LR 26 09723274 SGD 16 09694745 Perceptron 28 09372566 AdaBoost 7 0931131

Security and Communication Networks 7

Input database descriptors variable importance threshold accuracy threshold and number of roundsOutput selected variables

(1) begin(2) Create empty optimized model set(3) for i⟵ 1 to Number of rounds do(4) Define all the descriptor database variables as the current variables(5) while True do(6) Split dataset in training and test partitions(7) Create and train the model using training data partition(8) Select the most important variables from the trained model(9) Calculate the cumulative importance of variables from the trained model(10) if max (cumulative importance of variables)ltVariable importance threshold then(11) Exit loop(12) end(13) Train the model using only the most important variables(14) Test the trained model and calculate the accuracy(15) if Calculated accuracyltAccuracy threshold then(16) Exit loop(17) end(18) Add current model to optimized model set(19) Define the most important variables from the trained model as the current variables(20) end(21) end(22) Group the models by number of variables(23) Remove outliers from the grouped model set(24) Select the group of models with the highest frequency and their number of variables ldquoNrdquo(25) Rank the variables by the mean of the importance calculated in step 7(26) Return the ldquoNrdquo most important variables(27) end

ALGORITHM 1 Feature selection

Number of models

Num

ber o

f var

iabl

es

0 500 1000 2000 30001500 2500 3500

3456789

1011

1314151617181920212223

12

46594

21003680

32682538

24322595

27723159

26762431

22601791

1472959

606265

726

1972

Figure 6 Number of variables versus number of models

8 Security and Communication Networks

each one of them using 8 virtual CPUs (vCPUs) with 8GBRAM

41 Description of the Benchmark Datasets Many differentdatasets such as DARPA (Lincoln Laboratory 1998-99)KDDprime99 (University of California Irvine 1998-99) andLBNL (Lawrence Berkeley National Laboratory and ICSI2004-2005) have been used by the researchers to evaluate theperformance of their proposed intrusion detection andprevention approaches However many such datasets areout of date and unreliable to use [25] In this study thedatasets CIC-DoS CICIDS2017 and CSE-CIC-IDS2018 andcustomized dataset were used since they include modernthreats and DoS techniques

411 gte ISCXIDS2012 Dataset (e Information SecurityCenter of Excellence (ISCX) IDS 2012 dataset (ISC-XIDS2012) was built in at the University of New Brunswickto provide a contemporary benchmark (e dataset tracedreal packets for seven days of network activity includingHTTP SMTP SSH IMAP POP3 and FTP protocolscovering various scenarios of normal and malicious activ-ities ISCXIDS2012 consists of labeled network traces in-cluding full packet payloads in pcap format and is publiclyavailable (httpswwwunbcacicdatasetsidshtml) [45](is work focuses on the normal activities of the ISC-XIDS2012 pcap file for extraction of signatures more spe-cifically data file of Friday 1162010

412 gte CIC-DoS Dataset (e CIC-DoS dataset focuseson application layer DoS attacks mixed with the ISC-XIDS2012 dataset attack-free traces Four kinds of attackswere produced with different tools yielding 8 different DoSattack strokes from the application layer [16] (e resultingset contains 24 hours of network traffic with a total size of

46GB and is publicly available (httpswwwunbcacicdatasetsdos-datasethtml) A summary of the attack eventsand tools used in the CIC-DoS is presented in Table 6

In the execution of the low-volume attacks using toolslowhttptest [49] the default value of 50 connections perattack was adopted thus making the attacks more sneakyaccording to [16]

413 gte CICIDS2017 Dataset CICIDS2017 dataset wasrecently developed by ISCX and contains benign traffic andthe most up-to-date common attacks (is new IDS datasetincludes seven common updated family of attacks that metthe real-world criteria and is publicly available (httpwwwunbcacicdatasetsIDS2017html) [25]

(is work focuses on the malicious DoS activities of theWednesday July 5 2017 capture file which consist of fiveDoSDDoS attacks and a wide variety of normal networktraffic (e resulting set contains 8h of network traffic with atotal size of 13G (e attack tools used include slowlorisSlowhttptest Hulk GoldenEye and Heartbleed

414 gte CSE-CIC-IDS2018 Dataset (is dataset is a col-laborative project between the Communications SecurityEstablishment (CSE) and the Canadian Institute forCybersecurity (CIC) (e final dataset includes seven dif-ferent attack scenarios brute force Heartbleed Botnet DoSDDoS web attacks and infiltration of the network frominside (e attacking infrastructure includes 50 machinesand the victim organization has 5 departments and includes420machines and 30 servers A research document outliningthe dataset analysis details and similar related principles hasbeen published by [25] All data are publicly available(httpswwwunbcacicdatasetsids-2018html)

(is work focuses on the malicious DoSDDoS activitiesof Friday February 16 2018 and Tuesday February 20

tcp_flags_rte

tcp_flags_meantcp_flags_median

tcp_flags_stdtcp_flags_vartcp_flags_cv

ip_len_stdip_len_cv

ip_protoip_len_entropy

sport_entropytcp_flags_entropy

sport_cvsport_rte

sport_cvqsport_std

sport_mean

ip_len_rte

ip_len_meanip_len_var

Varia

bles

00690048

002500240024

00160013

001000900090009

00080005

000200020002

00000000

000 001 002 003 004 005 006 007Importance

Figure 7 Selected variables by the proposed feature selection algorithm

Security and Communication Networks 9

2018 capture files (e attack tools used include Slow-HTTPTest Slowhttptest Hulk LOIC and HOIC

415 gte Customized Dataset (e customized dataset wasdeveloped in a controlled network environment as shown inFigure 8 VLANs 10 20 30 and 40 are used as victim hostsVLAN 165 is dedicated to the users of an academic unitVLAN 60 is used as an attacking host while monitoringoccurs in VLAN 1 All networks have regular access to theInternet

(e attack plan was configured so that one attack isgenerated every 30 minutes in a total of 48 attack events in24 hours starting at 00 h00m00 s and ending at23 h59m00 s All attacks were executed by attacker host1721660100 during which it did not transmit legitimatetraffic to the victims (e attack tools were parameterized toproduce sneaky low-volume medium-volume or lightmode and massive high-volume attacks Ten variations ofprotocol-based and application-based attacks were adoptedusing four attack tools as shown in Table 7 (e duration ofprotocol-based and high-volume application-based attackswas 30 seconds while low-volume application-based attacksranged from 30 to 240 seconds

In the accomplishment of low-volume attacks using tool slowhttptest [49], the number of connections parameter was set to 1500, instead of the default option corresponding to 50 connections.

4.2. Online Experiments. Online experiments were performed in a controlled laboratory environment according to the following validation methodology:

(1) Raw data of the network traffic are obtained for analysis in pcap file format.

(2) The attack plan, indicating the origin, destination, attack type, and respective duration for the traffic indicated in step 1, is drawn.

(3) The environment for reprocessing and classifying the traffic is configured.

(4) The traffic is processed and classified.

(5) The system performance is properly evaluated by comparing the output from step 4 with the attack plan described in step 2.

Following this validation methodology, the CIC-DoS, CICIDS2017, CSE-CIC-IDS2018, and customized datasets were used as sources of traffic capture, thus fulfilling steps 1 and 2.

The environment for traffic reprocessing and classification, as described in step 3, was configured using two virtual Linux boxes running Open Virtual Switch (OVS) [28], the TcpReplay software [53], and the Smart Detection system, as shown in Figure 9.

In the reprocessing, classification, and evaluation of the traffic during steps 4 and 5, the raw data traffic was replayed by the TcpReplay software on a specific OVS port and sampled by the sFlow agent for OVS. The sampled traffic was sent to the Smart Detection system, and the classification result was compared with the attack plan. Figure 9 summarizes the procedures carried out by the proposed validation methodology: the raw network traffic file is reprocessed on VM-01, and the sFlow agent collects traffic samples and sends them to Smart Detection on VM-02.
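To make steps 4 and 5 concrete, the minimal Python sketch below matches the alarms produced by the classifier against the attack plan and derives the detection rate. The record layouts, the sample values, the grace period, and the helper name detection_rate are assumptions made only for illustration; the paper states just that the classification output is compared with the attack plan.

from datetime import datetime, timedelta

# Illustrative attack-plan entries: (victim IP, attack type, start time, duration in seconds).
attack_plan = [
    ("172.16.10.5", "tcp_syn_flood", datetime(2018, 2, 16, 0, 0, 0), 30),
    ("172.16.20.5", "http_slow", datetime(2018, 2, 16, 0, 30, 0), 240),
]

# Alarms emitted by the detection system: (victim IP, predicted class, timestamp).
alarms = [
    ("172.16.10.5", "tcp_syn_flood", datetime(2018, 2, 16, 0, 0, 12)),
]

def detection_rate(plan, alarms, grace=timedelta(seconds=5)):
    """Count an attack as detected if any alarm targets the same victim
    inside the attack window (plus a small grace period)."""
    detected = 0
    for victim, _, start, duration in plan:
        end = start + timedelta(seconds=duration) + grace
        if any(a_victim == victim and start <= ts <= end
               for a_victim, _, ts in alarms):
            detected += 1
    return detected / len(plan)  # DR = AD / TA

print(f"DR = {detection_rate(attack_plan, alarms):.2f}")  # 0.50 for this toy plan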

4.2.1. System Setup. The Smart Detection system has three main parameters that directly influence its performance. These parameters, shown in Table 1, allow the user to calibrate the detection system according to the operating environment. In scenarios where the SR is too low and the Tmax is too large, for example, traffic samples are discarded before processing by the classifier. On the other hand, if Tmax is too small, the FAR increases because the classifier has few data to analyze. In the case of slow DDoS, a low SR and a large Tmax also reduce the attack detection rate due to the in-memory flow table expiration time (ET).

Thus, several experiments to calibrate the system were performed in the testing environment using (i) SR values of 1%, 5%, 10%, and 20%; (ii) Tmax values of 25, 50, and 100; and (iii) ET values of 2, 5, and 10 seconds. The most balanced result was obtained with SR = 10%, Tmax = 50, and ET = 2.
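A calibration of this kind amounts to an exhaustive search over the three parameters, as sketched below. The evaluate callable stands in for a full replay-and-classify run (not reproduced here), and the balance criterion "DR minus FAR" is an illustrative assumption, since the paper does not state how the most balanced configuration was ranked.

from itertools import product
from typing import Callable, Tuple

SAMPLING_RATES = [0.01, 0.05, 0.10, 0.20]  # SR: 1%, 5%, 10%, and 20%
TMAX_VALUES = [25, 50, 100]                # Tmax: samples accumulated per classification
ET_VALUES = [2, 5, 10]                     # ET: flow table expiration time in seconds

def pick_most_balanced(evaluate: Callable[[float, int, int], Tuple[float, float]]):
    """Try every (SR, Tmax, ET) combination; `evaluate` returns (DR, FAR) for one run."""
    best_score, best_cfg = float("-inf"), None
    for sr, tmax, et in product(SAMPLING_RATES, TMAX_VALUES, ET_VALUES):
        dr, far = evaluate(sr, tmax, et)
        score = dr - far  # simple balance criterion: reward detections, penalize false alarms
        if score > best_score:
            best_score, best_cfg = score, (sr, tmax, et)
    return best_cfg

# Toy stand-in for the real testbed run, only to make the sketch executable.
print(pick_most_balanced(lambda sr, tmax, et: (min(1.0, sr * 10), 0.01 * et)))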

4.2.2. Evaluation Metrics. System performance was evaluated using the Precision (PREC), Recall (REC), and F-Measure (F1) metrics present in the literature [54, 55]. PREC measures the ability to avoid false positives, while REC measures system sensitivity; F1 is the harmonic mean of PREC and REC. In this context, (i) true positive (TP) is attack traffic predicted correctly, (ii) true negative (TN) is normal traffic also predicted correctly, (iii) false positive (FP) is normal traffic predicted incorrectly, and (iv) false negative (FN) is attack traffic predicted incorrectly. These metrics were computed by the following expressions:

PREC = TP / (TP + FP),

REC = TP / (TP + FN),

F1 = (2 × PREC × REC) / (PREC + REC). (7)

Besides, the detection rate (DR) and false alarm rate (FAR) metrics were used. The DR is the ratio between the number of attacks detected by the system and the actual number of attacks performed, while FAR is the ratio between FP and the sum of FP and TN.

Table 5: 10-fold cross-validation results using the variables selected by the proposed algorithm.

    MLA          No. of features   Accuracy
1   RF           20                0.999363
2   DTree        20                0.999011
3   Perceptron   20                0.996000
4   SGD          20                0.986989
5   LR           20                0.982704
6   AdaBoost     20                0.956512
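For reference, the Table 5 setting can be approximated with scikit-learn [51] as in the sketch below. The file name, the label column, and the Random Forest hyperparameters are assumptions made for this sketch; the paper only fixes the use of scikit-learn, the 20 selected variables, and 10-fold cross-validation.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("signature_database.csv")          # hypothetical export of the descriptor database
selected = [c for c in df.columns if c != "label"]  # the 20 variables kept by the selection algorithm
X, y = df[selected], df["label"]

rf = RandomForestClassifier(n_estimators=100, random_state=42)  # assumed hyperparameters
scores = cross_val_score(rf, X, y, cv=10, scoring="accuracy")   # 10-fold cross-validation
print(f"10-fold accuracy: {scores.mean():.6f} +/- {scores.std():.6f}")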


These metrics were computed by the following expressions:

DR = AD / TA, (8)

where AD is the number of detected attacks and TA is the total number of performed attacks, and

FAR = FP / (FP + TN), (9)

where FP corresponds to the false-positive classifications and TN to the true-negative classifications.

The DR and FAR calculations assume that only malicious traffic was sent from the attacker to the victim at the time of the attack.
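A compact implementation of equations (7)-(9), assuming the counts are already available, is shown below; the function name and the toy counts are illustrative and are not taken from the experiments.

def evaluate_metrics(tp, tn, fp, fn, attacks_detected, attacks_total):
    """Compute PREC, REC, F1 (equation (7)), DR (equation (8)), and FAR (equation (9))."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    dr = attacks_detected / attacks_total
    far = fp / (fp + tn)
    return {"PREC": prec, "REC": rec, "F1": f1, "DR": dr, "FAR": far}

# Toy counts: 95 attack flows classified correctly, 5 missed, 2 of 1000 normal flows
# flagged as attacks, and 48 of 50 planned attack events detected.
print(evaluate_metrics(tp=95, tn=998, fp=2, fn=5, attacks_detected=48, attacks_total=50))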

4.3. Results and Discussion. The proposed approach has been evaluated using the aforementioned datasets, system setup, and metrics. Table 8 summarizes the system performance for each dataset.

As can be observed, the best performance was obtained on the CSE-CIC-IDS2018 dataset, with a DR of 100%, an FAR of 0.000, and a PREC of 100%. During the analysis, there was a low occurrence of normal network traffic and well-defined bursts of malicious traffic. This type of behavior facilitates detection by the system and justifies the high hit rates achieved. However, a slightly lower performance was obtained on the customized dataset and the CIC-DoS dataset, with a DR of 96.5% and 93.6%, an FAR of 0.2% and 0.04%, and a PREC of 99.5% and 99.9%, respectively. In those datasets, there is a higher volume of normal traffic and various types of attacks, including stealthy application layer attacks. In this more realistic scenario, the proposed system presented some detection failures but still obtained a competitive performance.

[Figure 8: Customized testbed topology. A core router with Internet access connects the monitoring host (VLAN 1), the target hosts (VLAN 10, 172.16.10.0/24; VLAN 20, 172.16.20.0/24; VLAN 30, 172.16.30.0/24; VLAN 40, 172.16.40.0/24), the production hosts of the academic unit (VLAN 165), and the attacker (VLAN 60, 172.16.60.0/24).]

Table 6: CIC-DoS dataset events and tools.

Attack and tool                        Events
ddossim (ddossim)                      2
DoS improved GET (Goldeneye [48])      3
DoS GET (hulk [47])                    4
Slow body (slowbody2)                  4
Slow read (slowread)                   2
Slow headers (slowheaders)             5
Rudy (rudy)                            4
Slowloris (slowloris)                  2

[Figure 9: Smart Detection online validation scheme. On VM-01, the dataset (pcap) is replayed by TcpReplay through an Open Virtual Switch; sFlow samples travel over the network to VM-02, where the Smart Detection modules (collector, processor, classifier, and notifier) process them.]

Table 7: Customized dataset events and tools.

Attack and tool                              Events
TCP SYN flood (hping3 [46])                  4
TCP SYN flood, light mode (hping3 [46])      4
TCP ACK flood (hping3 [46])                  4
UDP flood (hping3 [46])                      4
UDP flood, random dst port (hping3 [46])     4
DoS improved GET (Goldeneye [48])            5
DoS GET (hulk [47])                          4
Slow headers (slowhttptest [49])             5
Slow body (slowhttptest [49])                5
Range attack (slowhttptest [49])             4
Slow read (slowhttptest [49])                5


On the other hand, the worst result was obtained with the CICIDS2017 dataset, with 80% DR, 0.2% FAR, and 99.2% PREC. This dataset expresses a more realistic network scenario, which includes normal traffic mixed with high-volume and low-volume malicious traffic with sneaky behavior, such as slow application layer attacks. Even so, the proposed system detected 4 out of 5 attacks with a PREC greater than 90% and an FAR of less than 1%, showing that the method is feasible.

To discuss online detection and the consumption of computing resources during experimentation, the CICIDS2017 dataset was chosen because it is quite realistic, recent, and summarizes the major vectors of DoS attacks. Even in this most adverse scenario, the experiment was completed normally, as shown in Figures 10 and 11. Overall network traffic is shown in Figure 10(a), while Figure 10(b) highlights the sampled traffic sent to the detection system. As can be seen, for network traffic of 81.3 Mbps, the detection system receives only 1.74 Mbps, making this approach scalable.
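For a rough sense of the reduction, the sampled rate can be related to the raw rate as follows (the percentage is derived here and is not reported in the paper):

sampled_mbps, raw_mbps = 1.74, 81.3
print(f"The detector inspects about {sampled_mbps / raw_mbps:.1%} of the raw bit rate")  # roughly 2.1%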

Overall traffic rating is shown in Figure 11(a), while Figure 11(b) exclusively highlights the malicious traffic rating. It can be said that the system was efficient in distinguishing normal traffic from the DoS attacks because, of all the attacks performed, only the Heartbleed attack was not detected, highlighted in Figure 10(b) between 15 h and 16 h. This kind of attack is primarily intended to collect data by exploiting OpenSSL software vulnerabilities, as described in CVE-2014-0160, although it can also assume the behavior of a DDoS attack, as in any application. However, in this case, the system raised a false negative. The most plausible reasons for this FN are (i) the execution of the Heartbleed attack without DoS exploitation or (ii) statistical coincidence in traffic sampling. In the first case, the attack is performed using legitimate and regular connections, while in the second case, the collected samples coincide with legitimate traffic signatures.

In terms of resource use, the system remained stable during the experiment, as shown in Figure 11(c), with only small swings in CPU usage.

Finally, the Smart Detection system was tested using online network traffic in four distinct scenarios. The results presented in Table 8 show that the system can distinguish legitimate traffic from various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow, with significant accuracy rates.

[Figure 10: Network traffic in the online experiment using the CICIDS2017 dataset, from 09:00 to 18:00. (a) Overall network traffic (0 bps to about 80 Mbps). (b) Sampled network traffic (0 bps to about 2 Mbps), with intervals annotated as normal activity, normal + slow DoS, normal + high DoS, and normal + heartbleed DoS.]


The experiments also highlighted the importance of adjusting the Tmax and ET parameters. These variables correlate with the network traffic sampling rate (SR) and directly influence the detection rate and system accuracy.

4.3.1. Additional Comparison. Compared with some recent similar works available in the literature, the approach introduced in this work is quite competitive in terms of the evaluated performance metrics, as shown in Table 9.

The comparison is not completely fair because the experimental scenarios and data were slightly different, but it is sufficient to allow an evaluation of the obtained results. For instance, in the offline experiments performed with the CIC-DoS dataset in [16], the DR was 76.92% using an SR of 20%, while the proposed system obtained an online DR of 90% for the attacks, with a FAR of 1.8%, using the same sampling technique. In [37], a PREC of 82.1% was obtained using the CICIDS2017 dataset in an offline and unsampled analysis. In this work, the proposed method obtained a PREC of 99.9%, which can be considered competitive for an online detection system based on samples of network traffic. Besides that, in the CICIDS2017 dataset experiments, where the legitimate traffic rate is similar to that of the attack traffic according to Figures 10(b), 11(a), and 11(b), the system was also able to distinguish malicious traffic from normal traffic, as studied in the literature [34].

5. Conclusion

This article has presented the Smart Detection system, an online approach to DoS/DDoS attack detection. The software uses the Random Forest Tree algorithm to classify network traffic based on samples taken by the sFlow protocol directly from network devices. Several experiments were performed to calibrate and evaluate system performance. The results showed that the proposed method is feasible and presents improved performance when compared with some recent and relevant approaches available in the literature.

The proposed system was evaluated using three intrusion detection benchmark datasets, namely, CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018, and was able to classify various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow. Furthermore, the performance of the proposed method was compared against recent and related approaches. Based on the experimental results, the Smart Detection approach delivers improved DR, FAR, and PREC. For example, on the CIC-DoS and CSE-CIC-IDS2018 datasets, the proposed system achieved DR and PREC higher than 93%, with an FAR of less than 1%. Although the system has achieved significant results within its scope, it still needs some improvements, such as a better hit rate among attack classes and an automatic parameter calibration mechanism that maximizes the attack detection rate.

[Figure 11: Traffic classification and CPU usage in the online experiment using the CICIDS2017 dataset, from 09:00 to 18:00. (a) Overall traffic rating. (b) Malicious traffic rating. (c) Detection system CPU utilization (%), showing CPU idle time and CPU system time.]

Table 8: Evaluation of system performance for all datasets.

Dataset            DR      FAR      PREC    F1
CIC-DoS            0.936   0.0004   0.999   0.999
CICIDS2017         0.800   0.002    0.992   0.992
CSE-CIC-IDS2018    1.000   0.000    1.000   1.000
Customized         0.965   0.002    0.995   0.995

Table 9: Comparison with research approaches of related works.

Work                      Dataset       DR       FAR      PREC
[16]                      CIC-DoS       0.7690   NA       NA
[37]                      CICIDS2017    NA       NA       0.8210
The proposed approach     CIC-DoS       0.9360   0.0004   0.9990
The proposed approach     CICIDS2017    0.8000   0.0020   0.9920


Future work includes the analysis of DDoS attacks based on service vulnerabilities, such as Heartbleed and web brute-force attacks, enhancement of the multiple-class classification, self-configuration of the system, and the development of methods for correlating triggered alarms and formulating protective measures.

Data Availability

We produced a customized dataset and a variable selection algorithm and used four additional datasets to support the findings of this study. The customized dataset used to support the findings of this study has been deposited in the IEEE DataPort repository (https://doi.org/10.24433/CO.0280398.v2). The feature selection algorithm used to support the findings of this study has been deposited in the Code Ocean repository (https://doi.org/10.24433/CO.0280398.v2). The benchmark datasets used to support the findings of this study have been deposited in the Canadian Institute for Cybersecurity repository and are publicly available as follows: (1) the ISCXIDS2012 dataset (https://www.unb.ca/cic/datasets/ids.html), (2) the CIC-DoS dataset (https://www.unb.ca/cic/datasets/dos-dataset.html), (3) the CICIDS2017 dataset (http://www.unb.ca/cic/datasets/IDS2017.html), and (4) the CSE-CIC-IDS2018 dataset (https://www.unb.ca/cic/datasets/ids-2018.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), Finance Code 001.

Acknowledgments

The authors would like to acknowledge the Digital Metropolis Institute (IMD/UFRN) and the High-Performance Computing Center of UFRN (NPAD/UFRN) for the overall support given to this work, and the Canadian Institute for Cybersecurity (CIC/UNB) for publicly sharing the datasets.

References

[1] D. Anstee, C. F. Chui, P. Bowen, and G. Sockrider, Worldwide Infrastructure Security Report, Arbor Networks Inc., Westford, MA, USA, 2017.

[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, "Internet of things: a survey on enabling technologies, protocols, and applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.

[3] S. T. Zargar, J. Joshi, and D. Tipper, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," IEEE Communications Surveys and Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.

[4] A. Marzano, D. Alexander, O. Fonseca et al., "The evolution of bashlite and mirai IoT botnets," in Proceedings of the 2018 IEEE Symposium on Computers and Communications (ISCC), 2018.

[5] S. Kottler, "February 28th DDoS incident report," 2018, https://github.blog/2018-03-01-ddos-incident-report.

[6] Y. Cao, Y. Gao, R. Tan, Q. Han, and Z. Liu, "Understanding internet DDoS mitigation from academic and industrial perspectives," IEEE Access, vol. 6, pp. 66641–66648, 2018.

[7] R. Roman, J. Zhou, and J. Lopez, "On the features and challenges of security and privacy in distributed internet of things," Computer Networks, vol. 57, no. 10, pp. 2266–2279, 2013.

[8] K. Singh, P. Singh, and K. Kumar, "Application layer HTTP-GET flood DDoS attacks: research landscape and challenges," Computers & Security, vol. 65, pp. 344–372, 2017.

[9] N. Vlajic and D. Zhou, "IoT as a land of opportunity for DDoS hackers," Computer, vol. 51, no. 7, pp. 26–34, 2018.

[10] S. Newman, "Under the radar: the danger of stealthy DDoS attacks," Network Security, vol. 2019, no. 2, pp. 18–19, 2019.

[11] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.

[12] Y. Wang, L. Liu, B. Sun, and Y. Li, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," in Proceedings of the 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS 2015), pp. 1034–1037, Kochi, India, 2015.

[13] M. Masdari and M. Jalali, A Survey and Taxonomy of DoS Attacks in Cloud Computing, 2016, http://onlinelibrary.wiley.com/doi/10.1002/sec.1539/epdf.

[14] R. Singh, A. Prasad, R. Moven, and H. Samra, "Denial of service attack in wireless data network: a survey," in Proceedings of the 2017 Devices for Integrated Circuit (DevIC), pp. 23–24, Kalyani, India, March 2017.

[15] A. Praseed and P. Santhi Thilagam, "DDoS attacks at the application layer: challenges and research perspectives for safeguarding web applications," IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 661–685, 2019.

[16] H. H. Jazi, H. Gonzalez, N. Stakhanova, and A. A. Ghorbani, "Detecting HTTP-based application layer DoS attacks on web servers in the presence of sampling," Computer Networks, vol. 121, pp. 25–36, 2017.

[17] H. Kim, T. Benson, A. Akella, and N. Feamster, "The evolution of network configuration: a tale of two campuses," in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC '11), p. 499, ACM, New York, NY, USA, 2011.

[18] T. V. Phan and M. Park, "Efficient distributed denial-of-service attack defense in SDN-based cloud," IEEE Access, vol. 7, pp. 18701–18714, 2019.

[19] Y. Gu, K. Li, Z. Guo, and Y. Wang, "Semi-supervised K-means DDoS detection method using hybrid feature selection algorithm," IEEE Access, vol. 7, pp. 64351–64365, 2019.

[20] M. Hasan, M. Milon Islam, I. Islam, and M. M. A. Hashem, "Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches," Internet of Things, vol. 7, Article ID 100059, 2016, https://linkinghub.elsevier.com/retrieve/pii/S2542660519300241.

[21] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, "Network intrusion detection for IoT security based on learning techniques," IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2671–2701, 2019, https://ieeexplore.ieee.org/document/8629941.


[22] X. Tan, S. Su, Z. Huang et al., "Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm," Sensors (Switzerland), vol. 19, no. 1, 2019.

[23] J. Cheng, M. Li, X. Tang, V. S. Sheng, Y. Liu, and W. Guo, "Flow correlation degree optimization driven random forest for detecting DDoS attacks in cloud computing," Security and Communication Networks, vol. 2018, Article ID 6459326, 14 pages, 2018.

[24] R. Panigrahi and S. Borah, "A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems," International Journal of Engineering & Technology, vol. 7, pp. 479–482, December 2018, https://www.researchgate.net/publication/329045441.

[25] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the ICISSP 2018—4th International Conference on Information Systems Security and Privacy, pp. 108–116, Madeira, Portugal, 2018.

[26] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (II), pp. 1137–1143, 1995, https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf.

[27] S. Behal and K. Kumar, "Trends in validation of DDoS research," Procedia Computer Science, vol. 85, pp. 7–15, 2016.

[28] B. Pfaff, J. Pettit, T. Koponen et al., "The design and implementation of Open vSwitch," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp. 117–130, USENIX Association, Oakland, CA, USA, 2015, https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/pfaff.

[29] J. Le, R. Giller, Y. Tatsumi, H. Huang, J. Ma, and N. Mori, Case Study: Developing DPDK-Accelerated Virtual Switch Solutions for Cloud Service Providers, Intel Corporation, Santa Clara, CA, USA, 2019, https://builders.intel.com/docs/cloudbuilders/developing-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providers.pdf.

[30] W. Meng, W. Li, C. Su, J. Zhou, and R. Lu, "Enhancing trust management for wireless intrusion detection via traffic sampling in the era of big data," IEEE Access, vol. 6, pp. 7234–7243, 2017.

[31] R. Zuech, T. M. Khoshgoftaar, and R. Wald, "Intrusion detection and big heterogeneous data: a survey," Journal of Big Data, vol. 2, no. 1, 2015.

[32] R. K. C. Chang, "Defending against flooding-based distributed denial-of-service attacks: a tutorial," IEEE Communications Magazine, vol. 40, no. 10, pp. 42–51, 2002.

[33] S. Simpson, S. N. Shirazi, A. Marnerides, S. Jouet, D. Pezaros, and D. Hutchison, "An inter-domain collaboration scheme to remedy DDoS attacks in computer networks," IEEE Transactions on Network and Service Management, vol. 15, no. 3, pp. 879–893, 2018.

[34] S. Behal, K. Kumar, and M. Sachdeva, "D-FACE: an anomaly based distributed approach for early detection of DDoS attacks and flash events," Journal of Network and Computer Applications, vol. 111, pp. 49–63, 2018.

[35] C. Wang, T. T. N. Miu, X. Luo, and J. Wang, "SkyShield: a sketch-based defense system against application layer DDoS attacks," IEEE Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 559–573, 2018.

[36] Z. Liu, Y. Cao, M. Zhu, and W. Ge, "Umbrella: enabling ISPs to offer readily deployable and privacy-preserving DDoS prevention services," IEEE Transactions on Information Forensics and Security, vol. 14, no. 4, pp. 1098–1108, 2019.

[37] M. Aamir and S. M. A. Zaidi, "Clustering based semi-supervised machine learning for DDoS attack classification," Journal of King Saud University—Computer and Information Sciences, 2019.

[38] R. E. Jurga, M. M. Hulb, and H. P. Procurve, Technical Report: Packet Sampling for Network Monitoring, 2007.

[39] P. Phaal and M. Lavine, sFlow Version 5, InMon Corp., San Francisco, CA, USA, https://sflow.org/sflow_version_5.txt.

[40] J. Postel, Internet Protocol, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0791.

[41] J. Postel, "Transmission control protocol," RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0793.

[42] J. Postel, "User datagram protocol," RFC Editor, 1980, https://www.rfc-editor.org/info/rfc768.

[43] B. Claise, Ed., Cisco Systems NetFlow Services Export Version 9, RFC Editor, 2004, https://www.rfc-editor.org/info/rfc3954.

[44] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," Statistical Computing, vol. 50, no. 4, pp. 361–365, 1996.

[45] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers and Security, vol. 31, no. 3, pp. 357–374, 2012.

[46] S. Sanfilippo, N. Jombart, D. Ducamp, Y. Berthier, and S. Aubert, "Hping," September 2019, http://www.hping.org.

[47] A. Grafov, "Hulk DoS tool," 2015, https://github.com/grafov/hulk.

[48] J. Seidl, "GoldenEye HTTP DoS test tool," 2012, https://github.com/jseidl/GoldenEye.

[49] S. Shekyan, "SlowHTTPTest," 2011, https://github.com/shekyan/slowhttptest.

[50] S. Ganapathy, K. Kulothungan, S. Muthurajkumar, M. Vijayalakshmi, P. Yogesh, and A. Kannan, "Intelligent feature selection and classification techniques for intrusion detection in networks: a survey," EURASIP Journal on Wireless Communications and Networking, vol. 2013, no. 1, p. 271, 2013, http://jwcn.eurasipjournals.com/content/2013/1/271.

[51] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2012, http://dl.acm.org/citation.cfm?id=2078195.

[52] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[53] A. Turner, "Tcpreplay—Pcap editing and replaying utilities," 2013, https://tcpreplay.appneta.com.

[54] S. H. Park, J. M. Goo, and C. H. Jo, "Receiver operating characteristic (ROC) curve: practical review for radiologists," Korean Journal of Radiology, vol. 5, no. 1, p. 11, 2004.

[55] M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.





[46] S Sanfilippo N Jombart D Ducamp Y Berthier andS Aubert ldquoHpingrdquo september 2019 httpwwwhpingorg

[47] A Grafov ldquoHulk DoS toolrdquo 2015 httpsgithubcomgrafovhulk

[48] J Seidl ldquoGoldenEye HTTP DoS test toolrdquo 2012 httpsgithubcomjseidlGoldenEye

[49] S Shekyan ldquoSlowHTTPTestrdquo 2011 httpsgithubcomshekyanslowhttptest

[50] S Ganapathy K Kulothungan S MuthurajkumarM Vijayalakshmi P Yogesh and A Kannan ldquoIntelligent featureselection and classification techniques for intrusion detection innetworks a surveyrdquo EURASIP Journal on Wireless Communi-cations and Networking vol 2013 no 1 p 271 2013 httpjwcneurasipjournalscomcontent20131271

[51] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2012 httpdlacmorgcitationcfmid2078195

[52] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45no 1 pp 5ndash32 2001

[53] A Turner ldquoTcpreplayndashPcap editing and replaying utilitiesrdquo2013 httpstcpreplayappnetacom

[54] S H Park J M Goo and C H Jo ldquoReceiver operatingcharacteristic (ROC) curve practical review for radiologistsrdquoKorean Journal of Radiology vol 5 no 1 p 11 2004

[55] M W Powers ldquoEvaluation from precision recall andF-measure to ROC informedness markedness and correla-tionrdquo Journal of Machine Learning Technologies vol 2 no 1pp 37ndash63 2011


Input: database descriptors, variable importance threshold, accuracy threshold, and number of rounds
Output: selected variables

(1) begin
(2)   Create empty optimized model set
(3)   for i ← 1 to Number of rounds do
(4)     Define all the descriptor database variables as the current variables
(5)     while True do
(6)       Split dataset in training and test partitions
(7)       Create and train the model using training data partition
(8)       Select the most important variables from the trained model
(9)       Calculate the cumulative importance of variables from the trained model
(10)      if max(cumulative importance of variables) < Variable importance threshold then
(11)        Exit loop
(12)      end
(13)      Train the model using only the most important variables
(14)      Test the trained model and calculate the accuracy
(15)      if Calculated accuracy < Accuracy threshold then
(16)        Exit loop
(17)      end
(18)      Add current model to optimized model set
(19)      Define the most important variables from the trained model as the current variables
(20)    end
(21)  end
(22)  Group the models by number of variables
(23)  Remove outliers from the grouped model set
(24)  Select the group of models with the highest frequency and their number of variables "N"
(25)  Rank the variables by the mean of the importance calculated in step 7
(26)  Return the "N" most important variables
(27) end

Algorithm 1: Feature selection.
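As a concrete illustration, the following Python sketch approximates Algorithm 1 with scikit-learn. It is not the authors' implementation: it assumes a pandas DataFrame X of traffic descriptors and a label vector y, and it simplifies the stopping condition and the final grouping/ranking steps (frequency voting over the surviving subsets instead of the mean importance of step 7).

```python
# Approximate sketch of Algorithm 1 (feature selection) using scikit-learn.
# Assumptions: X is a pandas DataFrame of traffic descriptors, y the labels;
# thresholds and the ranking step are simplified relative to the published algorithm.
from collections import Counter
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def select_features(X, y, importance_threshold=0.90, accuracy_threshold=0.99, n_rounds=10):
    runs = []                                   # the "optimized model set"
    for _ in range(n_rounds):
        current = list(X.columns)               # start each round with all variables
        while len(current) > 1:
            X_tr, X_te, y_tr, y_te = train_test_split(X[current], y, test_size=0.3)
            rf = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
            order = np.argsort(rf.feature_importances_)[::-1]
            cumulative = np.cumsum(rf.feature_importances_[order])
            # smallest prefix of variables whose cumulative importance reaches the threshold
            k = int(np.searchsorted(cumulative, importance_threshold)) + 1
            if k >= len(current):
                break                           # the subset cannot be reduced any further
            top = [current[i] for i in order[:k]]
            rf = RandomForestClassifier(n_estimators=100).fit(X_tr[top], y_tr)
            if accuracy_score(y_te, rf.predict(X_te[top])) < accuracy_threshold:
                break                           # accuracy dropped below the threshold
            runs.append(top)
            current = top                       # keep reducing from the surviving subset
    if not runs:
        return list(X.columns)
    sizes = [len(r) for r in runs]
    n_best = max(set(sizes), key=sizes.count)   # most frequent number of variables "N"
    votes = Counter(v for r in runs if len(r) == n_best for v in r)
    return [v for v, _ in votes.most_common(n_best)]
```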

Figure 6: Number of variables versus number of models (distribution of the optimized models by the number of selected variables, from 3 to 23).


each one of them using 8 virtual CPUs (vCPUs) with 8 GB RAM.

4.1. Description of the Benchmark Datasets. Many different datasets, such as DARPA (Lincoln Laboratory 1998-99), KDD'99 (University of California, Irvine 1998-99), and LBNL (Lawrence Berkeley National Laboratory and ICSI 2004-2005), have been used by researchers to evaluate the performance of their proposed intrusion detection and prevention approaches. However, many such datasets are out of date and unreliable to use [25]. In this study, the CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018 datasets and a customized dataset were used, since they include modern threats and DoS techniques.

4.1.1. The ISCXIDS2012 Dataset. The Information Security Center of Excellence (ISCX) IDS 2012 dataset (ISCXIDS2012) was built at the University of New Brunswick to provide a contemporary benchmark. The dataset traced real packets for seven days of network activity, including the HTTP, SMTP, SSH, IMAP, POP3, and FTP protocols, covering various scenarios of normal and malicious activities. ISCXIDS2012 consists of labeled network traces, including full packet payloads in pcap format, and is publicly available (https://www.unb.ca/cic/datasets/ids.html) [45]. This work focuses on the normal activities of the ISCXIDS2012 pcap file for the extraction of signatures, more specifically the data file of Friday, 11/6/2010.

4.1.2. The CIC-DoS Dataset. The CIC-DoS dataset focuses on application layer DoS attacks mixed with the attack-free traces of the ISCXIDS2012 dataset. Four kinds of attacks were produced with different tools, yielding 8 different application layer DoS attack variations [16]. The resulting set contains 24 hours of network traffic with a total size of 4.6 GB and is publicly available (https://www.unb.ca/cic/datasets/dos-dataset.html). A summary of the attack events and tools used in the CIC-DoS dataset is presented in Table 6.

In the execution of the low-volume attacks using the tool slowhttptest [49], the default value of 50 connections per attack was adopted, thus making the attacks sneakier, according to [16].

4.1.3. The CICIDS2017 Dataset. The CICIDS2017 dataset was recently developed by ISCX and contains benign traffic and the most up-to-date common attacks. This new IDS dataset includes seven common, updated families of attacks that meet real-world criteria and is publicly available (http://www.unb.ca/cic/datasets/IDS2017.html) [25].

This work focuses on the malicious DoS activities of the Wednesday, July 5, 2017, capture file, which consists of five DoS/DDoS attacks and a wide variety of normal network traffic. The resulting set contains 8 hours of network traffic with a total size of 13 GB. The attack tools used include Slowloris, Slowhttptest, Hulk, GoldenEye, and Heartbleed.

4.1.4. The CSE-CIC-IDS2018 Dataset. This dataset is a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). The final dataset includes seven different attack scenarios: brute force, Heartbleed, Botnet, DoS, DDoS, web attacks, and infiltration of the network from inside. The attacking infrastructure includes 50 machines, and the victim organization has 5 departments and includes 420 machines and 30 servers. A research document outlining the dataset analysis details and related principles has been published by [25]. All data are publicly available (https://www.unb.ca/cic/datasets/ids-2018.html).

This work focuses on the malicious DoS/DDoS activities of the Friday, February 16, 2018, and Tuesday, February 20, 2018, capture files. The attack tools used include SlowHTTPTest, Hulk, LOIC, and HOIC.

Figure 7: Selected variables by the proposed feature selection algorithm (importance ranking of the 20 selected features: tcp_flags_rte, tcp_flags_mean, tcp_flags_median, tcp_flags_std, tcp_flags_var, tcp_flags_cv, ip_len_std, ip_len_cv, ip_proto, ip_len_entropy, sport_entropy, tcp_flags_entropy, sport_cv, sport_rte, sport_cvq, sport_std, sport_mean, ip_len_rte, ip_len_mean, and ip_len_var).

4.1.5. The Customized Dataset. The customized dataset was developed in a controlled network environment, as shown in Figure 8. VLANs 10, 20, 30, and 40 are used as victim hosts, VLAN 165 is dedicated to the users of an academic unit, and VLAN 60 is used as the attacking host, while monitoring occurs in VLAN 1. All networks have regular access to the Internet.

The attack plan was configured so that one attack is generated every 30 minutes, for a total of 48 attack events in 24 hours, starting at 00h00m00s and ending at 23h59m00s. All attacks were executed by the attacker host 172.16.60.100, which did not transmit legitimate traffic to the victims during that period. The attack tools were parameterized to produce sneaky low-volume, medium-volume (light mode), and massive high-volume attacks. Ten variations of protocol-based and application-based attacks were adopted, using four attack tools, as shown in Table 7. The duration of protocol-based and high-volume application-based attacks was 30 seconds, while low-volume application-based attacks ranged from 30 to 240 seconds.

In the execution of the low-volume attacks using the tool slowhttptest [49], the number of connections parameter was set to 1500, instead of the default option corresponding to 50 connections.

4.2. Online Experiments. Online experiments were performed in a controlled laboratory environment according to the following validation methodology:

(1) Raw data of the network traffic are obtained for analysis in pcap file format.

(2) The attack plan, indicating the origin, destination, attack type, and respective duration for the traffic indicated in step 1, is drawn up.

(3) The environment for reprocessing and classifying the traffic is configured.

(4) The traffic is processed and classified.

(5) The system performance is evaluated by comparing the output from step 4 with the attack plan described in step 2.

Following this validation methodology, the CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018 datasets and the customized dataset were used as sources of traffic capture, thus fulfilling steps 1 and 2.

The environment for traffic reprocessing and classification, as described in step 3, was configured using two virtual Linux boxes running Open Virtual Switch (OVS) [28], the TcpReplay software [53], and the Smart Detection system, as shown in Figure 9.

In the reprocessing, classification, and evaluation of the traffic during steps 4 and 5, the raw data traffic was replayed by the TcpReplay software on a specific OVS port and sampled by the sFlow agent for OVS. The sampled traffic was sent to the Smart Detection system, and the classification result was compared with the attack plan. Figure 9 summarizes the procedures carried out by the proposed validation methodology: the raw network traffic file is reprocessed on VM-01, and the sFlow agent collects traffic samples and sends them to Smart Detection on VM-02.
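To make the collector-to-classifier flow of Figure 9 concrete, the sketch below is an illustration under assumptions rather than the authors' implementation: it listens for sFlow datagrams on the standard collector port and hands decoded samples to a classifier, where parse_datagram and classify are hypothetical placeholders for the real sFlow v5 decoding and the trained model.

```python
# Minimal sketch of the collector -> processor -> classifier loop of Figure 9.
# parse_datagram and classify are hypothetical placeholders: the former would
# decode sFlow v5 packet samples, the latter would wrap the trained model.
import socket

SFLOW_COLLECTOR_PORT = 6343  # default sFlow collector port

def run_collector(parse_datagram, classify, host="0.0.0.0", port=SFLOW_COLLECTOR_PORT):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        datagram, _addr = sock.recvfrom(65535)      # one sFlow datagram from the switch
        for sample in parse_datagram(datagram):     # per-packet header samples
            label = classify(sample)                # e.g., "normal" or "dos"
            if label != "normal":
                print("ALERT:", label, sample)      # notifier stage (simplified)
```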

4.2.1. System Setup. The Smart Detection system has three main parameters that directly influence its performance. These parameters, shown in Table 1, allow the user to calibrate the detection system according to the operating environment. In scenarios where the SR is too low and Tmax is too large, for example, traffic samples are discarded before processing by the classifier. On the other hand, if Tmax is too small, the FAR increases because the classifier has little data to analyze. In the case of slow DDoS, a low SR and a large Tmax also reduce the attack detection rate due to the in-memory flow table expiration time (ET).
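The interplay of Tmax and ET can be pictured with the following sketch of an in-memory flow table (illustrative only; the class and its fields are assumptions, not the system's actual data structure): a flow is handed to the classifier once it accumulates Tmax samples, and idle flows expire after ET seconds.

```python
# Illustrative in-memory flow table: a flow is released for classification after
# Tmax samples, and flows idle for more than ET seconds are discarded.
import time
from collections import defaultdict

class FlowTable:
    def __init__(self, tmax=50, et=2.0):
        self.tmax = tmax                  # samples accumulated before classification
        self.et = et                      # expiration time for idle flows (seconds)
        self.samples = defaultdict(list)
        self.last_seen = {}

    def add(self, flow_key, sample):
        self.samples[flow_key].append(sample)
        self.last_seen[flow_key] = time.time()
        if len(self.samples[flow_key]) >= self.tmax:
            self.last_seen.pop(flow_key, None)
            return self.samples.pop(flow_key)   # ready for the classifier
        return None

    def expire(self):
        now = time.time()
        stale = [k for k, t in self.last_seen.items() if now - t > self.et]
        for key in stale:
            self.samples.pop(key, None)
            self.last_seen.pop(key, None)
        return stale
```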

Thus, several experiments to calibrate the system were performed in the testing environment, using (i) an SR of 1%, 5%, 10%, and 20%; (ii) a Tmax of 25, 50, and 100; and (iii) an ET of 2, 5, and 10 seconds. The most balanced result was obtained with SR = 10%, Tmax = 50, and ET = 2 s.
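The calibration sweep described above amounts to evaluating every combination of the three parameters; a minimal sketch is shown below, where the evaluate callback is a placeholder for one run of the online experiment.

```python
# Parameter grid used in the calibration experiments (SR expressed as a fraction).
from itertools import product

SR_VALUES = [0.01, 0.05, 0.10, 0.20]
TMAX_VALUES = [25, 50, 100]
ET_VALUES = [2, 5, 10]  # seconds

def calibrate(evaluate):
    # evaluate(sr, tmax, et) should run one experiment and return its metrics
    return {(sr, tmax, et): evaluate(sr, tmax, et)
            for sr, tmax, et in product(SR_VALUES, TMAX_VALUES, ET_VALUES)}
```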

4.2.2. Evaluation Metrics. System performance was evaluated using the Precision (PREC), Recall (REC), and F-Measure (F1) metrics present in the literature [54, 55]. PREC measures the ability to avoid false positives, while REC measures system sensitivity; F1 is the harmonic mean of PREC and REC. In this context, (i) a true positive (TP) is attack traffic predicted correctly, (ii) a true negative (TN) is normal traffic also predicted correctly, (iii) a false positive (FP) is normal traffic predicted incorrectly, and (iv) a false negative (FN) is attack traffic predicted incorrectly. These metrics were computed by the following expressions:

PREC = TP / (TP + FP),

REC = TP / (TP + FN),

F1 = 2 × (PREC × REC) / (PREC + REC).    (7)

Table 5: 10-fold cross-validation results using the variables selected by the proposed algorithm.

    MLA          No. of features   Accuracy
 1  RF           20                0.999363
 2  DTree        20                0.999011
 3  Perceptron   20                0.996000
 4  SGD          20                0.986989
 5  LR           20                0.982704
 6  AdaBoost     20                0.956512

Besides these, the detection rate (DR) and false alarm rate (FAR) metrics were used. The DR is the ratio between the number of attacks detected by the system and the actual number of attacks performed, while the FAR is the ratio between FP and the sum of FP and TN. These metrics were computed by the following expressions:

DR = AD / TA,    (8)

where AD is the number of detected attacks and TA is the total number of attacks performed, and

FAR = FP / (FP + TN),    (9)

where FP corresponds to the false-positive classifications and TN to the true-negative classifications.

The DR and FAR calculations assume that only malicious traffic was sent from the attacker to the victim at the time of the attack.
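For reference, the metrics of equations (7), (8), and (9) can be computed directly from the confusion counts and the attack plan, as in the short helpers below (a sketch, not the system's code).

```python
# Helpers for the evaluation metrics defined in equations (7), (8), and (9).
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def detection_rate(detected_attacks, total_attacks):
    return detected_attacks / total_attacks if total_attacks else 0.0

def false_alarm_rate(fp, tn):
    return fp / (fp + tn) if (fp + tn) else 0.0
```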

4.3. Results and Discussion. The proposed approach has been evaluated using the aforementioned datasets, system setup, and metrics. Table 8 summarizes the system performance for each dataset.

As can be observed, the best performance was obtained on the CSE-CIC-IDS2018 dataset, with a DR of 100%, an FAR of 0.000, and a PREC of 100%. During the analysis, there was a low occurrence of normal network traffic and well-defined bursts of malicious traffic. This type of behavior facilitates detection by the system and justifies the high hit rates achieved. However, slightly lower performance was obtained on the customized dataset and the CIC-DoS dataset, with a DR of 96.5% and 93.6%, an FAR of 0.2% and 0.04%, and a PREC of 99.5% and 99.9%, respectively. In those datasets, there is a higher volume of normal traffic and various types of attacks, including stealthy application layer attacks. In this more realistic scenario, the proposed system presented some detection failures but still obtained competitive performance.

Figure 8: Customized testbed topology. A core router with Internet access interconnects the monitoring VLAN 1, the target-host VLANs 10 (172.16.10.0/24), 20 (172.16.20.0/24), 30 (172.16.30.0/24), and 40 (172.16.40.0/24), the production-host VLAN 165, and the attacker VLAN 60 (172.16.60.0/24).

Table 6: CIC-DoS dataset events and tools.

Attack and tool                       Events
ddossim (ddossim)                     2
DoS improved GET (Goldeneye [48])     3
DoS GET (hulk [47])                   4
Slow body (slowbody2)                 4
Slow read (slowread)                  2
Slow headers (slowheaders)            5
Rudy (rudy)                           4
Slowloris (slowloris)                 2

Figure 9: Smart Detection online validation scheme. On VM-01, the dataset (pcap) is replayed by TcpReplay through an Open Virtual Switch whose sFlow agent sends samples across the network to VM-02, where the Smart Detection modules (collector, processor, classifier, and notifier) run.

Table 7: Customized dataset events and tools.

Attack and tool                              Events
TCP SYN flood (hping3 [46])                  4
TCP SYN flood, light mode (hping3 [46])      4
TCP ACK flood (hping3 [46])                  4
UDP flood (hping3 [46])                      4
UDP flood, random dst port (hping3 [46])     4
DoS improved GET (Goldeneye [48])            5
DoS GET (hulk [47])                          4
Slow headers (slowhttptest [49])             5
Slow body (slowhttptest [49])                5
Range attack (slowhttptest [49])             4
Slow read (slowhttptest [49])                5


On the other hand, the worst result was obtained with the CICIDS2017 dataset, with 80% DR, 0.2% FAR, and 99.2% PREC. This dataset expresses a more realistic network scenario, which includes normal traffic mixed with high-volume and low-volume malicious traffic with sneaky behavior, such as slow application layer attacks. Even so, the proposed system detected 4 out of 5 attacks with a PREC greater than 90% and an FAR of less than 1%, showing that the method is feasible.

To discuss online detection and the consumption of computing resources during the experiments, the CICIDS2017 dataset was chosen because it is quite realistic, recent, and summarizes the major vectors of DoS attacks. Even in this most adverse scenario, the experiment was completed normally, as shown in Figures 10 and 11. Overall network traffic is shown in Figure 10(a), while Figure 10(b) highlights the sampled traffic sent to the detection system. As can be seen, for network traffic of 81.3 Mbps, the detection system receives only 1.74 Mbps, making this approach scalable. The overall traffic rating is shown in Figure 11(a), while Figure 11(b) exclusively highlights the malicious traffic rating. It can be said that the system was efficient in distinguishing normal traffic from the DoS attacks because, of all the attacks performed, only the Heartbleed attack was not detected, highlighted in Figure 10(b) between 15 h and 16 h. This kind of attack is primarily intended to collect data by exploiting OpenSSL software vulnerabilities, as described in CVE-2014-0160, although it can also assume the behavior of a DDoS attack, as in any application. However, in this case, the system raised a false negative. The most obvious reasons for this FN are (i) the execution of the Heartbleed attack without DoS exploitation or (ii) statistical coincidence in traffic sampling. In the first case, the attack is performed using legitimate and regular connections, while in the second case, the collected samples coincide with legitimate traffic signatures.

In terms of resource use, the system remained stable during the experiment, as shown in Figure 11(c), with only small swings in CPU usage.

Finally, the Smart Detection system was tested using online network traffic in four distinct scenarios. The results presented in Table 8 show that the system can distinguish legitimate traffic from various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow, with significant rates of accuracy. The experiments also highlighted the importance of adjusting the Tmax and ET parameters. These variables correlate with the network traffic sampling rate (SR) and directly influence the detection rate and system accuracy.

Figure 10: Network traffic in the online experiment using the CICIDS2017 dataset: (a) overall network traffic (09:00 to 18:00, peaking near 80 Mbps); (b) sampled network traffic (up to about 2 Mbps), annotated with intervals of normal activity, normal + slow DoS, normal + high-volume DoS, and normal + Heartbleed DoS.

4.3.1. Additional Comparison. Compared with some recent similar works available in the literature, the approach introduced in this work is quite competitive in terms of the evaluated performance metrics, as shown in Table 9.

The comparison is not completely fair because the experimental scenarios and data were slightly different, but it is sufficient to allow an evaluation of the obtained results. For instance, in the offline experiments performed with the CIC-DoS dataset in [16], the DR was 76.92% using an SR of 20%, while the proposed system obtained an online DR of 90% for the attacks with an FAR of 1.8% using the same sampling technique. In [37], a PREC of 82.1% was obtained using the CICIDS2017 dataset in an offline and unsampled analysis. In this work, the proposed method obtained a PREC of 99.9%, which can be considered competitive for an online detection system based on samples of network traffic. Besides that, in the CICIDS2017 dataset experiments, where the legitimate traffic rate is similar to that of the attack traffic, according to Figures 10(b), 11(a), and 11(b), the system was also able to distinguish malicious traffic from normal traffic, as studied in the literature [34].

5. Conclusion

This article has presented the Smart Detection system, an online approach to DoS/DDoS attack detection. The software uses the Random Forest Tree algorithm to classify network traffic based on samples taken by the sFlow protocol directly from network devices. Several experiments were performed to calibrate and evaluate system performance. The results showed that the proposed method is feasible and presents improved performance when compared with some recent and relevant approaches available in the literature.

The proposed system was evaluated based on three intrusion detection benchmark datasets, namely CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018, and was able to classify various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow. Furthermore, the performance of the proposed method was compared against recent and related approaches. Based on the experimental results, the Smart Detection approach delivers improved DR, FAR, and PREC. For example, in the CIC-DoS and CSE-CIC-IDS2018 datasets, the proposed system achieved DR and PREC higher than 93% with an FAR of less than 1%. Although the system has achieved significant results in its scope, it still needs some improvements, such as a better hit rate among attack classes and an automatic parameter calibration mechanism that maximizes the attack detection rate.

Figure 11: Traffic classification and CPU usage in the online experiment using the CICIDS2017 dataset: (a) overall traffic rating; (b) malicious traffic rating; (c) detection system CPU utilization (%), showing CPU idle time and CPU system time between 09:00 and 18:00.

Table 8: Evaluation of system performance for all datasets.

Dataset           DR     FAR     PREC   F1
CIC-DoS           0.936  0.0004  0.999  0.999
CICIDS2017        0.800  0.002   0.992  0.992
CSE-CIC-IDS2018   1.000  0.000   1.000  1.000
Customized        0.965  0.002   0.995  0.995

Table 9: Comparison with research approaches of related works.

Work                    Dataset      DR      FAR     PREC
[16]                    CIC-DoS      0.7690  NA      NA
[37]                    CICIDS2017   NA      NA      0.8210
The proposed approach   CIC-DoS      0.9360  0.0004  0.9990
The proposed approach   CICIDS2017   0.8000  0.0020  0.9920


Future work includes the analysis of DDoS attacks based on service vulnerabilities, such as Heartbleed and web brute force attacks, enhancement of the multiple-class classification, self-configuration of the system, and the development of methods for correlating triggered alarms and formulating protective measures.

Data Availability

We produced a customized dataset and a variable selection algorithm and used four additional datasets to support the findings of this study. The customized dataset used to support the findings of this study has been deposited in the IEEE DataPort repository (https://doi.org/10.24433/CO.0280398.v2). The feature selection algorithm used to support the findings of this study has been deposited in the Code Ocean repository (https://doi.org/10.24433/CO.0280398.v2). The benchmark datasets used to support the findings of this study have been deposited in the Canadian Institute for Cybersecurity repository and are publicly available as follows: (1) the ISCXIDS2012 dataset (https://www.unb.ca/cic/datasets/ids.html), (2) the CIC-DoS dataset (https://www.unb.ca/cic/datasets/dos-dataset.html), (3) the CICIDS2017 dataset (http://www.unb.ca/cic/datasets/IDS2017.html), and (4) the CSE-CIC-IDS2018 dataset (https://www.unb.ca/cic/datasets/ids-2018.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), Finance Code 001.

Acknowledgments

The authors would like to acknowledge the Digital Metropolis Institute (IMD/UFRN) and the High-Performance Computing Center of UFRN (NPAD/UFRN) for the overall support given to this work, and the Canadian Institute for Cybersecurity (CIC/UNB) for publicly sharing the datasets.

References

[1] D. Anstee, C. F. Chui, P. Bowen, and G. Sockrider, Worldwide Infrastructure Security Report, Arbor Networks Inc., Westford, MA, USA, 2017.
[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, "Internet of things: a survey on enabling technologies, protocols, and applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.
[3] S. T. Zargar, J. Joshi, and D. Tipper, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," IEEE Communications Surveys and Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.
[4] A. Marzano, D. Alexander, O. Fonseca et al., "The evolution of bashlite and mirai IoT botnets," in Proceedings of the 2018 IEEE Symposium on Computers and Communications (ISCC), 2018.
[5] S. Kottler, "February 28th DDoS incident report," 2018, https://github.blog/2018-03-01-ddos-incident-report.
[6] Y. Cao, Y. Gao, R. Tan, Q. Han, and Z. Liu, "Understanding internet DDoS mitigation from academic and industrial perspectives," IEEE Access, vol. 6, pp. 66641–66648, 2018.
[7] R. Roman, J. Zhou, and J. Lopez, "On the features and challenges of security and privacy in distributed internet of things," Computer Networks, vol. 57, no. 10, pp. 2266–2279, 2013.
[8] K. Singh, P. Singh, and K. Kumar, "Application layer HTTP-GET flood DDoS attacks: research landscape and challenges," Computers & Security, vol. 65, pp. 344–372, 2017.
[9] N. Vlajic and D. Zhou, "IoT as a land of opportunity for DDoS hackers," Computer, vol. 51, no. 7, pp. 26–34, 2018.
[10] S. Newman, "Under the radar: the danger of stealthy DDoS attacks," Network Security, vol. 2019, no. 2, pp. 18-19, 2019.
[11] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[12] Y. Wang, L. Liu, B. Sun, and Y. Li, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," in Proceedings of the 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS 2015), pp. 1034–1037, Kochi, India, 2015.
[13] M. Masdari and M. Jalali, A Survey and Taxonomy of DoS Attacks in Cloud Computing, 2016, http://onlinelibrary.wiley.com/doi/10.1002/sec.1539/epdf.
[14] R. Singh, A. Prasad, R. Moven, and H. Samra, "Denial of service attack in wireless data network: a survey," in Proceedings of the 2017 Devices for Integrated Circuit (DevIC), pp. 23-24, Kalyani, India, March 2017.
[15] A. Praseed and P. Santhi Thilagam, "DDoS attacks at the application layer: challenges and research perspectives for safeguarding web applications," IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 661–685, 2019.
[16] H. H. Jazi, H. Gonzalez, N. Stakhanova, and A. A. Ghorbani, "Detecting HTTP-based application layer DoS attacks on web servers in the presence of sampling," Computer Networks, vol. 121, pp. 25–36, 2017.
[17] H. Kim, T. Benson, A. Akella, and N. Feamster, "The evolution of network configuration: a tale of two campuses," in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC '11), ACM, p. 499, New York, NY, USA, 2011.
[18] T. V. Phan and M. Park, "Efficient distributed denial-of-service attack defense in SDN-based cloud," IEEE Access, vol. 7, pp. 18701–18714, 2019.
[19] Y. Gu, K. Li, Z. Guo, and Y. Wang, "Semi-supervised K-means DDoS detection method using hybrid feature selection algorithm," IEEE Access, vol. 7, pp. 64351–64365, 2019.
[20] M. Hasan, M. Milon Islam, I. Islam, and M. M. A. Hashem, "Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches," Internet of Things, vol. 7, Article ID 100059, 2019, https://linkinghub.elsevier.com/retrieve/pii/S2542660519300241.
[21] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, "Network intrusion detection for IoT security based on learning techniques," IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2671–2701, 2019, https://ieeexplore.ieee.org/document/8629941.
[22] X. Tan, S. Su, Z. Huang et al., "Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm," Sensors (Switzerland), vol. 19, no. 1, 2019.
[23] J. Cheng, M. Li, X. Tang, V. S. Sheng, Y. Liu, and W. Guo, "Flow correlation degree optimization driven random forest for detecting DDoS attacks in cloud computing," Security and Communication Networks, vol. 2018, Article ID 6459326, 14 pages, 2018.
[24] R. Panigrahi and S. Borah, "A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems," International Journal of Engineering & Technology, vol. 7, pp. 479–482, December 2018, https://www.researchgate.net/publication/329045441.
[25] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the ICISSP 2018, 4th International Conference on Information Systems Security and Privacy, pp. 108–116, Madeira, Portugal, 2018.
[26] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (II), pp. 1137–1143, 1995, https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf.
[27] S. Behal and K. Kumar, "Trends in validation of DDoS research," Procedia Computer Science, vol. 85, pp. 7–15, 2016.
[28] B. Pfaff, J. Pettit, T. Koponen et al., "The design and implementation of Open vSwitch," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp. 117–130, USENIX Association, Oakland, CA, USA, 2015, https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/pfaff.
[29] J. Le, R. Giller, Y. Tatsumi, H. Huang, J. Ma, and N. Mori, Case Study: Developing DPDK-Accelerated Virtual Switch Solutions for Cloud Service Providers, Intel Corporation, Santa Clara, CA, USA, 2019, https://builders.intel.com/docs/cloudbuilders/developing-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providers.pdf.
[30] W. Meng, W. Li, C. Su, J. Zhou, and R. Lu, "Enhancing trust management for wireless intrusion detection via traffic sampling in the era of big data," IEEE Access, vol. 6, pp. 7234–7243, 2017.
[31] R. Zuech, T. M. Khoshgoftaar, and R. Wald, "Intrusion detection and big heterogeneous data: a survey," Journal of Big Data, vol. 2, no. 1, 2015.
[32] R. K. C. Chang, "Defending against flooding-based distributed denial-of-service attacks: a tutorial," IEEE Communications Magazine, vol. 40, no. 10, pp. 42–51, 2002.
[33] S. Simpson, S. N. Shirazi, A. Marnerides, S. Jouet, D. Pezaros, and D. Hutchison, "An inter-domain collaboration scheme to remedy DDoS attacks in computer networks," IEEE Transactions on Network and Service Management, vol. 15, no. 3, pp. 879–893, 2018.
[34] S. Behal, K. Kumar, and M. Sachdeva, "D-FACE: an anomaly based distributed approach for early detection of DDoS attacks and flash events," Journal of Network and Computer Applications, vol. 111, pp. 49–63, 2018.
[35] C. Wang, T. T. N. Miu, X. Luo, and J. Wang, "SkyShield: a sketch-based defense system against application layer DDoS attacks," IEEE Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 559–573, 2018.
[36] Z. Liu, Y. Cao, M. Zhu, and W. Ge, "Umbrella: enabling ISPs to offer readily deployable and privacy-preserving DDoS prevention services," IEEE Transactions on Information Forensics and Security, vol. 14, no. 4, pp. 1098–1108, 2019.
[37] M. Aamir and S. M. A. Zaidi, "Clustering based semi-supervised machine learning for DDoS attack classification," Journal of King Saud University–Computer and Information Sciences, 2019.
[38] R. E. Jurga, M. M. Hulb, and H. P. Procurve, Technical Report: Packet Sampling for Network Monitoring, 2007.
[39] P. Phaal and M. Lavine, sFlow Version 5, InMon Corp., San Francisco, CA, USA, 2004, https://sflow.org/sflow_version_5.txt.
[40] J. Postel, Internet Protocol, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0791.
[41] J. Postel, "Transmission Control Protocol," RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0793.
[42] J. Postel, "User Datagram Protocol," RFC Editor, 1980, https://www.rfc-editor.org/info/rfc768.
[43] B. Claise, Ed., Cisco Systems NetFlow Services Export Version 9, RFC Editor, 2004, https://www.rfc-editor.org/info/rfc3954.
[44] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," The American Statistician, vol. 50, no. 4, pp. 361–365, 1996.
[45] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers and Security, vol. 31, no. 3, pp. 357–374, 2012.
[46] S. Sanfilippo, N. Jombart, D. Ducamp, Y. Berthier, and S. Aubert, "Hping," September 2019, http://www.hping.org.
[47] A. Grafov, "HULK DoS tool," 2015, https://github.com/grafov/hulk.
[48] J. Seidl, "GoldenEye HTTP DoS test tool," 2012, https://github.com/jseidl/GoldenEye.
[49] S. Shekyan, "SlowHTTPTest," 2011, https://github.com/shekyan/slowhttptest.
[50] S. Ganapathy, K. Kulothungan, S. Muthurajkumar, M. Vijayalakshmi, P. Yogesh, and A. Kannan, "Intelligent feature selection and classification techniques for intrusion detection in networks: a survey," EURASIP Journal on Wireless Communications and Networking, vol. 2013, no. 1, p. 271, 2013, http://jwcn.eurasipjournals.com/content/2013/1/271.
[51] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011, http://dl.acm.org/citation.cfm?id=2078195.
[52] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[53] A. Turner, "Tcpreplay: Pcap editing and replaying utilities," 2013, https://tcpreplay.appneta.com.
[54] S. H. Park, J. M. Goo, and C. H. Jo, "Receiver operating characteristic (ROC) curve: practical review for radiologists," Korean Journal of Radiology, vol. 5, no. 1, p. 11, 2004.
[55] M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.

Security and Communication Networks 15

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: Research Articledownloads.hindawi.com/journals/scn/2019/1574749.pdf · Research Article Smart Detection: An Online Approach for DoS/DDoS Attack Detection Using Machine Learning Francisco

each one of them using 8 virtual CPUs (vCPUs) with 8GBRAM

41 Description of the Benchmark Datasets Many differentdatasets such as DARPA (Lincoln Laboratory 1998-99)KDDprime99 (University of California Irvine 1998-99) andLBNL (Lawrence Berkeley National Laboratory and ICSI2004-2005) have been used by the researchers to evaluate theperformance of their proposed intrusion detection andprevention approaches However many such datasets areout of date and unreliable to use [25] In this study thedatasets CIC-DoS CICIDS2017 and CSE-CIC-IDS2018 andcustomized dataset were used since they include modernthreats and DoS techniques

411 gte ISCXIDS2012 Dataset (e Information SecurityCenter of Excellence (ISCX) IDS 2012 dataset (ISC-XIDS2012) was built in at the University of New Brunswickto provide a contemporary benchmark (e dataset tracedreal packets for seven days of network activity includingHTTP SMTP SSH IMAP POP3 and FTP protocolscovering various scenarios of normal and malicious activ-ities ISCXIDS2012 consists of labeled network traces in-cluding full packet payloads in pcap format and is publiclyavailable (httpswwwunbcacicdatasetsidshtml) [45](is work focuses on the normal activities of the ISC-XIDS2012 pcap file for extraction of signatures more spe-cifically data file of Friday 1162010

412 gte CIC-DoS Dataset (e CIC-DoS dataset focuseson application layer DoS attacks mixed with the ISC-XIDS2012 dataset attack-free traces Four kinds of attackswere produced with different tools yielding 8 different DoSattack strokes from the application layer [16] (e resultingset contains 24 hours of network traffic with a total size of

46GB and is publicly available (httpswwwunbcacicdatasetsdos-datasethtml) A summary of the attack eventsand tools used in the CIC-DoS is presented in Table 6

In the execution of the low-volume attacks using toolslowhttptest [49] the default value of 50 connections perattack was adopted thus making the attacks more sneakyaccording to [16]

413 gte CICIDS2017 Dataset CICIDS2017 dataset wasrecently developed by ISCX and contains benign traffic andthe most up-to-date common attacks (is new IDS datasetincludes seven common updated family of attacks that metthe real-world criteria and is publicly available (httpwwwunbcacicdatasetsIDS2017html) [25]

(is work focuses on the malicious DoS activities of theWednesday July 5 2017 capture file which consist of fiveDoSDDoS attacks and a wide variety of normal networktraffic (e resulting set contains 8h of network traffic with atotal size of 13G (e attack tools used include slowlorisSlowhttptest Hulk GoldenEye and Heartbleed

414 gte CSE-CIC-IDS2018 Dataset (is dataset is a col-laborative project between the Communications SecurityEstablishment (CSE) and the Canadian Institute forCybersecurity (CIC) (e final dataset includes seven dif-ferent attack scenarios brute force Heartbleed Botnet DoSDDoS web attacks and infiltration of the network frominside (e attacking infrastructure includes 50 machinesand the victim organization has 5 departments and includes420machines and 30 servers A research document outliningthe dataset analysis details and similar related principles hasbeen published by [25] All data are publicly available(httpswwwunbcacicdatasetsids-2018html)

(is work focuses on the malicious DoSDDoS activitiesof Friday February 16 2018 and Tuesday February 20

tcp_flags_rte

tcp_flags_meantcp_flags_median

tcp_flags_stdtcp_flags_vartcp_flags_cv

ip_len_stdip_len_cv

ip_protoip_len_entropy

sport_entropytcp_flags_entropy

sport_cvsport_rte

sport_cvqsport_std

sport_mean

ip_len_rte

ip_len_meanip_len_var

Varia

bles

00690048

002500240024

00160013

001000900090009

00080005

000200020002

00000000

000 001 002 003 004 005 006 007Importance

Figure 7 Selected variables by the proposed feature selection algorithm

Security and Communication Networks 9

2018 capture files (e attack tools used include Slow-HTTPTest Slowhttptest Hulk LOIC and HOIC

415 gte Customized Dataset (e customized dataset wasdeveloped in a controlled network environment as shown inFigure 8 VLANs 10 20 30 and 40 are used as victim hostsVLAN 165 is dedicated to the users of an academic unitVLAN 60 is used as an attacking host while monitoringoccurs in VLAN 1 All networks have regular access to theInternet

(e attack plan was configured so that one attack isgenerated every 30 minutes in a total of 48 attack events in24 hours starting at 00 h00m00 s and ending at23 h59m00 s All attacks were executed by attacker host1721660100 during which it did not transmit legitimatetraffic to the victims (e attack tools were parameterized toproduce sneaky low-volume medium-volume or lightmode and massive high-volume attacks Ten variations ofprotocol-based and application-based attacks were adoptedusing four attack tools as shown in Table 7 (e duration ofprotocol-based and high-volume application-based attackswas 30 seconds while low-volume application-based attacksranged from 30 to 240 seconds

In the accomplishment of low-volume attacks using toolslowhttptest [49] the number of connection parameters wasadopted as 1500 instead of the default option corre-sponding to 50 connections

42 Online Experiments Online experiments were per-formed in a controlled laboratory environment according tothe following validation methodology

(1) Raw data of the network traffic are obtained foranalysis in pcap file format

(2) (e attack plan indicating the origin destinationattack type and respective duration for the trafficindicated in step 1 is drawn

(3) (e environment for reprocessing and classifying thetraffic is configured

(4) (e traffic is processed and classified(5) (e system performance is properly evaluated by

comparing the output from step 4 with the attackplan described in step 2

Following this validation methodology the sources oftraffic capture were used CIC-DoS CICIDS2017 CSE-CIC-IDS2018 and customized dataset thus fulfilling steps 1 and 2

(e environment for traffic reprocessing and classification asdescribed in step 3 was configured using two Virtual Linuxboxes running Open Virtual Switch (OVS) [28] TcpReplaysoftware [53] and the Smart Detection system as shown inFigure 9

In the reprocessing classification and evaluation of thetraffic during steps 4 and 5 the raw data traffic was replayedby TcpReplay software in a specific OVS port and sampled bythe sFlow agent for OVS (e sampled traffic was sent to theSmart Detection system and the classification result wascompared with the attack plan Figure 9 summarizes theprocedures carried out by the proposed validation meth-odology (e raw network traffic file is reprocessed on VM-01 and the sFlow agent collects traffic samples and sendsthem to Smart Detection on VM-02

421 System Setup (e smart Detection system has threemain parameters that directly influence its performance(ese parameters shown in Table 1 allow the user to calibratethe detection system according to the operating environ-ment In scenarios where the SR is too low and the Tmax istoo large for example traffic samples are discarded beforeprocessing by the classifier On the other hand if Tmax is toosmall the FAR increases because the classifier has few data toanalyze In the case of slow DDoS low SR and large Tmax alsoreduce the attack detection rate due to in-memory flow tableexpiration time (ET 2)

So several experiments to calibrate the system wereperformed using (i) SR of 1 5 10 and 20 (ii) Tmaxparameter of 25 50 and 100 and (iii) ET of 2 5 and 10seconds in the testing environment (e most balancedresult was obtained with SR 10 Tmax 50 and ET 2

422 Evaluation Metrics System performance was evalu-ated using the Precision (PREC) Recall (REC) andF-Measure (F1) metrics present in the literature [54 55]PREC measures the ability to avoid false positive while RECmeasures system sensitivity F1 is a harmonic average be-tween PREC and REC In this context (i) true positive (TP)is the attack traffic predicted correctly (ii) true negative(TN) is normal traffic also predicted correctly (iii) falsepositive (FP) is the normal traffic predicted incorrectly and(iv) false negative (FN) is the attack traffic predicted in-correctly (ese metrics were computed by the followingexpressions

PREC TP

TP + FP

REC TP

TP + FN

F1 2 timesPREC times RECPREC + REC

(7)

Besides the detection rate (DR) and false alarm rate(FAR) metrics were used (e DR is the ratio between thenumber of attacks detected by the system and the actualnumber of attacks performed FAR is the ratio between FP

Table 5 10-fold cross-validation results using the variables selectedby the proposed algorithm

MLA No of features Accuracy1 RF 20 09993632 DTree 20 09990113 Perceptron 20 09960004 SGD 20 09869895 LR 20 09827046 AdaBoost 20 0956512

10 Security and Communication Networks

and the sum of FP and TN(ese metrics were computed bythe following expressions

DR AD

TA (8)

where AD is the number of detected attacks and TA is thetotal number of performed attacks

FAR FP

FP + TN (9)

where FP corresponds to the false-positive classificationsand TN is the true-positive classifications

(e DR and FAR calculations assume that only mali-cious traffic was sent from the attacker to the victim at thetime of the attack

43Results andDiscussion (e proposed approach has beenevaluated using the aforementioned datasets system setupand metrics Table 8 summarizes the system performance foreach dataset

As can be observed the best performance was obtainedin the CSE-CIC-IDS2018 dataset with a DR of 100 anFAR of 0000 and a PREC of 100 During the analysisthere was a low occurrence of normal network traffic andwell-defined bursts of malicious traffic(is type of behaviorfacilitates detection by the system and justifies the high hitrates achieved However a slightly lower performance wasobtained in the customized dataset and CIC-DoS datasetwith a DR of 965 and 936 an FAR of 02 and 004and a PREC of 995 and 999 respectively In thosedatasets there is a higher volume of normal traffic andvarious types of attacks including stealth application layerattacks In this more realistic scenario the proposed systempresented some detection failures but still obtained acompetitive performance On the other hand the worstresult was obtained with the CICIDS2017 dataset with 80

Internet

Core router

VLAN 1

Target host

Target host

Target host Target host

Monitoring

Attacker

Production host

VLAN 101721610024

VLAN 201721620024

VLAN 301721630024

VLAN 165

VLAN 401721640024

VLAN 601721660024

Figure 8 Customized testbed topology

Table 6 CIC-DoS dataset events and tools

Attack and tool Eventsddossim (ddossim) 2DoS improved GET (Goldeneye [48]) 3DoS GET (hulk [47]) 4Slow body (slowbody2) 4Slow read (slowread) 2Slow headers (slowheaders) 5Rudy (rudy) 4Slowloris (slowloris) 2

sFlow samples

Network

Smart detection

VM-01 VM-02

Dataset(pcap)

TCP

repl

ay

Ope

n vi

rtua

lsw

itch

Col

lect

or

Proc

esso

r

Clas

sifie

r

Not

ifier

Figure 9 Smart Detection online validation scheme

Table 7 Customized dataset events and tools

Attack and tool EventsTCP SYN flood (hping3 [46]) 4TCP SYN flood-light mode (hping3 [46]) 4TCP ACK flood (hping3 [46]) 4UDP flood (hping3 [46]) 4UDP flood-random dst port (hping3 [46]) 4DoS improved GET (Goldeneye [48]) 5DoS GET (hulk [47]) 4Slow headers (slowhttptest [49]) 5Slow body (slowhttptest [49]) 5Range attack (slowhttptest [49]) 4Slow read (slowhttptest [49]) 5

Security and Communication Networks 11

DR 2 FAR and 992 PREC (is dataset expresses amore realistic network scenario which includes normaltraffic mixed with high-volume and low-volume malicioustraffic with sneaky behavior such as slow application layerattacks Even so the proposed system detected 4 out of 5attacks with PREC greater than 90 and FAR less than 1showing that the method is feasible

To discuss online detection and consumption of com-puting resources during experimentation the CICIDS2017dataset was chosen because it is quite realistic recent andsummarizes the major vectors of DoS attacks Even in themost adverse scenario the experiment was completednormally as shown in Figures 10 and 11 Overall networktraffic is demonstrated in Figure 10(a) while Figure 10(b)highlights the sampled traffic sent to the detection system Ascan be seen for network traffic of 813Mbps the detectionsystem receives only 174Mbps making this approachscalable Overall traffic rating is shown in Figure 11(a) whileFigure 11(b) exclusively highlights malicious traffic rating Itcan be said that the system was efficient in distinguishing thenormal traffic from the DoS attacks because of all the attacks

performed only the Heartbleed attack was not detectedhighlighted in Figure 10(b) between 15 h and 16 h (is kindof attack is primarily intended to collect data by exploitingOpenSSL software vulnerabilities as described in CVE-2014-0160 although it can also assume the behavior of a DDoSattack as in any application However in this case thesystem raised a false negative (e most obvious reasons forthis FN are (i) the execution of the Heartbleed attack withoutDoS exploitation or (ii) statistical coincidence in trafficsampling In the first case the attack is performed usinglegitimate and regular connections while in the second casethe collected samples coincide with legitimate trafficsignatures

In terms of resource use the system remained stableduring the experiment as shown in Figure 11(c) with smallswings in CPU usage

Finally the Smart Detection system was tested usingonline network traffic in four distinct scenarios (e resultspresented in Table 8 show that the system can distinguishlegitimate traffic from various types of DoSDDoS attackssuch as TCP flood UDP flood HTTP flood and HTTP slow

18000900 1000 1100 1200 1300 1400 1500 1600 17000 bps

20 Mbps

40 Mbps

60 Mbps

80 Mbps

(a)

18000900 1000 1100 1200 1300 1400 1500 1600 17000 bps

500 Kbps

1 Mbps

15 Mbps

2 Mbps Normalactivity

Normal +slow DoS

Normal +high DoS

Normal +heartbleed DoS

(b)

Figure 10 Network traffic in online experiment using the CICIDS2017 dataset (a) Overall network traffic (b) Sampled network traffic

12 Security and Communication Networks

with significant rates of accuracy (e experiments alsohighlighted the importance of adjusting the Tmax and ETparameters(ese variables correlate with the network trafficsampling rate (SR) and directly influence the detection rateand system accuracy

431 Additional Comparison Compared with some recentsimilar works available in the literature the approach in-troduced in this work is quite competitive in terms of theevaluated performance metrics as shown in Table 9

(e comparison is not completely fair because the ex-perimental scenarios and data were slightly different but it issufficient to allow an evaluation of the obtained results Forinstance in the offline experiments performed with the CIC-DoS dataset in [16] DR was 7692 using an SR of 20while the proposed system obtained an online DR of 90 forthe attacks with a FAR of 18 using the same samplingtechnique In [37] a PREC of 821 was obtained using theCICIDS2017 dataset in the offline and unsampled analysisIn this work the proposed method obtained a PREC of999 which can be considered competitive for an onlinedetection system based on samples of network traffic Be-sides that in the CICIDS2017 dataset experiments where thelegitimate traffic rate is similar to that of attack trafficaccording to Figures 10(b) 11(a) and 11(b) the system wasalso able to distinguish malicious traffic from normal trafficsuch as studied in the lecture [34]

5 Conclusion

(is article has presented the Smart Detection system anonline approach to DoSDDoS attack detection (e soft-ware uses the Random Forest Tree algorithm to classifynetwork traffic based on samples taken by the sFlow protocoldirectly from network devices Several experiments were

performed to calibrate and evaluate system performanceResults showed that the proposed method is feasible andpresents improved performance when compared with somerecent and relevant approaches available in the literature

(e proposed system was evaluated based on three in-trusion detection benchmark datasets namely CIC-DoSCICIDS2017 and CSE-CIC-IDS2018 and was able toclassify various types of DoSDDoS attacks such as TCPflood UDP flood HTTP flood and HTTP slow Further-more the performance of the proposed method was com-pared against recent and related approaches Based on theexperimental results the Smart Detection approach deliversimproved DR FAR and PREC For example in the CIC-DoS and CSE-CIC-IDS2018 datasets the proposed systemacquired DR and PREC higher than 93 with FAR less than1 Although the system has achieved significant results inits scope it needs some improvements such as a better hitrate among attack classes and an automatic parametercalibration mechanism that maximizes the detection rate ofattacks

10000900 18001700160015001400130012001100

150

100

50

0

(a)

80

60

40

20

010000900 18001700160015001400130012001100

(b)

10000900 1800

CPU ideal timeCPU system time

1700160015001400130012001100

80100

604020

0

()

(c)

Figure 11 Traffic classification and CPU usage in online experiment using the CICIDS2017 dataset (a) Overall traffic rating (b) Malicioustraffic rating (c) Detection system CPU utilization

Table 8 Evaluation of system performance for all datasets

Dataset DR FAR PREC F1CIC-DoS 0936 00004 0999 0999CICIDS2017 0800 0002 0992 0992CSE-CIC-IDS2018 1000 0000 1000 1000Customized 0965 0002 0995 0995

Table 9: Comparison with research approaches of related works.

Work                    Dataset      DR      FAR     PREC
[16]                    CIC-DoS      0.7690  NA      NA
[37]                    CICIDS2017   NA      NA      0.8210
The proposed approach   CIC-DoS      0.9360  0.0004  0.9990
The proposed approach   CICIDS2017   0.8000  0.0020  0.9920


Future works include analysis of DDoS attacks based on the vulnerabilities of services, such as Heartbleed and web brute force attacks, enhancement of the multiple-class classification, self-configuration of the system, and developing methods for correlating triggered alarms and formulating protective measures.

Data Availability

We produced a customized dataset and a variable selection algorithm and used four additional datasets to support the findings of this study. The customized dataset used to support the findings of this study has been deposited in the IEEE DataPort repository (https://doi.org/10.24433/CO.0280398.v2). The feature selection algorithm used to support the findings of this study has been deposited in the Code Ocean repository (https://doi.org/10.24433/CO.0280398.v2). The benchmark datasets used to support the findings of this study have been deposited in the Canadian Institute for Cybersecurity repository, publicly available as follows: (1) the ISCXIDS2012 dataset (https://www.unb.ca/cic/datasets/ids.html), (2) the CIC-DoS dataset (https://www.unb.ca/cic/datasets/dos-dataset.html), (3) the CICIDS2017 dataset (http://www.unb.ca/cic/datasets/IDS2017.html), and (4) the CSE-CIC-IDS2018 dataset (https://www.unb.ca/cic/datasets/ids-2018.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), Finance Code 001.

Acknowledgments

The authors would like to acknowledge the Digital Metropolis Institute (IMD/UFRN) and the High-Performance Computing Center of UFRN (NPAD/UFRN) for the overall support given to this work and the Canadian Institute for Cybersecurity (CIC/UNB) for publicly sharing the datasets.

References

[1] D. Anstee, C. F. Chui, P. Bowen, and G. Sockrider, Worldwide Infrastructure Security Report, Arbor Networks, Inc., Westford, MA, USA, 2017.
[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, "Internet of things: a survey on enabling technologies, protocols, and applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.
[3] S. T. Zargar, J. Joshi, and D. Tipper, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," IEEE Communications Surveys and Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.
[4] A. Marzano, D. Alexander, O. Fonseca et al., "The evolution of bashlite and mirai IoT botnets," in Proceedings of the 2018 IEEE Symposium on Computers and Communications (ISCC), 2018.
[5] S. Kottler, "February 28th DDoS incident report," 2018, https://github.blog/2018-03-01-ddos-incident-report.
[6] Y. Cao, Y. Gao, R. Tan, Q. Han, and Z. Liu, "Understanding internet DDoS mitigation from academic and industrial perspectives," IEEE Access, vol. 6, pp. 66641–66648, 2018.
[7] R. Roman, J. Zhou, and J. Lopez, "On the features and challenges of security and privacy in distributed internet of things," Computer Networks, vol. 57, no. 10, pp. 2266–2279, 2013.
[8] K. Singh, P. Singh, and K. Kumar, "Application layer HTTP-GET flood DDoS attacks: research landscape and challenges," Computers & Security, vol. 65, pp. 344–372, 2017.
[9] N. Vlajic and D. Zhou, "IoT as a land of opportunity for DDoS hackers," Computer, vol. 51, no. 7, pp. 26–34, 2018.
[10] S. Newman, "Under the radar: the danger of stealthy DDoS attacks," Network Security, vol. 2019, no. 2, pp. 18–19, 2019.
[11] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[12] Y. Wang, L. Liu, B. Sun, and Y. Li, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," in Proceedings of the 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS 2015), pp. 1034–1037, Kochi, India, 2015.
[13] M. Masdari and M. Jalali, A Survey and Taxonomy of DoS Attacks in Cloud Computing, 2016, http://onlinelibrary.wiley.com/doi/10.1002/sec.1539/epdf.
[14] R. Singh, A. Prasad, R. Moven, and H. Samra, "Denial of service attack in wireless data network: a survey," in Proceedings of the 2017 Devices for Integrated Circuit (DevIC), pp. 23–24, Kalyani, India, March 2017.
[15] A. Praseed and P. Santhi Thilagam, "DDoS attacks at the application layer: challenges and research perspectives for safeguarding web applications," IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 661–685, 2019.
[16] H. H. Jazi, H. Gonzalez, N. Stakhanova, and A. A. Ghorbani, "Detecting HTTP-based application layer DoS attacks on web servers in the presence of sampling," Computer Networks, vol. 121, pp. 25–36, 2017.
[17] H. Kim, T. Benson, A. Akella, and N. Feamster, "The evolution of network configuration: a tale of two campuses," in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC '11), p. 499, ACM, New York, NY, USA, 2011.
[18] T. V. Phan and M. Park, "Efficient distributed denial-of-service attack defense in SDN-based cloud," IEEE Access, vol. 7, pp. 18701–18714, 2019.
[19] Y. Gu, K. Li, Z. Guo, and Y. Wang, "Semi-supervised K-means DDoS detection method using hybrid feature selection algorithm," IEEE Access, vol. 7, pp. 64351–64365, 2019.
[20] M. Hasan, M. Milon Islam, I. Islam, and M. M. A. Hashem, "Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches," Internet of Things, vol. 7, Article ID 100059, 2019, https://linkinghub.elsevier.com/retrieve/pii/S2542660519300241.
[21] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, "Network intrusion detection for IoT security based on learning techniques," IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2671–2701, 2019, https://ieeexplore.ieee.org/document/8629941.


[22] X. Tan, S. Su, Z. Huang et al., "Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm," Sensors (Switzerland), vol. 19, no. 1, 2019.
[23] J. Cheng, M. Li, X. Tang, V. S. Sheng, Y. Liu, and W. Guo, "Flow correlation degree optimization driven random forest for detecting DDoS attacks in cloud computing," Security and Communication Networks, vol. 2018, Article ID 6459326, 14 pages, 2018.
[24] R. Panigrahi and S. Borah, "A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems," International Journal of Engineering & Technology, vol. 7, pp. 479–482, December 2018, https://www.researchgate.net/publication/329045441.
[25] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the ICISSP 2018—4th International Conference on Information Systems Security and Privacy, pp. 108–116, Madeira, Portugal, 2018.
[26] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (II), pp. 1137–1143, 1995, https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf.
[27] S. Behal and K. Kumar, "Trends in validation of DDoS research," Procedia Computer Science, vol. 85, pp. 7–15, 2016.
[28] B. Pfaff, J. Pettit, T. Koponen et al., "The design and implementation of Open vSwitch," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp. 117–130, USENIX Association, Oakland, CA, USA, 2015, https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/pfaff.
[29] J. Le, R. Giller, Y. Tatsumi, H. Huang, J. Ma, and N. Mori, Case Study: Developing DPDK-Accelerated Virtual Switch Solutions for Cloud Service Providers, Intel Corporation, Santa Clara, CA, USA, 2019, https://builders.intel.com/docs/cloudbuilders/developing-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providers.pdf.
[30] W. Meng, W. Li, C. Su, J. Zhou, and R. Lu, "Enhancing trust management for wireless intrusion detection via traffic sampling in the era of big data," IEEE Access, vol. 6, pp. 7234–7243, 2017.
[31] R. Zuech, T. M. Khoshgoftaar, and R. Wald, "Intrusion detection and big heterogeneous data: a survey," Journal of Big Data, vol. 2, no. 1, 2015.
[32] R. K. C. Chang, "Defending against flooding-based distributed denial-of-service attacks: a tutorial," IEEE Communications Magazine, vol. 40, no. 10, pp. 42–51, 2002.
[33] S. Simpson, S. N. Shirazi, A. Marnerides, S. Jouet, D. Pezaros, and D. Hutchison, "An inter-domain collaboration scheme to remedy DDoS attacks in computer networks," IEEE Transactions on Network and Service Management, vol. 15, no. 3, pp. 879–893, 2018.
[34] S. Behal, K. Kumar, and M. Sachdeva, "D-FACE: an anomaly based distributed approach for early detection of DDoS attacks and flash events," Journal of Network and Computer Applications, vol. 111, pp. 49–63, 2018.
[35] C. Wang, T. T. N. Miu, X. Luo, and J. Wang, "SkyShield: a sketch-based defense system against application layer DDoS attacks," IEEE Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 559–573, 2018.
[36] Z. Liu, Y. Cao, M. Zhu, and W. Ge, "Umbrella: enabling ISPs to offer readily deployable and privacy-preserving DDoS prevention services," IEEE Transactions on Information Forensics and Security, vol. 14, no. 4, pp. 1098–1108, 2019.
[37] M. Aamir and S. M. A. Zaidi, "Clustering based semi-supervised machine learning for DDoS attack classification," Journal of King Saud University—Computer and Information Sciences, 2019.
[38] R. E. Jurga, M. M. Hulb, and H. P. Procurve, Technical Report: Packet Sampling for Network Monitoring, 2007.
[39] P. Phaal and M. Lavine, sFlow Version 5, InMon Corp., San Francisco, CA, USA, 2004, https://sflow.org/sflow_version_5.txt.
[40] J. Postel, Internet Protocol, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0791.
[41] J. Postel, "Transmission Control Protocol," RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0793.
[42] J. Postel, "User Datagram Protocol," RFC Editor, 1980, https://www.rfc-editor.org/info/rfc768.
[43] B. Claise, Ed., Cisco Systems NetFlow Services Export Version 9, RFC Editor, 2004, https://www.rfc-editor.org/info/rfc3954.
[44] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," Statistical Computing, vol. 50, no. 4, pp. 361–365, 1996.
[45] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers and Security, vol. 31, no. 3, pp. 357–374, 2012.
[46] S. Sanfilippo, N. Jombart, D. Ducamp, Y. Berthier, and S. Aubert, "Hping," September 2019, http://www.hping.org.
[47] A. Grafov, "Hulk DoS tool," 2015, https://github.com/grafov/hulk.
[48] J. Seidl, "GoldenEye HTTP DoS test tool," 2012, https://github.com/jseidl/GoldenEye.
[49] S. Shekyan, "SlowHTTPTest," 2011, https://github.com/shekyan/slowhttptest.
[50] S. Ganapathy, K. Kulothungan, S. Muthurajkumar, M. Vijayalakshmi, P. Yogesh, and A. Kannan, "Intelligent feature selection and classification techniques for intrusion detection in networks: a survey," EURASIP Journal on Wireless Communications and Networking, vol. 2013, no. 1, p. 271, 2013, http://jwcn.eurasipjournals.com/content/2013/1/271.
[51] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011, http://dl.acm.org/citation.cfm?id=2078195.
[52] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[53] A. Turner, "Tcpreplay—pcap editing and replaying utilities," 2013, https://tcpreplay.appneta.com.
[54] S. H. Park, J. M. Goo, and C. H. Jo, "Receiver operating characteristic (ROC) curve: practical review for radiologists," Korean Journal of Radiology, vol. 5, no. 1, p. 11, 2004.
[55] M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.






[49] S Shekyan ldquoSlowHTTPTestrdquo 2011 httpsgithubcomshekyanslowhttptest

[50] S Ganapathy K Kulothungan S MuthurajkumarM Vijayalakshmi P Yogesh and A Kannan ldquoIntelligent featureselection and classification techniques for intrusion detection innetworks a surveyrdquo EURASIP Journal on Wireless Communi-cations and Networking vol 2013 no 1 p 271 2013 httpjwcneurasipjournalscomcontent20131271

[51] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2012 httpdlacmorgcitationcfmid2078195

[52] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45no 1 pp 5ndash32 2001

[53] A Turner ldquoTcpreplayndashPcap editing and replaying utilitiesrdquo2013 httpstcpreplayappnetacom

[54] S H Park J M Goo and C H Jo ldquoReceiver operatingcharacteristic (ROC) curve practical review for radiologistsrdquoKorean Journal of Radiology vol 5 no 1 p 11 2004

[55] M W Powers ldquoEvaluation from precision recall andF-measure to ROC informedness markedness and correla-tionrdquo Journal of Machine Learning Technologies vol 2 no 1pp 37ndash63 2011

Security and Communication Networks 15

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 12: Research Articledownloads.hindawi.com/journals/scn/2019/1574749.pdf · Research Article Smart Detection: An Online Approach for DoS/DDoS Attack Detection Using Machine Learning Francisco

DR, 0.2% FAR, and 99.2% PREC. This dataset expresses a more realistic network scenario, which includes normal traffic mixed with high-volume and low-volume malicious traffic with sneaky behavior, such as slow application layer attacks. Even so, the proposed system detected 4 out of 5 attacks with PREC greater than 90% and FAR less than 1%, showing that the method is feasible.

To discuss online detection and the consumption of computing resources during experimentation, the CICIDS2017 dataset was chosen because it is quite realistic, recent, and summarizes the major vectors of DoS attacks. Even in the most adverse scenario, the experiment was completed normally, as shown in Figures 10 and 11. Overall network traffic is demonstrated in Figure 10(a), while Figure 10(b) highlights the sampled traffic sent to the detection system. As can be seen, for network traffic of 81.3 Mbps, the detection system receives only 1.74 Mbps, making this approach scalable. Overall traffic rating is shown in Figure 11(a), while Figure 11(b) exclusively highlights malicious traffic rating. It can be said that the system was efficient in distinguishing normal traffic from the DoS attacks because, of all the attacks performed, only the Heartbleed attack was not detected, highlighted in Figure 10(b) between 15h and 16h. This kind of attack is primarily intended to collect data by exploiting OpenSSL software vulnerabilities, as described in CVE-2014-0160, although it can also assume the behavior of a DDoS attack, as in any application. However, in this case, the system raised a false negative. The most obvious reasons for this FN are (i) the execution of the Heartbleed attack without DoS exploitation or (ii) statistical coincidence in traffic sampling. In the first case, the attack is performed using legitimate and regular connections, while in the second case, the collected samples coincide with legitimate traffic signatures.
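The scalability noted above (81.3 Mbps of traffic reduced to 1.74 Mbps at the detector) comes from the detector seeing only a sampled slice of the packet stream. The toy Python sketch below illustrates generic 1-in-N packet sampling, not the exact sFlow datagram mechanism; the packet count, packet sizes, and the value of N are made up for the example.

import random

random.seed(42)

N = 50                      # forward 1 packet in every N to the detector (illustrative value)
total_bytes = 0
sampled_bytes = 0

# Synthetic packet stream: one million packets with sizes between 64 and 1500 bytes.
for i in range(1_000_000):
    size = random.randint(64, 1500)
    total_bytes += size
    if i % N == 0:          # deterministic 1-in-N selection, enough for the illustration
        sampled_bytes += size

print(f"full stream : {total_bytes / 1e6:.1f} MB")
print(f"to detector : {sampled_bytes / 1e6:.1f} MB "
      f"({100 * sampled_bytes / total_bytes:.2f}% of the original volume)")

As expected, the volume reaching the detector shrinks roughly in proportion to the sampling ratio, which is what keeps the approach affordable at high link rates.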

In terms of resource use, the system remained stable during the experiment, as shown in Figure 11(c), with small swings in CPU usage.

Figure 10: Network traffic in the online experiment using the CICIDS2017 dataset. (a) Overall network traffic. (b) Sampled network traffic, with periods annotated as normal activity, normal + slow DoS, normal + high DoS, and normal + Heartbleed DoS.

Finally, the Smart Detection system was tested using online network traffic in four distinct scenarios. The results presented in Table 8 show that the system can distinguish legitimate traffic from various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow, with significant rates of accuracy. The experiments also highlighted the importance of adjusting the Tmax and ET parameters. These variables correlate with the network traffic sampling rate (SR) and directly influence the detection rate and system accuracy.
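As a rough illustration of this calibration step, the sketch below grid-searches candidate (Tmax, ET) values and keeps the pair with the best F1 score on a labeled validation capture. Everything here is hypothetical: the run_detector stub, the candidate ranges, and the toy per-flow packet rates merely stand in for the real detection pipeline described earlier in the article.

from itertools import product
from sklearn.metrics import f1_score

def run_detector(tmax, et, samples):
    # Stand-in for the real detection pipeline: label a flow as attack (1) when its
    # packet rate exceeds a threshold derived from the candidate (Tmax, ET) pair.
    return [1 if s["pps"] > tmax / et else 0 for s in samples]

# Labeled validation capture (toy data: packets per second observed for each flow).
samples = [{"pps": 50, "label": 0}, {"pps": 900, "label": 1},
           {"pps": 30, "label": 0}, {"pps": 1200, "label": 1}]
y_true = [s["label"] for s in samples]

best = None
for tmax, et in product([1000, 2000, 4000], [2, 5, 10]):   # made-up candidate grid
    y_pred = run_detector(tmax, et, samples)
    score = f1_score(y_true, y_pred)
    if best is None or score > best[0]:
        best = (score, tmax, et)

print("best F1 = %.3f with Tmax = %d, ET = %d" % best)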

4.3.1. Additional Comparison. Compared with some recent similar works available in the literature, the approach introduced in this work is quite competitive in terms of the evaluated performance metrics, as shown in Table 9.

The comparison is not completely fair because the experimental scenarios and data were slightly different, but it is sufficient to allow an evaluation of the obtained results. For instance, in the offline experiments performed with the CIC-DoS dataset in [16], DR was 76.92% using an SR of 20%, while the proposed system obtained an online DR of 90% for the attacks with a FAR of 1.8% using the same sampling technique. In [37], a PREC of 82.1% was obtained using the CICIDS2017 dataset in an offline and unsampled analysis. In this work, the proposed method obtained a PREC of 99.9%, which can be considered competitive for an online detection system based on samples of network traffic. Besides that, in the CICIDS2017 dataset experiments, where the legitimate traffic rate is similar to that of the attack traffic according to Figures 10(b), 11(a), and 11(b), the system was also able to distinguish malicious traffic from normal traffic, as studied in the literature [34].
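For reference, the DR, FAR, PREC, and F1 values reported in Tables 8 and 9 follow directly from a binary confusion matrix, with attacks taken as the positive class. The short Python sketch below shows the arithmetic; the counts are hypothetical and are not taken from the paper's experiments.

def detection_metrics(tp, fp, tn, fn):
    # DR (recall), FAR, PREC, and F1 from confusion-matrix counts,
    # with the attack class treated as positive.
    dr = tp / (tp + fn)              # detection rate = TP / (TP + FN)
    far = fp / (fp + tn)             # false alarm rate = FP / (FP + TN)
    prec = tp / (tp + fp)            # precision = TP / (TP + FP)
    f1 = 2 * prec * dr / (prec + dr)
    return dr, far, prec, f1

# Hypothetical counts, chosen only to exercise the formulas.
dr, far, prec, f1 = detection_metrics(tp=936, fp=1, tn=2499, fn=64)
print(f"DR={dr:.3f}  FAR={far:.4f}  PREC={prec:.3f}  F1={f1:.3f}")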

5. Conclusion

This article has presented the Smart Detection system, an online approach to DoS/DDoS attack detection. The software uses the Random Forest Tree algorithm to classify network traffic based on samples taken by the sFlow protocol directly from network devices. Several experiments were performed to calibrate and evaluate system performance. Results showed that the proposed method is feasible and presents improved performance when compared with some recent and relevant approaches available in the literature.
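A minimal prototype of this classification step, using scikit-learn [51], might look like the sketch below. It is an illustration only, assuming a feature matrix already extracted from sFlow samples; here random numbers stand in for the real signatures, and the feature count and hyperparameters are placeholders rather than the authors' settings.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# X: one row per traffic signature built from sFlow samples; y: 1 = DoS/DDoS, 0 = normal.
# Random values are used here only so the sketch runs end to end.
rng = np.random.default_rng(0)
X = rng.random((5000, 20))                 # 20 is a placeholder feature count
y = rng.integers(0, 2, size=5000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

print("PREC:", precision_score(y_te, y_hat))
print("DR  :", recall_score(y_te, y_hat))   # detection rate = recall of the attack class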

The proposed system was evaluated based on three intrusion detection benchmark datasets, namely, CIC-DoS, CICIDS2017, and CSE-CIC-IDS2018, and was able to classify various types of DoS/DDoS attacks, such as TCP flood, UDP flood, HTTP flood, and HTTP slow. Furthermore, the performance of the proposed method was compared against recent and related approaches. Based on the experimental results, the Smart Detection approach delivers improved DR, FAR, and PREC. For example, in the CIC-DoS and CSE-CIC-IDS2018 datasets, the proposed system acquired DR and PREC higher than 93% with FAR less than 1%. Although the system has achieved significant results in its scope, it needs some improvements, such as a better hit rate among attack classes and an automatic parameter calibration mechanism that maximizes the detection rate of attacks.

Figure 11: Traffic classification and CPU usage in the online experiment using the CICIDS2017 dataset. (a) Overall traffic rating. (b) Malicious traffic rating. (c) Detection system CPU utilization (%), showing CPU idle time and CPU system time.

Table 8: Evaluation of system performance for all datasets.

Dataset            DR      FAR      PREC    F1
CIC-DoS            0.936   0.0004   0.999   0.999
CICIDS2017         0.800   0.002    0.992   0.992
CSE-CIC-IDS2018    1.000   0.000    1.000   1.000
Customized         0.965   0.002    0.995   0.995

Table 9: Comparison with research approaches of related works.

Work                     Dataset      DR       FAR      PREC
[16]                     CIC-DoS      0.7690   NA       NA
[37]                     CICIDS2017   NA       NA       0.8210
The proposed approach    CIC-DoS      0.9360   0.0004   0.9990
The proposed approach    CICIDS2017   0.8000   0.0020   0.9920


Future works include analysis of DDoS attacks based on the vulnerabilities of services, such as Heartbleed and web brute force attacks, enhancement in the multiple-class classification, self-configuration of the system, developing methods for correlating triggered alarms, and formulating protective measures.

Data Availability

We produced a customized dataset and a variable selection algorithm and used four additional datasets to support the findings of this study. The customized dataset used to support the findings of this study has been deposited in the IEEE DataPort repository (https://doi.org/10.24433/CO.0280398.v2). The feature selection algorithm used to support the findings of this study has been deposited in the Code Ocean repository (https://doi.org/10.24433/CO.0280398.v2). The benchmark datasets used to support the findings of this study have been deposited in the Canadian Institute for Cybersecurity repository and are publicly available as follows: (1) the ISCXIDS2012 dataset (https://www.unb.ca/cic/datasets/ids.html), (2) the CIC-DoS dataset (https://www.unb.ca/cic/datasets/dos-dataset.html), (3) the CICIDS2017 dataset (http://www.unb.ca/cic/datasets/IDS2017.html), and (4) the CSE-CIC-IDS2018 dataset (https://www.unb.ca/cic/datasets/ids-2018.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), Finance Code 001.

Acknowledgments

The authors would like to acknowledge the Digital Metropolis Institute (IMD/UFRN) and the High-Performance Computing Center of UFRN (NPAD/UFRN) for the overall support given to this work and the Canadian Institute for Cybersecurity (CIC/UNB) for publicly sharing the datasets.

References

[1] D. Anstee, C. F. Chui, P. Bowen, and G. Sockrider, Worldwide Infrastructure Security Report, Arbor Networks Inc., Westford, MA, USA, 2017.

[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, "Internet of things: a survey on enabling technologies, protocols, and applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.

[3] S. T. Zargar, J. Joshi, and D. Tipper, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," IEEE Communications Surveys and Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.

[4] A. Marzano, D. Alexander, O. Fonseca et al., "The evolution of bashlite and mirai IoT botnets," in Proceedings of the 2018 IEEE Symposium on Computers and Communications (ISCC), 2018.

[5] S. Kottler, "February 28th DDoS incident report," 2018, https://github.blog/2018-03-01-ddos-incident-report/.

[6] Y. Cao, Y. Gao, R. Tan, Q. Han, and Z. Liu, "Understanding internet DDoS mitigation from academic and industrial perspectives," IEEE Access, vol. 6, pp. 66641–66648, 2018.

[7] R. Roman, J. Zhou, and J. Lopez, "On the features and challenges of security and privacy in distributed internet of things," Computer Networks, vol. 57, no. 10, pp. 2266–2279, 2013.

[8] K. Singh, P. Singh, and K. Kumar, "Application layer HTTP-GET flood DDoS attacks: research landscape and challenges," Computers & Security, vol. 65, pp. 344–372, 2017.

[9] N. Vlajic and D. Zhou, "IoT as a land of opportunity for DDoS hackers," Computer, vol. 51, no. 7, pp. 26–34, 2018.

[10] S. Newman, "Under the radar: the danger of stealthy DDoS attacks," Network Security, vol. 2019, no. 2, pp. 18-19, 2019.

[11] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.

[12] Y. Wang, L. Liu, B. Sun, and Y. Li, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," in Proceedings of the 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS 2015), pp. 1034–1037, Kochi, India, 2015.

[13] M. Masdari and M. Jalali, A Survey and Taxonomy of DoS Attacks in Cloud Computing, 2016, http://onlinelibrary.wiley.com/doi/10.1002/sec.1539/epdf.

[14] R. Singh, A. Prasad, R. Moven, and H. Samra, "Denial of service attack in wireless data network: a survey," in Proceedings of the 2017 Devices for Integrated Circuit (DevIC), pp. 23-24, Kalyani, India, March 2017.

[15] A. Praseed and P. Santhi Thilagam, "DDoS attacks at the application layer: challenges and research perspectives for safeguarding web applications," IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 661–685, 2019.

[16] H. H. Jazi, H. Gonzalez, N. Stakhanova, and A. A. Ghorbani, "Detecting HTTP-based application layer DoS attacks on web servers in the presence of sampling," Computer Networks, vol. 121, pp. 25–36, 2017.

[17] H. Kim, T. Benson, A. Akella, and N. Feamster, "The evolution of network configuration: a tale of two campuses," in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC '11), p. 499, ACM, New York, NY, USA, 2011.

[18] T. V. Phan and M. Park, "Efficient distributed denial-of-service attack defense in SDN-based cloud," IEEE Access, vol. 7, pp. 18701–18714, 2019.

[19] Y. Gu, K. Li, Z. Guo, and Y. Wang, "Semi-supervised K-means DDoS detection method using hybrid feature selection algorithm," IEEE Access, vol. 7, pp. 64351–64365, 2019.

[20] M. Hasan, M. Milon Islam, I. Islam, and M. M. A. Hashem, "Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches," Internet of Things, vol. 7, Article ID 100059, 2019, https://linkinghub.elsevier.com/retrieve/pii/S2542660519300241.

[21] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, "Network intrusion detection for IoT security based on learning techniques," IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2671–2701, 2019, https://ieeexplore.ieee.org/document/8629941.


[22] X. Tan, S. Su, Z. Huang et al., "Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm," Sensors (Switzerland), vol. 19, no. 1, 2019.

[23] J. Cheng, M. Li, X. Tang, V. S. Sheng, Y. Liu, and W. Guo, "Flow correlation degree optimization driven random forest for detecting DDoS attacks in cloud computing," Security and Communication Networks, vol. 2018, Article ID 6459326, 14 pages, 2018.

[24] R. Panigrahi and S. Borah, "A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems," International Journal of Engineering & Technology, vol. 7, pp. 479–482, December 2018, https://www.researchgate.net/publication/329045441.

[25] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the ICISSP 2018 – 4th International Conference on Information Systems Security and Privacy, pp. 108–116, Madeira, Portugal, 2018.

[26] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (II), pp. 1137–1143, 1995, https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf.

[27] S. Behal and K. Kumar, "Trends in validation of DDoS research," Procedia Computer Science, vol. 85, pp. 7–15, 2016.

[28] B. Pfaff, J. Pettit, T. Koponen et al., "The design and implementation of Open vSwitch," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp. 117–130, USENIX Association, Oakland, CA, USA, 2015, https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/pfaff.

[29] J. Le, R. Giller, Y. Tatsumi, H. Huang, J. Ma, and N. Mori, Case Study: Developing DPDK-Accelerated Virtual Switch Solutions for Cloud Service Providers, Intel Corporation, Santa Clara, CA, USA, 2019, https://builders.intel.com/docs/cloudbuilders/developing-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providers.pdf.

[30] W. Meng, W. Li, C. Su, J. Zhou, and R. Lu, "Enhancing trust management for wireless intrusion detection via traffic sampling in the era of big data," IEEE Access, vol. 6, pp. 7234–7243, 2017.

[31] R. Zuech, T. M. Khoshgoftaar, and R. Wald, "Intrusion detection and big heterogeneous data: a survey," Journal of Big Data, vol. 2, no. 1, 2015.

[32] R. K. C. Chang, "Defending against flooding-based distributed denial-of-service attacks: a tutorial," IEEE Communications Magazine, vol. 40, no. 10, pp. 42–51, 2002.

[33] S. Simpson, S. N. Shirazi, A. Marnerides, S. Jouet, D. Pezaros, and D. Hutchison, "An inter-domain collaboration scheme to remedy DDoS attacks in computer networks," IEEE Transactions on Network and Service Management, vol. 15, no. 3, pp. 879–893, 2018.

[34] S. Behal, K. Kumar, and M. Sachdeva, "D-FACE: an anomaly based distributed approach for early detection of DDoS attacks and flash events," Journal of Network and Computer Applications, vol. 111, pp. 49–63, 2018.

[35] C. Wang, T. T. N. Miu, X. Luo, and J. Wang, "SkyShield: a sketch-based defense system against application layer DDoS attacks," IEEE Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 559–573, 2018.

[36] Z. Liu, Y. Cao, M. Zhu, and W. Ge, "Umbrella: enabling ISPs to offer readily deployable and privacy-preserving DDoS prevention services," IEEE Transactions on Information Forensics and Security, vol. 14, no. 4, pp. 1098–1108, 2019.

[37] M. Aamir and S. M. A. Zaidi, "Clustering based semi-supervised machine learning for DDoS attack classification," Journal of King Saud University – Computer and Information Sciences, 2019.

[38] R. E. Jurga and M. M. Hulb, Packet Sampling for Network Monitoring, Technical Report, HP ProCurve, 2007.

[39] P. Phaal and M. Lavine, sFlow Version 5, InMon Corp., San Francisco, CA, USA, 2004, https://sflow.org/sflow_version_5.txt.

[40] J. Postel, "Internet protocol," RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0791.

[41] J. Postel, "Transmission control protocol," RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0793.

[42] J. Postel, "User datagram protocol," RFC Editor, 1980, https://www.rfc-editor.org/info/rfc768.

[43] B. Claise, Ed., Cisco Systems NetFlow Services Export Version 9, RFC Editor, 2004, https://www.rfc-editor.org/info/rfc3954.

[44] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," The American Statistician, vol. 50, no. 4, pp. 361–365, 1996.

[45] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers and Security, vol. 31, no. 3, pp. 357–374, 2012.

[46] S. Sanfilippo, N. Jombart, D. Ducamp, Y. Berthier, and S. Aubert, "Hping," September 2019, http://www.hping.org.

[47] A. Grafov, "Hulk DoS tool," 2015, https://github.com/grafov/hulk.

[48] J. Seidl, "GoldenEye HTTP DoS test tool," 2012, https://github.com/jseidl/GoldenEye.

[49] S. Shekyan, "SlowHTTPTest," 2011, https://github.com/shekyan/slowhttptest.

[50] S. Ganapathy, K. Kulothungan, S. Muthurajkumar, M. Vijayalakshmi, P. Yogesh, and A. Kannan, "Intelligent feature selection and classification techniques for intrusion detection in networks: a survey," EURASIP Journal on Wireless Communications and Networking, vol. 2013, no. 1, p. 271, 2013, http://jwcn.eurasipjournals.com/content/2013/1/271.

[51] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011, http://dl.acm.org/citation.cfm?id=2078195.

[52] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[53] A. Turner, "Tcpreplay – Pcap editing and replaying utilities," 2013, https://tcpreplay.appneta.com.

[54] S. H. Park, J. M. Goo, and C. H. Jo, "Receiver operating characteristic (ROC) curve: practical review for radiologists," Korean Journal of Radiology, vol. 5, no. 1, p. 11, 2004.

[55] D. M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.

Security and Communication Networks 15

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 13: Research Articledownloads.hindawi.com/journals/scn/2019/1574749.pdf · Research Article Smart Detection: An Online Approach for DoS/DDoS Attack Detection Using Machine Learning Francisco

with significant rates of accuracy (e experiments alsohighlighted the importance of adjusting the Tmax and ETparameters(ese variables correlate with the network trafficsampling rate (SR) and directly influence the detection rateand system accuracy

431 Additional Comparison Compared with some recentsimilar works available in the literature the approach in-troduced in this work is quite competitive in terms of theevaluated performance metrics as shown in Table 9

(e comparison is not completely fair because the ex-perimental scenarios and data were slightly different but it issufficient to allow an evaluation of the obtained results Forinstance in the offline experiments performed with the CIC-DoS dataset in [16] DR was 7692 using an SR of 20while the proposed system obtained an online DR of 90 forthe attacks with a FAR of 18 using the same samplingtechnique In [37] a PREC of 821 was obtained using theCICIDS2017 dataset in the offline and unsampled analysisIn this work the proposed method obtained a PREC of999 which can be considered competitive for an onlinedetection system based on samples of network traffic Be-sides that in the CICIDS2017 dataset experiments where thelegitimate traffic rate is similar to that of attack trafficaccording to Figures 10(b) 11(a) and 11(b) the system wasalso able to distinguish malicious traffic from normal trafficsuch as studied in the lecture [34]

5 Conclusion

(is article has presented the Smart Detection system anonline approach to DoSDDoS attack detection (e soft-ware uses the Random Forest Tree algorithm to classifynetwork traffic based on samples taken by the sFlow protocoldirectly from network devices Several experiments were

performed to calibrate and evaluate system performanceResults showed that the proposed method is feasible andpresents improved performance when compared with somerecent and relevant approaches available in the literature

(e proposed system was evaluated based on three in-trusion detection benchmark datasets namely CIC-DoSCICIDS2017 and CSE-CIC-IDS2018 and was able toclassify various types of DoSDDoS attacks such as TCPflood UDP flood HTTP flood and HTTP slow Further-more the performance of the proposed method was com-pared against recent and related approaches Based on theexperimental results the Smart Detection approach deliversimproved DR FAR and PREC For example in the CIC-DoS and CSE-CIC-IDS2018 datasets the proposed systemacquired DR and PREC higher than 93 with FAR less than1 Although the system has achieved significant results inits scope it needs some improvements such as a better hitrate among attack classes and an automatic parametercalibration mechanism that maximizes the detection rate ofattacks

10000900 18001700160015001400130012001100

150

100

50

0

(a)

80

60

40

20

010000900 18001700160015001400130012001100

(b)

10000900 1800

CPU ideal timeCPU system time

1700160015001400130012001100

80100

604020

0

()

(c)

Figure 11 Traffic classification and CPU usage in online experiment using the CICIDS2017 dataset (a) Overall traffic rating (b) Malicioustraffic rating (c) Detection system CPU utilization

Table 8 Evaluation of system performance for all datasets

Dataset DR FAR PREC F1CIC-DoS 0936 00004 0999 0999CICIDS2017 0800 0002 0992 0992CSE-CIC-IDS2018 1000 0000 1000 1000Customized 0965 0002 0995 0995

Table 9 Comparison with research approaches of related works

Work Dataset DR FAR PREC[16] CIC-DoS 07690 NA NA[37] CICIDS2017 NA NA 08210(e proposed approach CIC-DoS 09360 00004 09990(e proposed approach CICIDS2017 08000 00020 09920

Security and Communication Networks 13

Future works include analysis of DDoS attacks based onthe vulnerabilities of services such as Heartbleed and webbrute force attack enhancement in the multiple-class clas-sification self-configuration of the system developingmethods for correlating triggered alarms and formulatingprotective measures

Data Availability

We produced a customized dataset and a variable selectionalgorithm and used four additional datasets to support thefindings of this study (e customized dataset used to supportthe findings of this study has been deposited in the IEEE DataPort repository (httpsdoiorg1024433CO0280398v2) (efeature selection algorithm used to support the findings of thisstudy has been deposited in the Code Ocean repository (httpsdoiorg1024433CO0280398v2) (e benchmark datasetsused to support the findings of this study have been deposited inthe Canadian Institute for Cybersecurity repository publiclyavailable as follows (1) the ISCXIDS2012 dataset (httpswwwunbcacicdatasetsidshtml) (2) the CIC-DoS dataset (httpswwwunbcacicdatasetsdos-datasethtml) (3) the CICIDS2017dataset (httpwwwunbcacicdatasetsIDS2017html) and (4)the CSE-CIC-IDS2018 dataset (httpswwwunbcacicdatasetsids-2018html)

Conflicts of Interest

(e authors declare that there are no conflicts of interestregarding the publication of this paper

Funding

(is study was financed in part by the Coordenaccedilatildeo deAperfeiccediloamento de Pessoal de Nıvel Superior Brasil(CAPES) finance Code 001

Acknowledgments

(e authors would like to acknowledge the Digital Me-tropolis Institute (IMDUFRN) and High-PerformanceComputing Center of UFRN (NPADUFRN) for the overallsupport given to this work and the Canadian Institute ofCybersecurity (CICUNB) for publicly sharing the datasets

References

[1] D Anstee C F Chui P Bowen and G Sockrider WorldwideInfrastructure Security Report Arbor Networks Inc WestfordMA USA 2017

[2] A Al-Fuqaha M Guizani M Mohammadi M Aledhari andM Ayyash ldquoInternet of things a survey on enabling tech-nologies protocols and applicationsrdquo IEEE CommunicationsSurveys amp Tutorials vol 17 no 4 pp 2347ndash2376 2015

[3] S T Zargar J Joshi and D Tipper ldquoA survey of defensemechanisms against distributed denial of service (DDOS)flooding attacksrdquo IEEE Communications Surveys and Tuto-rials vol 15 no 4 pp 2046ndash2069 2013

[4] A Marzano D Alexander O Fonseca et al ldquo(e evolution ofbashlite and mirai IoT botnetsrdquo in Proceedings of the 2018

IEEE Symposium on Computers and Communications (ISCC)2018

[5] S Kottler ldquoFebruary 28th DDoS incident reportrdquo 2018httpsgithubblog2018-03-01-ddos-incident-report

[6] Y Cao Y Gao R Tan Q Han and Z Liu ldquoUnderstandinginternet DDoS mitigation from academic and industrialperspectivesrdquo IEEE Access vol 6 pp 66641ndash66648 2018

[7] R Roman J Zhou and J Lopez ldquoOn the features andchallenges of security and privacy in distributed internet ofthingsrdquo Computer Networks vol 57 no 10 pp 2266ndash22792013

[8] K Singh P Singh and K Kumar ldquoApplication layer HTTP-GET flood DDoS attacks research landscape and challengesrdquoComputers amp Security vol 65 pp 344ndash372 2017

[9] N Vlajic and D Zhou ldquoIoTas a land of opportunity for DDoShackersrdquo Computer vol 51 no 7 pp 26ndash34 2018

[10] S Newman ldquoUnder the radar the danger of stealthy DDoSattacksrdquo Network Security vol 2019 no 2 pp 18-19 2019

[11] M H Bhuyan D K Bhattacharyya and J K Kalita ldquoNetworkanomaly detection methods systems and toolsrdquo IEEECommunications Surveys amp Tutorials vol 16 no 1pp 303ndash336 2014

[12] Y Wang L Liu B Sun and Y Li ldquoA survey of defensemechanisms against distributed denial of service (DDoS)flooding attacksrdquo in Proceedings of the 2015 6th IEEE In-ternational Conference on Software Engineering and ServiceScience ICSESS 2015 pp 1034ndash1037 Kochi India 2015

[13] M Masdari and M Jalali A Survey and Taxonomy of DoSAttacks in Cloud Computing 2016 httponlinelibrarywileycomdoi101002sec1539epdf

[14] R Singh A Prasad R Moven and H Samra ldquoDenial ofservice attack in wireless data network a survey devices forintegrated circuitrdquo in Proceedings of the 2017 Devices forIntegrated Circuit (DevIC) pp 23-24 Kalyani India March2017

[15] A Praseed and P Santhi (ilagam ldquoDDoS attacks at theapplication layer challenges and research perspectives forsafeguarding web applicationsrdquo IEEE Communications Sur-veys and Tutorials vol 21 no 1 pp 661ndash685 2019

[16] H H Jazi H Gonzalez N Stakhanova and A A GhorbanildquoDetecting HTTP-based application layer DoS attacks on webservers in the presence of samplingrdquo Computer Networksvol 121 pp 25ndash36 2017

[17] H Kim T Benson A Akella andN Feamster ldquo(e evolution ofnetwork configuration a tale of two campusesrdquoin Proceedings ofthe 2011 ACM SIGCOMM Conference on Internet MeasurementConference IMC rsquo11 ACM p 499 New York NY USA 2011

[18] T V Phan and M Park ldquoEfficient distributed denial-of-service attack defense in sdn-based cloudrdquo IEEE Access vol 7pp 18701ndash18714 2019

[19] Y Gu K Li Z Guo and YWang ldquoSemi-supervised K-meansDDoS detection method using hybrid feature selection al-gorithmrdquo IEEE Access vol 7 pp 64351ndash64365 2019

[20] M Hasan M Milon Islam I Islam and M M A HashemldquoAttack and anomaly detection in IoT sensors in IoT sitesusing machine learning ApproachesrdquoInternet of (ingsvol 7 Article ID 100059 2016 httpslinkinghubelseviercomretrievepiiS2542660519300241

[21] N Chaabouni M Mosbah A Zemmari C Sauvignac andP Faruki ldquoNetwork intrusion detection for IoT securitybased on learning techniquesrdquo IEEE Communications Sur-veys amp Tutorials vol 21 no 3 pp 2671ndash2701 2019 httpsieeexploreieeeorgdocument8629941

14 Security and Communication Networks

[22] X Tan S Su Z Huang et al ldquoWireless sensor networksintrusion detection based on SMOTE and the random forestalgorithmrdquo Sensors (Switzerland) vol 19 no 1 2019

[23] J Cheng M Li X Tang V S Sheng Y Liu and W GuoldquoFlow correlation degree optimization driven random forestfor detecting DDoS attacks in cloud computingrdquo Security andCommunication Networks vol 2018 Article ID 645932614 pages 2018

[24] R Panigrahi and S Borah ldquoA detailed analysis ofCICIDS2017 dataset for designing intrusion detectionSystemsrdquo International Journal of Engineering amp Tech-nology vol 7 pp 479ndash482 December 2018 httpswwwresearchgatenetpublication329045441

[25] I Sharafaldin A Habibi Lashkari and A A GhorbanildquoToward generating a new intrusion detection dataset andintrusion traffic characterizationrdquo in Proceedings of the ICISSP2018mdash4th International Conference on Information SystemsSecurity and Privacy No Cic pp 108ndash116 Madeira Portugal2018

[26] R Kohavi ldquoA study of crossndashvalidation and bootstrap foraccuracy estimation and model selectionrdquo in Proceedings ofthe Fourteenth International Joint Conference on ArtificialIntelligence (II) pp 1137ndash1143 1995 httpswwwijcaiorgProceedings95-2Papers016pdf

[27] S Behal and K Kumar ldquoTrends in validation of DDoS re-searchrdquo Procedia Computer Science vol 85 pp 7ndash15 2016

[28] B Pfaff J Pettit T Koponen et al ldquo(e design andimplementation of Open vSwitchrdquo in 12th USENIX Sym-posium on Networked Systems Design and Implementation(NSDI 15) pp 117ndash130 USENIX Association OaklandCA USA 2015 httpswwwusenixorgconferencensdi15technical-sessionspresentationpfaff

[29] J Le R Giller Y Tatsumi H Huang J Ma and N MoriCase Study Developing DPDK-Accelerated Virtual SwitchSolutions for Cloud Service Providers Intel CorporationSanta Clara CA USA 2019 httpsbuildersintelcomdocscloudbuildersdeveloping-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providerspdf

[30] W Meng W Li C Su J Zhou and R Lu ldquoEnhancing trustmanagement for wireless intrusion detection via trafficsampling in the era of big datardquo IEEE Access vol 6pp 7234ndash7243 2017

[31] R Zuech T M Khoshgoftaar and R Wald ldquoIntrusion de-tection and big heterogeneous data a surveyrdquo Journal of BigData vol 2 no 1 2015

[32] R K C Chang ldquoDefending against flooding-based distributeddenial-of-service attacks a tutorialrdquo IEEE CommunicationsMagazine vol 40 no 10 pp 42ndash51 2002

[33] S Simpson S N Shirazi A Marnerides S Jouet D Pezarosand D Hutchison ldquoAn inter-domain collaboration scheme toremedy DDoS attacks in computer networksrdquo IEEE Trans-actions on Network and Service Management vol 15 no 3pp 879ndash893 2018

[34] S Behal K Kumar and M Sachdeva ldquoD-face an anomalybased distributed approach for early detection of DDoS at-tacks and flash eventsrdquo Journal of Network and ComputerApplications vol 111 pp 49ndash63 2018

[35] C Wang T T N Miu X Luo and J Wang ldquoSkyShield asketch-based defense system against application layer DDoSattacksrdquo IEEE Transactions on Information Forensics andSecurity vol 13 no 3 pp 559ndash573 2018

[36] Z Liu Y Cao M Zhu and W Ge ldquoUmbrella enabling ISPsto offer readily deployable and privacy-preserving DDoS

prevention servicesrdquo IEEE Transactions on Information Fo-rensics and Security vol 14 no 4 pp 1098ndash1108 2019

[37] M Aamir and SM A Zaidi ldquoClustering based semi-supervisedmachine learning for DDoS attack classificationrdquo Journal ofKing Saud UniversitymdashComputer and Information Sciences2019

[38] R E Jurga R E Jurga M M Hulb M M HulbH P Procurve and H P Procurve Technical Report PacketSampling for Network Monitoring 2007

[39] P Phaal and M Lavine sFlow version 5 InMon Corp SanFrancisco CA USA 1981 httpssfloworgsflow_version_5txt

[40] J Postel Internet Protocol RFC Editor 1981 httpswwwrfc-editororginforfc0791

[41] J Postel ldquoTransmission Control Protocolrdquo RFC Editor 1981httpswwwrfc-editororginforfc0793

[42] J Postel ldquoUser datagram protocolrdquo RFC Editor 1980 httpswwwrfc-editororginforfc768

[43] B Claise Ed Cisco systems netflow services export version 9RFC Editor 2004 httpswwwrfc-editororginforfc3954

[44] R J Hyndman and Y Fan ldquoSample quantiles in statisticalpackagesrdquo Statistical Computing vol 50 no 4 pp 361ndash3651996

[45] A Shiravi H Shiravi M Tavallaee and A A GhorbanildquoToward developing a systematic approach to generatebenchmark datasets for intrusion detectionrdquo Computers andSecurity vol 31 no 3 pp 357ndash374 2012

[46] S Sanfilippo N Jombart D Ducamp Y Berthier andS Aubert ldquoHpingrdquo september 2019 httpwwwhpingorg

[47] A Grafov ldquoHulk DoS toolrdquo 2015 httpsgithubcomgrafovhulk

[48] J Seidl ldquoGoldenEye HTTP DoS test toolrdquo 2012 httpsgithubcomjseidlGoldenEye

[49] S Shekyan ldquoSlowHTTPTestrdquo 2011 httpsgithubcomshekyanslowhttptest

[50] S Ganapathy K Kulothungan S MuthurajkumarM Vijayalakshmi P Yogesh and A Kannan ldquoIntelligent featureselection and classification techniques for intrusion detection innetworks a surveyrdquo EURASIP Journal on Wireless Communi-cations and Networking vol 2013 no 1 p 271 2013 httpjwcneurasipjournalscomcontent20131271

[51] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2012 httpdlacmorgcitationcfmid2078195

[52] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45no 1 pp 5ndash32 2001

[53] A Turner ldquoTcpreplayndashPcap editing and replaying utilitiesrdquo2013 httpstcpreplayappnetacom

[54] S H Park J M Goo and C H Jo ldquoReceiver operatingcharacteristic (ROC) curve practical review for radiologistsrdquoKorean Journal of Radiology vol 5 no 1 p 11 2004

[55] M W Powers ldquoEvaluation from precision recall andF-measure to ROC informedness markedness and correla-tionrdquo Journal of Machine Learning Technologies vol 2 no 1pp 37ndash63 2011

Security and Communication Networks 15

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 14: Research Articledownloads.hindawi.com/journals/scn/2019/1574749.pdf · Research Article Smart Detection: An Online Approach for DoS/DDoS Attack Detection Using Machine Learning Francisco

Future works include analysis of DDoS attacks based onthe vulnerabilities of services such as Heartbleed and webbrute force attack enhancement in the multiple-class clas-sification self-configuration of the system developingmethods for correlating triggered alarms and formulatingprotective measures

Data Availability

We produced a customized dataset and a variable selectionalgorithm and used four additional datasets to support thefindings of this study (e customized dataset used to supportthe findings of this study has been deposited in the IEEE DataPort repository (httpsdoiorg1024433CO0280398v2) (efeature selection algorithm used to support the findings of thisstudy has been deposited in the Code Ocean repository (httpsdoiorg1024433CO0280398v2) (e benchmark datasetsused to support the findings of this study have been deposited inthe Canadian Institute for Cybersecurity repository publiclyavailable as follows (1) the ISCXIDS2012 dataset (httpswwwunbcacicdatasetsidshtml) (2) the CIC-DoS dataset (httpswwwunbcacicdatasetsdos-datasethtml) (3) the CICIDS2017dataset (httpwwwunbcacicdatasetsIDS2017html) and (4)the CSE-CIC-IDS2018 dataset (httpswwwunbcacicdatasetsids-2018html)

Conflicts of Interest

(e authors declare that there are no conflicts of interestregarding the publication of this paper

Funding

(is study was financed in part by the Coordenaccedilatildeo deAperfeiccediloamento de Pessoal de Nıvel Superior Brasil(CAPES) finance Code 001

Acknowledgments

(e authors would like to acknowledge the Digital Me-tropolis Institute (IMDUFRN) and High-PerformanceComputing Center of UFRN (NPADUFRN) for the overallsupport given to this work and the Canadian Institute ofCybersecurity (CICUNB) for publicly sharing the datasets

References

[1] D Anstee C F Chui P Bowen and G Sockrider WorldwideInfrastructure Security Report Arbor Networks Inc WestfordMA USA 2017

[2] A Al-Fuqaha M Guizani M Mohammadi M Aledhari andM Ayyash ldquoInternet of things a survey on enabling tech-nologies protocols and applicationsrdquo IEEE CommunicationsSurveys amp Tutorials vol 17 no 4 pp 2347ndash2376 2015

[3] S T Zargar J Joshi and D Tipper ldquoA survey of defensemechanisms against distributed denial of service (DDOS)flooding attacksrdquo IEEE Communications Surveys and Tuto-rials vol 15 no 4 pp 2046ndash2069 2013

[4] A Marzano D Alexander O Fonseca et al ldquo(e evolution ofbashlite and mirai IoT botnetsrdquo in Proceedings of the 2018

IEEE Symposium on Computers and Communications (ISCC)2018

[5] S Kottler ldquoFebruary 28th DDoS incident reportrdquo 2018httpsgithubblog2018-03-01-ddos-incident-report

[6] Y Cao Y Gao R Tan Q Han and Z Liu ldquoUnderstandinginternet DDoS mitigation from academic and industrialperspectivesrdquo IEEE Access vol 6 pp 66641ndash66648 2018

[7] R Roman J Zhou and J Lopez ldquoOn the features andchallenges of security and privacy in distributed internet ofthingsrdquo Computer Networks vol 57 no 10 pp 2266ndash22792013

[8] K Singh P Singh and K Kumar ldquoApplication layer HTTP-GET flood DDoS attacks research landscape and challengesrdquoComputers amp Security vol 65 pp 344ndash372 2017

[9] N Vlajic and D Zhou ldquoIoTas a land of opportunity for DDoShackersrdquo Computer vol 51 no 7 pp 26ndash34 2018

[10] S Newman ldquoUnder the radar the danger of stealthy DDoSattacksrdquo Network Security vol 2019 no 2 pp 18-19 2019

[11] M H Bhuyan D K Bhattacharyya and J K Kalita ldquoNetworkanomaly detection methods systems and toolsrdquo IEEECommunications Surveys amp Tutorials vol 16 no 1pp 303ndash336 2014

[12] Y Wang L Liu B Sun and Y Li ldquoA survey of defensemechanisms against distributed denial of service (DDoS)flooding attacksrdquo in Proceedings of the 2015 6th IEEE In-ternational Conference on Software Engineering and ServiceScience ICSESS 2015 pp 1034ndash1037 Kochi India 2015

[13] M Masdari and M Jalali A Survey and Taxonomy of DoSAttacks in Cloud Computing 2016 httponlinelibrarywileycomdoi101002sec1539epdf

[14] R Singh A Prasad R Moven and H Samra ldquoDenial ofservice attack in wireless data network a survey devices forintegrated circuitrdquo in Proceedings of the 2017 Devices forIntegrated Circuit (DevIC) pp 23-24 Kalyani India March2017

[15] A Praseed and P Santhi (ilagam ldquoDDoS attacks at theapplication layer challenges and research perspectives forsafeguarding web applicationsrdquo IEEE Communications Sur-veys and Tutorials vol 21 no 1 pp 661ndash685 2019

[16] H H Jazi H Gonzalez N Stakhanova and A A GhorbanildquoDetecting HTTP-based application layer DoS attacks on webservers in the presence of samplingrdquo Computer Networksvol 121 pp 25ndash36 2017

[17] H Kim T Benson A Akella andN Feamster ldquo(e evolution ofnetwork configuration a tale of two campusesrdquoin Proceedings ofthe 2011 ACM SIGCOMM Conference on Internet MeasurementConference IMC rsquo11 ACM p 499 New York NY USA 2011

[18] T V Phan and M Park ldquoEfficient distributed denial-of-service attack defense in sdn-based cloudrdquo IEEE Access vol 7pp 18701ndash18714 2019

[19] Y Gu K Li Z Guo and YWang ldquoSemi-supervised K-meansDDoS detection method using hybrid feature selection al-gorithmrdquo IEEE Access vol 7 pp 64351ndash64365 2019

[20] M Hasan M Milon Islam I Islam and M M A HashemldquoAttack and anomaly detection in IoT sensors in IoT sitesusing machine learning ApproachesrdquoInternet of (ingsvol 7 Article ID 100059 2016 httpslinkinghubelseviercomretrievepiiS2542660519300241

[21] N Chaabouni M Mosbah A Zemmari C Sauvignac andP Faruki ldquoNetwork intrusion detection for IoT securitybased on learning techniquesrdquo IEEE Communications Sur-veys amp Tutorials vol 21 no 3 pp 2671ndash2701 2019 httpsieeexploreieeeorgdocument8629941

14 Security and Communication Networks

[22] X Tan S Su Z Huang et al ldquoWireless sensor networksintrusion detection based on SMOTE and the random forestalgorithmrdquo Sensors (Switzerland) vol 19 no 1 2019

[23] J Cheng M Li X Tang V S Sheng Y Liu and W GuoldquoFlow correlation degree optimization driven random forestfor detecting DDoS attacks in cloud computingrdquo Security andCommunication Networks vol 2018 Article ID 645932614 pages 2018

[24] R Panigrahi and S Borah ldquoA detailed analysis ofCICIDS2017 dataset for designing intrusion detectionSystemsrdquo International Journal of Engineering amp Tech-nology vol 7 pp 479ndash482 December 2018 httpswwwresearchgatenetpublication329045441

[25] I Sharafaldin A Habibi Lashkari and A A GhorbanildquoToward generating a new intrusion detection dataset andintrusion traffic characterizationrdquo in Proceedings of the ICISSP2018mdash4th International Conference on Information SystemsSecurity and Privacy No Cic pp 108ndash116 Madeira Portugal2018

[26] R Kohavi ldquoA study of crossndashvalidation and bootstrap foraccuracy estimation and model selectionrdquo in Proceedings ofthe Fourteenth International Joint Conference on ArtificialIntelligence (II) pp 1137ndash1143 1995 httpswwwijcaiorgProceedings95-2Papers016pdf

[27] S Behal and K Kumar ldquoTrends in validation of DDoS re-searchrdquo Procedia Computer Science vol 85 pp 7ndash15 2016

[28] B Pfaff J Pettit T Koponen et al ldquo(e design andimplementation of Open vSwitchrdquo in 12th USENIX Sym-posium on Networked Systems Design and Implementation(NSDI 15) pp 117ndash130 USENIX Association OaklandCA USA 2015 httpswwwusenixorgconferencensdi15technical-sessionspresentationpfaff

[29] J Le R Giller Y Tatsumi H Huang J Ma and N MoriCase Study Developing DPDK-Accelerated Virtual SwitchSolutions for Cloud Service Providers Intel CorporationSanta Clara CA USA 2019 httpsbuildersintelcomdocscloudbuildersdeveloping-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providerspdf

[30] W Meng W Li C Su J Zhou and R Lu ldquoEnhancing trustmanagement for wireless intrusion detection via trafficsampling in the era of big datardquo IEEE Access vol 6pp 7234ndash7243 2017

[31] R Zuech T M Khoshgoftaar and R Wald ldquoIntrusion de-tection and big heterogeneous data a surveyrdquo Journal of BigData vol 2 no 1 2015

[32] R K C Chang ldquoDefending against flooding-based distributeddenial-of-service attacks a tutorialrdquo IEEE CommunicationsMagazine vol 40 no 10 pp 42ndash51 2002

[33] S Simpson S N Shirazi A Marnerides S Jouet D Pezarosand D Hutchison ldquoAn inter-domain collaboration scheme toremedy DDoS attacks in computer networksrdquo IEEE Trans-actions on Network and Service Management vol 15 no 3pp 879ndash893 2018

[34] S Behal K Kumar and M Sachdeva ldquoD-face an anomalybased distributed approach for early detection of DDoS at-tacks and flash eventsrdquo Journal of Network and ComputerApplications vol 111 pp 49ndash63 2018

[35] C Wang T T N Miu X Luo and J Wang ldquoSkyShield asketch-based defense system against application layer DDoSattacksrdquo IEEE Transactions on Information Forensics andSecurity vol 13 no 3 pp 559ndash573 2018

[36] Z Liu Y Cao M Zhu and W Ge ldquoUmbrella enabling ISPsto offer readily deployable and privacy-preserving DDoS

prevention servicesrdquo IEEE Transactions on Information Fo-rensics and Security vol 14 no 4 pp 1098ndash1108 2019

[37] M Aamir and SM A Zaidi ldquoClustering based semi-supervisedmachine learning for DDoS attack classificationrdquo Journal ofKing Saud UniversitymdashComputer and Information Sciences2019

[38] R E Jurga R E Jurga M M Hulb M M HulbH P Procurve and H P Procurve Technical Report PacketSampling for Network Monitoring 2007

[39] P Phaal and M Lavine sFlow version 5 InMon Corp SanFrancisco CA USA 1981 httpssfloworgsflow_version_5txt

[40] J Postel Internet Protocol RFC Editor 1981 httpswwwrfc-editororginforfc0791

[41] J Postel ldquoTransmission Control Protocolrdquo RFC Editor 1981httpswwwrfc-editororginforfc0793

[42] J Postel ldquoUser datagram protocolrdquo RFC Editor 1980 httpswwwrfc-editororginforfc768

[43] B Claise Ed Cisco systems netflow services export version 9RFC Editor 2004 httpswwwrfc-editororginforfc3954

[44] R J Hyndman and Y Fan ldquoSample quantiles in statisticalpackagesrdquo Statistical Computing vol 50 no 4 pp 361ndash3651996

[45] A Shiravi H Shiravi M Tavallaee and A A GhorbanildquoToward developing a systematic approach to generatebenchmark datasets for intrusion detectionrdquo Computers andSecurity vol 31 no 3 pp 357ndash374 2012

[46] S Sanfilippo N Jombart D Ducamp Y Berthier andS Aubert ldquoHpingrdquo september 2019 httpwwwhpingorg

[47] A Grafov ldquoHulk DoS toolrdquo 2015 httpsgithubcomgrafovhulk

[48] J Seidl ldquoGoldenEye HTTP DoS test toolrdquo 2012 httpsgithubcomjseidlGoldenEye

[49] S Shekyan ldquoSlowHTTPTestrdquo 2011 httpsgithubcomshekyanslowhttptest

[50] S Ganapathy K Kulothungan S MuthurajkumarM Vijayalakshmi P Yogesh and A Kannan ldquoIntelligent featureselection and classification techniques for intrusion detection innetworks a surveyrdquo EURASIP Journal on Wireless Communi-cations and Networking vol 2013 no 1 p 271 2013 httpjwcneurasipjournalscomcontent20131271

[51] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2012 httpdlacmorgcitationcfmid2078195

[52] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45no 1 pp 5ndash32 2001

[53] A Turner ldquoTcpreplayndashPcap editing and replaying utilitiesrdquo2013 httpstcpreplayappnetacom

[54] S H Park J M Goo and C H Jo ldquoReceiver operatingcharacteristic (ROC) curve practical review for radiologistsrdquoKorean Journal of Radiology vol 5 no 1 p 11 2004

[55] M W Powers ldquoEvaluation from precision recall andF-measure to ROC informedness markedness and correla-tionrdquo Journal of Machine Learning Technologies vol 2 no 1pp 37ndash63 2011

Security and Communication Networks 15

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 15: Research Articledownloads.hindawi.com/journals/scn/2019/1574749.pdf · Research Article Smart Detection: An Online Approach for DoS/DDoS Attack Detection Using Machine Learning Francisco

[22] X Tan S Su Z Huang et al ldquoWireless sensor networksintrusion detection based on SMOTE and the random forestalgorithmrdquo Sensors (Switzerland) vol 19 no 1 2019

[23] J Cheng M Li X Tang V S Sheng Y Liu and W GuoldquoFlow correlation degree optimization driven random forestfor detecting DDoS attacks in cloud computingrdquo Security andCommunication Networks vol 2018 Article ID 645932614 pages 2018

[24] R. Panigrahi and S. Borah, "A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems," International Journal of Engineering & Technology, vol. 7, pp. 479–482, December 2018, https://www.researchgate.net/publication/329045441.

[25] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), pp. 108–116, Madeira, Portugal, 2018.

[26] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (II), pp. 1137–1143, 1995, https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf.

[27] S. Behal and K. Kumar, "Trends in validation of DDoS research," Procedia Computer Science, vol. 85, pp. 7–15, 2016.

[28] B. Pfaff, J. Pettit, T. Koponen et al., "The design and implementation of Open vSwitch," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp. 117–130, USENIX Association, Oakland, CA, USA, 2015, https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/pfaff.

[29] J. Le, R. Giller, Y. Tatsumi, H. Huang, J. Ma, and N. Mori, Case Study: Developing DPDK-Accelerated Virtual Switch Solutions for Cloud Service Providers, Intel Corporation, Santa Clara, CA, USA, 2019, https://builders.intel.com/docs/cloudbuilders/developing-dpdk-accelerated-virtual-switch-solutions-for-cloud-service-providers.pdf.

[30] W. Meng, W. Li, C. Su, J. Zhou, and R. Lu, "Enhancing trust management for wireless intrusion detection via traffic sampling in the era of big data," IEEE Access, vol. 6, pp. 7234–7243, 2017.

[31] R. Zuech, T. M. Khoshgoftaar, and R. Wald, "Intrusion detection and big heterogeneous data: a survey," Journal of Big Data, vol. 2, no. 1, 2015.

[32] R. K. C. Chang, "Defending against flooding-based distributed denial-of-service attacks: a tutorial," IEEE Communications Magazine, vol. 40, no. 10, pp. 42–51, 2002.

[33] S. Simpson, S. N. Shirazi, A. Marnerides, S. Jouet, D. Pezaros, and D. Hutchison, "An inter-domain collaboration scheme to remedy DDoS attacks in computer networks," IEEE Transactions on Network and Service Management, vol. 15, no. 3, pp. 879–893, 2018.

[34] S. Behal, K. Kumar, and M. Sachdeva, "D-FACE: an anomaly based distributed approach for early detection of DDoS attacks and flash events," Journal of Network and Computer Applications, vol. 111, pp. 49–63, 2018.

[35] C. Wang, T. T. N. Miu, X. Luo, and J. Wang, "SkyShield: a sketch-based defense system against application layer DDoS attacks," IEEE Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 559–573, 2018.

[36] Z. Liu, Y. Cao, M. Zhu, and W. Ge, "Umbrella: enabling ISPs to offer readily deployable and privacy-preserving DDoS prevention services," IEEE Transactions on Information Forensics and Security, vol. 14, no. 4, pp. 1098–1108, 2019.

[37] M. Aamir and S. M. A. Zaidi, "Clustering based semi-supervised machine learning for DDoS attack classification," Journal of King Saud University–Computer and Information Sciences, 2019.

[38] R. E. Jurga and M. M. Hulboj, Packet Sampling for Network Monitoring, Technical Report, CERN–HP ProCurve openlab, 2007.

[39] P. Phaal and M. Lavine, sFlow Version 5, InMon Corp., San Francisco, CA, USA, 2004, https://sflow.org/sflow_version_5.txt.

[40] J. Postel, Internet Protocol, RFC 791, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0791.

[41] J. Postel, "Transmission Control Protocol," RFC 793, RFC Editor, 1981, https://www.rfc-editor.org/info/rfc0793.

[42] J. Postel, "User Datagram Protocol," RFC 768, RFC Editor, 1980, https://www.rfc-editor.org/info/rfc768.

[43] B. Claise, Ed., Cisco Systems NetFlow Services Export Version 9, RFC 3954, RFC Editor, 2004, https://www.rfc-editor.org/info/rfc3954.

[44] R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," The American Statistician, vol. 50, no. 4, pp. 361–365, 1996.

[45] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers and Security, vol. 31, no. 3, pp. 357–374, 2012.

[46] S. Sanfilippo, N. Jombart, D. Ducamp, Y. Berthier, and S. Aubert, "Hping," September 2019, http://www.hping.org.

[47] A. Grafov, "Hulk DoS tool," 2015, https://github.com/grafov/hulk.

[48] J. Seidl, "GoldenEye HTTP DoS test tool," 2012, https://github.com/jseidl/GoldenEye.

[49] S. Shekyan, "SlowHTTPTest," 2011, https://github.com/shekyan/slowhttptest.

[50] S. Ganapathy, K. Kulothungan, S. Muthurajkumar, M. Vijayalakshmi, P. Yogesh, and A. Kannan, "Intelligent feature selection and classification techniques for intrusion detection in networks: a survey," EURASIP Journal on Wireless Communications and Networking, vol. 2013, no. 1, p. 271, 2013, http://jwcn.eurasipjournals.com/content/2013/1/271.

[51] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011, http://dl.acm.org/citation.cfm?id=2078195.

[52] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[53] A. Turner, "Tcpreplay: Pcap editing and replaying utilities," 2013, https://tcpreplay.appneta.com.

[54] S. H. Park, J. M. Goo, and C. H. Jo, "Receiver operating characteristic (ROC) curve: practical review for radiologists," Korean Journal of Radiology, vol. 5, no. 1, p. 11, 2004.

[55] D. M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.
