
Research Article
BLATTA: Early Exploit Detection on Network Traffic with Recurrent Neural Networks

Baskoro A. Pratomo,1,2 Pete Burnap,1 and George Theodorakopoulos1

1School of Computer Science and Informatics, Cardiff University, Cardiff, UK
2Informatics Department, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia

Correspondence should be addressed to Baskoro A. Pratomo; me@baskoroadi.web.id

Received 8 April 2020; Revised 25 June 2020; Accepted 9 July 2020; Published 4 August 2020

Academic Editor: Muhammad Faisal Amjad

Copyright © 2020 Baskoro A. Pratomo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Detecting exploits is crucial since the effect of undetected ones can be devastating. Identifying their presence on the network allows us to respond and block their malicious payload before they cause damage to the system. Inspecting the payload of network traffic may offer better performance in detecting exploits, as they tend to hide their presence and behave similarly to legitimate traffic. Previous works on deep packet inspection for detecting malicious traffic regularly read the full length of application layer messages. As the length varies, longer messages will take more time to analyse, during which time the attack creates a disruptive impact on the system. Hence, we propose a novel early exploit detection mechanism that scans network traffic, reading only 35.21% of application layer messages to predict malicious traffic while retaining a 97.57% detection rate and a 1.93% false positive rate. Our recurrent neural network- (RNN-) based model is the first work to our knowledge that provides early prediction of malicious application layer messages, thus detecting a potential attack earlier than other state-of-the-art approaches and enabling a form of early warning system.

1. Introduction

Exploits are attacks on systems that take advantage of the existence of bugs and vulnerabilities. They infiltrate the system by giving the system an input which triggers malicious behaviour. As time passes, the number of bugs and vulnerabilities increases, along with the number of exploits. In the first quarter of 2019, there were 400,000 new exploits [1], while more than 16 million exploits have been released in total. Exploits exist in most operating systems (OSs); hence, detecting exploits early is crucial to minimise potential damage.

By exploiting a vulnerability, attackers can, for example, gain access to remote systems, send a remote exploit, or escalate their privilege on a system. Exploits-DB [2] is a website that archives exploits, both remote and local ones. The number of existing remote exploits on the website is almost double that of the local ones, which suggests remote exploits are more prevalent. No physical access to the system is required to execute a remote exploit; thus, the attack can be launched from anywhere in the world.

Remote exploits normally carry a piece of code as a payload, which will be executed once a vulnerability has been successfully exploited. An exploit is analogous to a tool for breaking into a house, and its payload is something the burglar would do once they are inside the house. Without this payload, exploits would be merely a tool to demonstrate that an application is vulnerable. Exploit payloads may take many forms; they could be written in machine code, a server-side scripting language (e.g., PHP, Python, and Ruby), or OS-specific commands (e.g., Bash).

One way to detect exploits is to scan network traffic for their presence. In doing this, the exploit can be detected before it arrives at the vulnerable system. If this is achieved, earlier action can be taken to minimise or nullify the damage. There is also no need to run the exploit in a clone server or virtual machine (VM) to be analysed, as is usually the case in host-based detection approaches, making this approach more time efficient for blocking and providing a rapid response to attacks. Therefore, detecting exploits in network traffic is a promising way to prevent remote exploits from infecting protected systems.


Detecting exploits on the wire has challenges. Firstly, the vast amount of data must be processed without decreasing network throughput below acceptable levels; quality of service is still a priority. Secondly, there are various ways to encode exploit payloads [3] by modifying the exploit payload to make it appear different yet still achieve the same goal. This technique makes it easy to evade any rule-based detection. Lastly, encrypted traffic is also a challenge: attackers may transmit the exploit with an encrypted protocol, e.g., HTTPS.

There are many ways to detect exploits in network traffic. Rule-based detection systems work by matching signatures of known attacks to the network traffic. Anything that matches a rule is deemed malicious. The most prevalent open-source intrusion detection system, Snort [4], has a rule that marks any traffic which contains the byte 0x90 as shellcode-related traffic. This rule is based on the knowledge that most x86-based shellcodes are preceded by a sequence of no-operation (NOP) instructions, in which the bytes normally contain this value. However, this rule can easily be evaded by employing other NOP instructions, such as the "0x41, 0x49" sequence. Apart from that, rule-based detection systems are susceptible to zero-day attacks for which no detection rule exists. Such systems are unable to detect these attacks until the rule database is updated with the new attack signature.

Machine learning (ML) algorithms are capable of classifying objects and artefacts based on features exhibited in data and can handle various modalities of input. ML has been successfully applied in many domains with a high success rate, such as image classification, natural language processing, speech recognition, and even intrusion detection systems. There has been much research on implementing machine learning to address network intrusion detection [5]. Researchers typically provide training examples of malicious and legitimate traffic to the ML algorithm, which can then be used to determine whether new (unseen) traffic is malicious. However, there are three key limitations of existing research. Firstly, most of the research uses old datasets, e.g., either KDD99 or DARPA99 [6]. The traffic in those datasets may not represent recent network traces, since network protocols have evolved over these years, as have the attacks. Secondly, many of the previous works focus solely on the header information of network packets and process packets individually [6]. Yet it is known that exploits may exhibit similar statistical attributes to legitimate traffic at a header level and use evasion techniques, such as packet fragmentation, to hide their existence [3]. Therefore, we argue that network payload features may capture exploits better, and this area is still actively expanding, as shown by the number of works listed in Table 1. This argument brings us to the third limitation: existing methods that use payload features, i.e., byte frequencies or n-grams, usually involve reading the payload of whole application layer messages (see Section 2). The issue is that these messages can be lengthy and spread over multiple network packets. Reading the whole message before making a decision may lead to a delay in detecting the attack and gives the exploit time to execute before an alert is raised.

We therefore propose Blatta, an early exploit detection system which reads application layer messages and predicts whether these messages are likely to be malicious by reading only the first few bytes. This is the first work to our knowledge that provides early prediction of malicious application layer messages, thus detecting a potential attack earlier than other state-of-the-art approaches and enabling a form of early warning system. Blatta utilises a recurrent neural network- (RNN-) based model and n-grams to make the prediction. An RNN is a type of artificial neural network that takes a sequence-based vector as input (in this case, a sequence of n-grams from an application layer message) and considers the temporal behaviour of the sequence so that its elements are not treated independently. There has been a limited amount of research on payload-based intrusion detection which used an RNN or its further developments (i.e., long short-term memory [24, 25] and the gated recurrent unit [20]), but earlier research did not make predictive decisions early, as it only used 1-grams of full-length application layer messages as features, which lack contextual information, a key benefit of higher-order n-grams. Other work that does consider higher-order sequences of n-grams (e.g., [12, 29]) is also yet to develop methods that provide early-stage prediction, preferring methods that require the full payload as input to a classifier.

To evaluate the proposed system, we generated an exploit traffic dataset by running exploits in Metasploit [30] with various exploit payloads and encoders. An exploit payload is a piece of code that will be executed once the exploit has successfully infiltrated the system. An encoder is used to change how the exploit payload appears while keeping its functionality. Our dataset contains traffic from 5687 working exploits. Apart from that, we also used a more recent intrusion detection dataset, UNSW-NB15 [31], thus enabling our method to be compared with previous works.

To summarise, the key contributions of this paper are as follows:

(i) We propose an early prediction method for exploit traffic, Blatta, using a novel approach of an RNN and high-order n-grams to make predictive decisions while still considering the temporal behaviour of a sequence of n-grams. Blatta is, to the best of our knowledge, the first work to introduce early exploit detection. Blatta detects exploit payloads as they enter the protected network and is able to predict the exploit by reading 35.21% of application layer messages on average. Blatta thus enables actions to be taken to block the attack earlier and therefore reduce the harm to the system.

(ii) We generated a dataset of exploit traffic with various exploit payloads and encoders, resulting in 5687 unique connections. The exploit traffic in the dataset was ensured to contain actual exploit code, making the dataset closer to reality.

The rest of this paper is structured as follows. Section 2 gives a summary of previous works in this area. Section 3 explains the datasets we used to test our proposed method. How Blatta works is explained in Section 4. Then, Section 5 describes our extensive experimentation with Blatta. We also discuss possible evasion techniques against our approach in Section 6. Finally, the paper concludes in Section 7.

2. Related Works

The earliest solutions to detecting exploit activities used pattern matching and regular expressions [4]. Rules are defined by system administrators and are then applied to the network traffic to identify a match.

Table 1: Related works in exploit detection. Unlike previous works, Blatta does not have to read until the end of application layer messages to detect exploit traffic.

Paper | Features | Detection method | Dataset(s) | Learning type | Protocol(s) | Early prediction
PAYL [7] | Relative frequency count of each 1-gram | Based on a statistical model and Mahalanobis distance | D, SG | U | HTTP, SMTP, SSH | No
RePIDS [8] | Mahalanobis distance map originated from the relative frequency count of each 1-gram, filtered by PCA | Based on a statistical model and Mahalanobis distance | D, M | U | HTTP | No
McPAD [9] | 2ν-grams | Multiple one-class SVM classifiers | D, M | U | HTTP | No
HMMPayl [10] | Byte sequences of the L7 payload | Ensemble of HMMs | D, M, DI | U | HTTP | No
Oza et al. [11] | Relative frequency count of each 1-gram | Based on a statistical model | D, M, SG | U | HTTP | No
OCPAD [12] | High-order n-grams (n > 1) | Based on the occurrence probability of an n-gram in a packet | M, SG | U | HTTP | No
Bartos et al. [13] | Information from HTTP request headers and the lengths | SVM | SG | S | HTTP | No
Zhang et al. [14] | Packet header information and HTTP and DNS messages | Naïve Bayes, Bayesian network, SVM | D, SG | S | DNS, HTTP | No
Decanter [15] | HTTP messages | Clustering | SG | U | HTTP | No
Golait and Hubballi [16] | Byte sequence of the L7 payload | Probabilistic counting deterministic timed automata | SG | U | SIP | No
Duessel et al. [17] | Contextual n-grams of the L7 payload | One-class SVM | SG | U | HTTP, RPC | No
Min et al. [18] | Words of the L7 payload | CNN and random forest | I | S | HTTP | No
Jin et al. [19] | 2ν-grams | Multiple one-class SVM classifiers | M | U | HTTP | No
Hao et al. [20] | Byte sequence of the L7 payload | Variant gated recurrent unit | I | S | HTTP | No
Schneider and Böttinger [21] | Byte sequence of the L7 payload | Stacked autoencoder | O | U | Modbus | No
Hamed et al. [22] | n-grams of base64-encoded payload | SVM | I | S | All protocols in the datasets | No
Pratomo et al. [23] | Byte frequency of application layer messages | Outlier detection with deep autoencoder | SW | U | HTTP, SMTP | No
Qin et al. [24] | Byte sequence of the L7 payload | Recurrent neural network | O | S | HTTP | No
Liu et al. [25] | Byte sequence of the L7 payload | Recurrent neural network with embedded vectors | D, O | S | HTTP | No
Zhao and Ahn [26] | Disassembled instructions of bytes in network traffic | Markov chain-based model and SVM | SG | S | Not mentioned | No
Shabtai et al. [27] | n-grams of a file and n-grams of opcodes in a file, then TF-IDF of those n-grams | Various ML algorithms, e.g., random forest, decision tree, Naïve Bayes, and a few others | SG | S | File classification | No
SigFree [28] | Disassembled instructions of bytes in the application layer payload | Analysis of instruction sequences to determine if they are code | SG | Non-ML | HTTP | No
Proposed approach | High-order n-grams of application layer messages | Recurrent neural network to predict exploit traffic early | SW, SG | S | HTTP, FTP | Yes

D: DARPA99; M: McPAD attacks dataset [9]; I: ISCX 2012; SG: self-generated; DI: DIEE; SW: UNSW-NB15; O: others; U: unsupervised; S: supervised; non-ML: non-machine-learning approach.


When a matching pattern is found, the system raises an alert and possibly shows which part of the traffic matches the rule. The disadvantage of this approach is that the rules must be kept up to date. Any variation of the exploit payload, which can be achieved by using encoders, may defeat such detection.

ML approaches are capable of recognising previously seen instances of objects, and such methods aim to generalise to unseen objects. ML is able to learn patterns or behaviour from features present in training data and classify which class an unseen object belongs to. However, to work well, suitable hand-crafted features must be engineered for the ML algorithm to make an accurate prediction. Some previous works defined their feature set by extracting information from application layer messages. In [14] and [15], the authors generated their features from the HTTP request URI and HTTP headers, i.e., host, constant header fields, size, user-agent, and language. The authors then clustered the legitimate traffic based on those features. Golait and Hubballi [16] tracked DNS and HTTP traffic and calculated pairwise features of two events to see their similarity. These features were obtained from the transport and application layers. One of their features is the semantic similarity between two HTTP requests.

Features derived from a specific protocol may capture specific behaviour of legitimate traffic. However, this feature extraction method has a drawback: a different set of features is needed for every application layer protocol that might be used in the network. Some research borrows feature extraction methods from natural language processing problems to obtain protocol-agnostic features using n-grams.

One of the first leading works in payload analysis is PAYL [7]. It extracts 1-grams from all bytes of the payload as a representation of the network traffic and is trained over a set of those 1-grams. PAYL [7] measures the distance between new incoming traffic and the model with a simplified Mahalanobis distance. Similar to PAYL, Oza et al. [11] extract n-grams of HTTP traffic with various n values. They compared three different statistical models to detect anomalies/attacks. HMMPayl [10] is another work based on PAYL which uses Hidden Markov Models to detect anomalies. OCPAD [12] stores the n-grams of bytes in a probability tree and uses a one-class Naïve Bayes classifier to detect malicious traffic. Hao et al. [20] proposed an autoencoder model to detect anomalous low-rate attacks on the network, and Schneider and Böttinger [21] used stacked autoencoders to detect attacks on industrial control systems. Hamed et al. [22] developed probabilistic counting deterministic timed automata, which inspect byte values of application layer messages to identify attacks on VoIP applications. Pratomo et al. [23] extract words from the application layer message and detect web-based attacks with a combination of a convolutional neural network (CNN) and random forest. A common approach found in all of the aforementioned works is that they read all bytes in each packet or application layer message and do not decide until all bytes have been read, at which point it is possible the exploit has already infected the system and targeted the vulnerability.

Other research studies argue that exploit traffic is most likely to contain shellcode, a short sequence of machine code. Therefore, to get a better representation of exploit traffic, they performed static analysis on the network traffic. Qin et al. [24] disassemble byte sequences into opcode sequences and calculate probabilities of opcode transitions to detect shellcode presence. Liu et al. [25] utilise n-grams that comprise machine instructions instead of bytes. SigFree [28] detects buffer overflow attempts on Internet services by constructing an instruction flow graph for each request and analysing the graph to determine if the request contains machine instructions. Any network traffic that contains valid machine instructions is deemed malicious. However, none of these consider that an exploit may also contain a server-side scripting language or OS commands instead of shellcode.

In this paper, we propose Blatta, an exploit attack detection system that (i) provides early detection of exploits in network traffic, rather than waiting until the whole payload has been delivered, enabling proactive blocking of attacks, and (ii) is capable of detecting payloads that include malicious server-side scripting languages, machine instructions, and OS commands, enhancing the previous state-of-the-art approaches that only focus on shellcode. A summary of the proposed method and previous works is also shown in Table 1.

Blatta utilises a recurrent neural network- (RNN-) based model which takes a sequence of n-grams from the network payload as the input. There has been limited research on developing RNNs to analyse payload data [20, 24, 25]. All of these works feed individual bytes (1-grams) from the payload as the features to their RNN model. However, 1-grams do not carry information about the context of the payload string as a sequence of activities. Therefore, we model the payload as high-order n-grams, where n > 1, to capture more contextual information about the network payload. We directly compare 1-grams to higher-order n-grams in our experiments. Moreover, while related works such as [19], [17], and OCPAD [12] previously used high-order n-grams, they did not utilise a model capable of learning sequences of activities and are thus not capable of making early-stage predictions within a sequence. Our novel RNN-based model considers the long-term dependencies of the sequence of n-grams.

An RNN-based model normally takes a sequence as input, processes each item sequentially, and outputs the decision after it has finished processing the last item of the sequence. The earlier works which used RNNs have this behaviour [24, 25]. For Blatta to be able to predict the exploit traffic early, it takes the intermediate output of the RNN-based model, not waiting for the full-length message to be processed. Our experiments show that this approach has little effect on accuracy and enables us to make earlier network attack predictions while retaining high accuracy and a low false positive rate.

3. Datasets and Threat Model

Several datasets have been used to evaluate network-based intrusion detection systems. DARPA released the KDD99 Cup, IDSEVAL 98, and IDSEVAL 99 datasets [32]. They have been widely used over time, despite some criticism that they do not model the attacks correctly [33]. The Lawrence Berkeley National Laboratory released their anonymised captured traffic [34] (LBNL05). More recently, Shiravi et al. released the ISCX12 dataset [35], in which they collected seven days' worth of traffic and provided the label for each connection in XML format. Moustafa and Slay published the UNSW-NB15 dataset [31], which was generated by using the IXIA PerfectStorm tool to create a hybrid of real modern normal activities and synthetic contemporary attack behaviours. This dataset provides PCAP files along with the preprocessed traffic obtained by Bro-IDS [36] and Argus [37].

Moustafa and Slay [31] captured the network traffic on two days, 22 January and 17 February 2015. For brevity, we refer to those two parts of UNSW-NB15 as UNSW-JAN and UNSW-FEB, respectively. Both days contain malicious and benign traffic; therefore, there has to be an effort to separate them if we would like to use the raw information from the PCAP files, not the preprocessed information written in the CSV files. The advantage of this dataset over ISCX12 is that UNSW-NB15 contains information on the type of attacks, and thus we are able to select which kinds of attacks are needed, i.e., exploits and worms. However, after analysing this dataset in depth, we observed that the exploit traffic in the dataset is often barely distinguishable from the normal traffic, as some of it does not contain any exploit payload. Our explanation for this is that exploit attempts may not have been successful; thus, they did not get to the point where the actual exploit payload was transmitted and recorded in the PCAP files. Therefore, we opted to generate our own exploit traffic dataset, the BlattaSploit dataset.

3.1. BlattaSploit Dataset. To develop an exploit traffic dataset, we set up two virtual machines that acted as vulnerable servers to be attacked. The first server was installed with Metasploitable 2 [38], a vulnerable OS designed to be infiltrated by Metasploit [30]. The second one was installed with Debian 5, vulnerable services, and WordPress 4.1.18. Both servers were set up in the victim subnet, while the attacker machine was placed in a different subnet. These subnets were connected with a router. The router was used to capture the traffic, as depicted in Figure 1. Although the network topology is less complex than what we would have in the real world, this setup still generates representative data, as the payloads are intact regardless of the number of hops the packets go through.

To launch exploits on the vulnerable servers, we utilised Metasploit, an exploitation tool which is normally used for penetration testing [30]. Metasploit has a collection of working exploits and is updated regularly with new exploits. There are around 9000 exploits for Linux- and Unix-based applications, and to make the dataset more diverse, we also employed different exploit payloads and encoders. An exploit payload is a piece of code which is intended to be executed on the remote machine, and an encoder is a technique to modify the appearance of particular exploit code to avoid signature-based detection. Each exploit in Metasploit has its own set of compatible exploit payloads and encoders. In this paper, we used all possible combinations of exploit payloads and encoders.

We then ran the exploits against the vulnerable servers and captured the traffic using tcpdump. Traffic generated by each exploit was stored in an individual PCAP file. By doing so, we know which specific type of exploit the packets belonged to. Timestamps are normally used to mark which packets belong to a class, but this information would not be reliable, since packets may not come in order, and if there is more than one source sending traffic at a time, their packets may get mixed up with other sources.

When analysing the PCAP files, we found out that not all exploits had run successfully. Some of them failed to send anything to the targeted servers, and some others did not send any malicious traffic, e.g., sending a login request only or sending requests for nonexistent files. This supported our earlier thoughts on why the UNSW-NB15 dataset was lacking exploit payload traffic. Therefore, we removed capture files that had no traffic or too little information to be distinguished from normal traffic. In the end, we produced 5687 PCAP files, which also represents the number of distinct sets of exploits, exploit payloads, and encoders. Since we are interested in application layer messages, all PCAP files were preprocessed with tcpflow [39] to obtain the application layer message for each TCP connection. A summary of this dataset is shown in Table 2.

The next step for this exploit traffic dataset was to annotate the traffic. All samples can be considered malicious; however, we decided to make the dataset more detailed by adding the type of exploit payload contained inside the traffic and the location of the exploit payload. There are eight types of exploit payload in this dataset. They are written in JavaScript, PHP, Perl, Python, Ruby, shell script (e.g., Bash and ZSH), SQL, and byte code/opcode for shellcode-based exploits.

There are some cases where an exploit contains an exploit payload "wrapped" in another scripting language, for example, a Python script for a reverse shell connection which uses the Bash echo command at the beginning. For these cases, the type of the exploit payload is the one with the longest byte sequence; in this example, the type of the particular connection is Python.

It is also important to note that whilst the vulnerable servers in our setup use a 7-year-old operating system, the payload carried by the exploit is identical to the payload a more recent exploit would have used. For example, both CVE-2019-9670 (disclosed in 2019) and CVE-2012-1495 (disclosed in 2012) can use generic/shell_bind_tcp as a payload. The traffic generated by both exploits will still be similar. Therefore, we argue that our dataset still represents current attacks. Moreover, only successful exploit attacks are kept in the dataset, making the recorded traffic more realistic.

3.2. Threat Model. A threat model is a list of potential things that may affect protected systems. Having one means we can identify which part is the focus of our proposed approach; thus, in this case, we can potentially understand better what to look for in order to detect the malicious traffic and what the limitations are.

The proposed method focuses on detecting remote exploits by reading application layer messages from unencrypted network traffic, although the detection method of Blatta can be incorporated into application layer firewalls, i.e., web application firewalls. Therefore, we can still detect the exploit attempt before it reaches the protected application. In general, the types of attacks we consider here are as follows:

(1) Remote exploits to servers that send malicious scripting languages (e.g., PHP, Ruby, Python, or SQL), shellcode, or Bash scripts to maintain control of the server or gain access to it remotely. For example, the apache_continuum_cmd_exec exploit with a reverse shell payload will force the targeted server to open a connection to the attacking computer and provide shell access to the attacker. By focusing on the connections directed to servers, we can safely assume that JavaScript code in the application layer message could also be malicious, since normally JavaScript code is sent from the server to the client, not the other way around.

(2) Exploit attacks that utilise one of the text-based protocols over TCP, i.e., HTTP and FTP. Text-based protocols tend to be more well-structured; therefore, we can apply a natural language processing based approach. The case would be similar to document classification.

(3) Other attacks that may utilise remote exploits are also considered, i.e., worms. Worms in the UNSW-NB15 dataset contain exploit code used to propagate themselves.

4. Methodology

Extracting features from application layer messages is the first step toward an early prediction method. We could use variable values in the message (e.g., HTTP header values, FTP commands, and SMTP header values), but it would require us to extract a different set of features from each application layer protocol. It is preferable to have a generic feature set which applies to various application layer protocols. Therefore, we propose a method that uses n-grams to model the application layer message, as they carry more information than a sequence of individual bytes. For instance, it would be difficult for a model to determine what a sequence of bytes "G", "E", "T", space, and so on means, as those individual bytes may appear anywhere in a different order. However, if the model has 3-grams of the sequence (i.e., "GET", "ET ", and so on), the model would learn something more meaningful, such as that the string could be an HTTP message. The advantage of using high-order n-grams, where n > 1, is also shown by the Anagram [29] method in [40] and by OCPAD [12]. However, these works did not consider the temporal behaviour of a sequence of n-grams, as their models were not capable of doing that. Therefore, Blatta utilises a recurrent neural network (RNN) to analyse the sequence of n-grams obtained from an application layer message.

Figure 1: Network topology for generating exploit traffic. The attacker VM running Metasploit and the target VMs (a Metasploitable VM and a Debian VM) are placed in different networks connected by a router. This router is used to capture all traffic from these virtual machines.

Table 2: A summary of exploits captured in the BlattaSploit dataset.

Number of TCP connections | 5687
Protocols | HTTP (3857), FTP (6), SMTP (74), POP3 (93)
Payload types | JavaScript, shellcode, Perl, PHP, Python, Ruby, Bash, SQL

Note: the numbers next to the protocols are the numbers of connections using those application layer protocols.


An RNN takes a sequence of inputs and processes them sequentially in several time steps, enabling the model to learn the temporal behaviour of those inputs. In this case, the input to each time step is an n-gram, unlike earlier works which also utilised an RNN model but took a byte value as the input to each RNN time step [24, 25]. Moreover, these works took the output from the last time step to make the decision, while our novel approach produces classification outputs at intermediate intervals, once the RNN model is already confident about the decision. We argue that this approach will enable the proposed system to predict whether a connection is malicious without reading the full length of application layer messages, therefore providing an early warning method.

In general, as shown in Figure 2, the training and detection processes of Blatta are as follows.

4.1. Training Stage. n-grams are extracted from application layer messages. The l most common n-grams are stored in a dictionary. This dictionary is used to encode an n-gram to an integer. The encoded n-grams are then fed into an RNN model, training the model to classify whether the traffic is legitimate or malicious.

4.2. Detection Stage. For each new incoming TCP connection directed to the server, we reconstruct the application layer message, obtain the full-length or partial bytes of it, and determine if the sequence belongs to malicious traffic.

4.3. Data Preprocessing. In well-structured documents, such as text-based application layer protocols, byte sequences can be a distinguishing feature that makes each message in its respective class differ from the others. Blatta takes the byte sequence of application layer messages from network traffic. The application layer messages need to be reconstructed, as a message may be split into multiple TCP segments and transmitted in an arbitrary order. We utilise tcpflow [39] to read PCAP files, reconstruct TCP segments, and obtain the application layer messages.

We then represent the byte sequence as a collection of n-grams taken with various stride values. An n-gram is a consecutive sequence of n items from a given sample; in this case, an n-gram is a consecutive series of bytes obtained from the application layer message. The stride is how many steps the sliding window takes when collecting n-grams. Figure 3 shows examples of various n-grams obtained with different values of n and stride.
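To make this preprocessing step concrete, the following is a minimal Python sketch (our own illustration, not the authors' released code; the function name extract_ngrams is ours) of byte-level n-gram extraction with a configurable stride, reproducing the example of Figure 3:

    def extract_ngrams(message: bytes, n: int = 3, stride: int = 1):
        """Return the n-grams of `message` taken with the given stride."""
        return [message[i:i + n] for i in range(0, len(message) - n + 1, stride)]

    # 3-grams of "HELLO_WORLD", as in Figure 3:
    print(extract_ngrams(b"HELLO_WORLD", n=3, stride=1))  # b'HEL', b'ELL', b'LLO', ...
    print(extract_ngrams(b"HELLO_WORLD", n=3, stride=2))  # b'HEL', b'LLO', b'O_W', b'WOR', b'RLD'
    print(extract_ngrams(b"HELLO_WORLD", n=3, stride=3))  # b'HEL', b'LO_', b'WOR'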

We define the input to the classifier to be a set of integer-encoded n-grams. Let X = {x_1, x_2, x_3, ..., x_k} be the integer-encoded n-grams collected from an application layer message, as the input to the RNN model. We denote k as the number of n-grams taken from each application layer message. Each n-gram is categorical data: a value of 1 is not necessarily smaller than a value of 50; they are simply different. Encoding n-grams with one-hot encoding is not a viable solution, as the resulting vector would be sparse and hard to model. Therefore, Blatta transforms the sequence of n-grams with an embedding technique. Embedding is essentially a lookup table that maps an item to a dense vector with a fixed size, embedded_dim. Using pretrained embedding vectors, e.g., GloVe [41], is common in natural language processing problems, but these pretrained embedding vectors were generated from a corpus of words, while our approach works with byte-level n-grams. Therefore, it is not possible to use the pretrained embedding vectors. Instead, we initialise the embedding vectors with random values, which will be updated by backpropagation during the training so that n-grams which usually appear together will have vectors that are close to each other.

It is worth noting that the number of possible n-grams grows exponentially as n increases. If we considered all possible n-gram values, the model would overfit. Therefore, we limit the number of embedding vectors by building a dictionary of the most common n-grams in the training set. We define the dictionary size as l, such that it contains l unique n-grams and a placeholder for other n-grams that do not exist in the dictionary. Thus, the embedding vectors have l + 1 entries. However, we would like the size of each embedded vector to be less than l + 1. Let ε be the size of an embedded vector (embedded_dim). If x_t represents an n-gram, the embedding layer transforms X to X̂. We denote X̂ = {x̂_1, ..., x̂_k}, where each x̂ is a vector with size ε. The embedded vectors X̂ are then passed to the recurrent layer.
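As a hedged illustration of the dictionary and integer encoding described above (reusing extract_ngrams from the earlier sketch; function names are ours, and index 0 is assumed to be the placeholder for unknown n-grams):

    from collections import Counter

    def build_dictionary(training_messages, n, stride, l):
        """Map the l most common n-grams in the training set to IDs 1..l."""
        counts = Counter()
        for msg in training_messages:
            counts.update(extract_ngrams(msg, n, stride))
        return {gram: idx + 1 for idx, (gram, _) in enumerate(counts.most_common(l))}

    def encode(message, dictionary, n, stride):
        """Integer-encode a message; unknown n-grams map to the placeholder ID 0."""
        return [dictionary.get(gram, 0) for gram in extract_ngrams(message, n, stride)]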

4.4. Training the RNN-Based Classifier. Since the input to the classifier is sequential data, we opted to use a method that takes into account the sequential relationship of elements in the input vectors. Such methods capture the behaviour of a sequence better than processing those elements individually [42]. A recurrent neural network is an architecture of neural networks in which each layer takes time series data, processes it in several time steps, and utilises the output of the previous time step in the current step's calculation. We refer to these layers as recurrent layers. Each recurrent layer consists of recurrent units.

The vanilla RNN has a vanishing gradient problem, a situation where the recurrent model cannot be trained further because the value used to update the weights is too small; thus, there would be no point in training the model further. Therefore, long short-term memory (LSTM) [43] and the gated recurrent unit (GRU) [44] are employed to avoid this situation. Both LSTM and GRU have cells/units that are connected recurrently to each other, replacing the usual recurrent units which existed in earlier RNNs. What makes their cells different is the existence of gates. These gates decide whether to pass certain information coming from the previous cell (i.e., the input gate and forget gate in an LSTM unit and the update gate in a GRU) or going to the next unit (i.e., the output gate in an LSTM). Since LSTM has more gates than GRU, it requires more calculations and is thus computationally more expensive. Yet it is not conclusive whether one is better than the other [44]; thus, we use both types and compare the results. For brevity, we will refer to both types as recurrent layers and their cells as recurrent units.


The recurrent layer takes a vector x̂_t at each time step t. In each time step, the recurrent unit outputs a hidden state h_t with a dimension of |h_t|. The hidden state is then passed to the next recurrent unit. The last hidden state, h_k, becomes the output of the recurrent layer, which will be passed on to the next layer.

Once we obtain h_k, the output of the recurrent layer, the next step is to map the vector to the benign or malicious class. Mapping h_k to those classes requires a linear transformation and a softmax activation unit.

A linear transformation transforms h_k into a new vector L using (1), where W is the trained weight and b is the trained bias. After that, we transform L to obtain Y = {y_i | 0 ≤ i < 2}, the log probabilities of the input belonging to each class, with LogSoftmax, as described in (2). The output is the index of the element that has the largest probability in Y. All these forward steps are depicted in Figure 4.

L = W · h_k + b,   L = {l_i | 0 ≤ i < 2}                                  (1)

Y = argmax_{0 ≤ i < 2} log( exp(l_i) / Σ_{j=0}^{1} exp(l_j) )              (2)

In the training stage, after feeding a batch of training data to the model and obtaining the output, the next step is to evaluate the accuracy of our model. To measure our model's performance during the training stage, we need to calculate a loss value, which represents how far our model's output is from the ground truth. Since this approach is a binary classification problem, we use negative log likelihood [45] as the loss function. The losses are then backpropagated to update the weights, biases, and the embedding vectors.
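A sketch, in modern PyTorch, of the classifier and training step described in this section is given below. It follows the architecture above (embedding layer, LSTM recurrent layer, linear layer with LogSoftmax, and negative log likelihood loss), but the class name, the hidden dimension of 128, and other details are our assumptions rather than Blatta's published implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RNNClassifier(nn.Module):
        def __init__(self, dict_size=2000, embedding_dim=32, hidden_dim=128, num_classes=2):
            super().__init__()
            # l + 1 embedding entries: l common n-grams plus the unknown placeholder
            self.embedding = nn.Embedding(dict_size + 1, embedding_dim)
            self.rnn = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
            self.linear = nn.Linear(hidden_dim, num_classes)

        def forward(self, x):                     # x: (batch, k) integer-encoded n-grams
            embedded = self.embedding(x)          # (batch, k, embedding_dim)
            outputs, _ = self.rnn(embedded)       # hidden states for all time steps
            logits = self.linear(outputs[:, -1])  # use the last hidden state h_k
            return F.log_softmax(logits, dim=1)   # log probabilities, as in (2)

    model = RNNClassifier()
    optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.NLLLoss()                      # negative log likelihood loss

    def train_step(batch_x, batch_y):             # batch_y: 0 = benign, 1 = malicious
        optimiser.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()                           # updates weights, biases, and embeddings
        optimiser.step()
        return loss.item()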

4.5. Detecting Exploits. The process of detecting exploits is essentially similar to the training process. Application layer messages are extracted, and n-grams are acquired from these messages and encoded using the dictionary that was built during the training process. The encoded n-grams are then fed into the RNN model, which outputs the probabilities of these messages being malicious. When the probability of an application layer message being malicious is higher than 0.5, the message is deemed malicious and an alert is raised.

The main difference between this process and the training stage is the time at which Blatta stops processing inputs and makes the decision. Blatta takes the intermediate output of the RNN model, hence requiring fewer inputs and disregarding the need to wait for the full-length message to be processed. We will show in our experiments that the decision taken by using the intermediate output and reading fewer bytes is close to reading the full-length message, giving the proposed approach the ability of early prediction.
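The early-prediction idea can be sketched as follows, reusing the RNNClassifier sketch from Section 4.4: n-grams are fed one time step at a time, the LSTM hidden state is carried forward, and processing stops as soon as the intermediate output crosses the 0.5 threshold. The step-wise loop and names are our own illustration, not Blatta's released code.

    import torch
    import torch.nn.functional as F

    def predict_early(model, encoded_ngrams, threshold=0.5):
        """Return (is_malicious, number of n-grams read) using intermediate RNN outputs."""
        state = None
        for step, gram_id in enumerate(encoded_ngrams, start=1):
            x = torch.tensor([[gram_id]])                  # (batch=1, seq_len=1)
            embedded = model.embedding(x)
            output, state = model.rnn(embedded, state)     # carry the hidden state forward
            probs = F.softmax(model.linear(output[:, -1]), dim=1)
            if probs[0, 1].item() > threshold:             # class 1 = malicious
                return True, step                          # raise the alert early
        return False, len(encoded_ngrams)                  # benign after the full message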

5. Experiments and Results

In this section, we evaluate Blatta and present evidence of its effectiveness in predicting exploits in network traffic. Blatta is implemented with Python 3.5.2 and the PyTorch 0.2.0 library. All experiments were run on a PC with a Core i7 3.40 GHz, 16 GB of RAM, an NVIDIA GeForce GT 730, NVIDIA CUDA 9.0, and cuDNN 7.

The best practice for evaluating a machine learning approach is to have separate training and testing sets. As the name implies, the training set is used to train the model and the testing set is for evaluating the model's performance.

Figure 2: Architecture overview of the proposed method. Application layer messages are extracted from captured traffic using tcpflow [39], and n-grams are obtained from those messages. They are then used to build a dictionary of the most common n-grams and to train the RNN-based model (i.e., LSTM or GRU). The trained model outputs a prediction of whether the traffic is malicious or benign.

Figure 3: An example of n-grams of bytes taken with various stride values, showing the 3-grams of the string "HELLO_WORLD" extracted with stride values of 1, 2, and 3.


We split the BlattaSploit dataset in a 60:40 ratio into training and testing sets of malicious samples. The division was carefully made to include diverse types of exploit payloads. As samples of benign traffic to train the model, we obtained the same number of HTTP and FTP connections as the malicious samples from UNSW-JAN. Having a balanced set of both classes is important in supervised learning.

We measure our model's performance using samples of malicious and benign traffic. Malicious samples were obtained from 40% of the BlattaSploit dataset and from the exploit and worm samples in the UNSW-FEB set (10855 samples). As for the benign samples, we took the same number (10855) of benign HTTP and FTP connections in UNSW-FEB. We used the UNSW dataset to compare our proposed approach's performance with previous works.

In summary, the details of the training and testing sets used in the experiments are shown in Table 3.

We evaluated the classifier model by counting the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). They are then used to calculate the detection rate (DR) and false positive rate (FPR), which are metrics to measure our proposed system's performance.

DR measures the ability of the proposed system to correctly detect malicious traffic. The value of DR should be as close as possible to 100%, as it shows how well our system is able to detect exploit traffic. The formula to calculate DR is shown in (3). We denote TP as the number of detected malicious connections and FN as the number of undetected malicious connections in the testing set.

DR = TP / (TP + FN)                                                        (3)

FPR is the percentage of benign samples that are classified as malicious. We would like this metric to be as low as possible: a high FPR means many false alarms are raised, rendering the system less trustworthy and useless. We calculate this metric using (4). We denote FP as the number of false positives detected and N as the number of benign samples.

FPR = FP / N                                                               (4)
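For clarity, the two metrics in (3) and (4) can be computed with trivial helpers such as the following (our own, purely illustrative):

    def detection_rate(tp: int, fn: int) -> float:
        """DR = TP / (TP + FN), as in (3)."""
        return tp / (tp + fn)

    def false_positive_rate(fp: int, n_benign: int) -> float:
        """FPR = FP / N, as in (4)."""
        return fp / n_benign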

5.1. Data Analysis. Before discussing the results, it is preferable to analyse the data first to make sure that the results are valid and the conclusions drawn are on point. Blatta aims to detect exploit traffic by reading the first few bytes of the application layer message. Therefore, it is important to know how many bytes there are in the application layer messages in our dataset. Hence, we can be sure that Blatta reads a number of bytes fewer than the full length of the application layer message.

Table 4 shows the average length of application layer messages in our testing set. The benign samples have an average message length of 593.25, lower than any other set. Therefore, deciding after reading fewer bytes than at least that number implies our proposed method can predict malicious traffic earlier, thus providing an improvement over previous works.

5.2. Exploring Parameters. Blatta includes parameters that must be selected in advance. Intuitively, these parameters affect the model performance, so we analysed their effect on the model's performance and selected the best model to be compared later to previous works. The parameters we analysed are the recurrent layer type, n, stride, dictionary size, the embedding vector dimension, and the number of hidden layers. When analysing each of these parameters, we use different values for that parameter and set the others to their default values (i.e., n = 5, stride = 1, dictionary size = 2000, embedding_dim = 32, recurrent layer = LSTM, and number of hidden layers = 1). These default values were selected based on a preliminary experiment which had given the best result. Apart from the modifiable parameters, we set the optimiser to stochastic gradient descent with a learning rate of 0.1, as using other optimisers did not necessarily increase or decrease the performance in our preliminary experiment. The number of epochs is fixed to five, as the loss value did not decrease further; adding more training iterations did not give significant improvement.
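For reference, the default configuration stated above can be summarised as a simple Python mapping (the key names are ours):

    default_config = {
        "n": 5,
        "stride": 1,
        "dictionary_size": 2000,
        "embedding_dim": 32,
        "recurrent_layer": "LSTM",
        "num_hidden_layers": 1,
        "optimiser": "SGD",
        "learning_rate": 0.1,
        "epochs": 5,
    }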

As can be seen in Table 5, we experimented with variable lengths of input to see how much difference it makes when the number of bytes read is reduced.

Figure 4: A detailed view of the classifier. n-grams are extracted from the input application layer message and are then used to train an RNN model to classify whether the connection is malicious or benign. The decision can be acquired early, without waiting for the last n-grams to be processed.


It is noted that some messages are shorter than the limit; in this case, all bytes of the messages are indeed taken.

In general, reading the full length of application layer messages mostly gives more than a 99% detection rate with a 2.51% false positive rate. This performance holds, with minor variation, when the length of input messages is reduced down to 500 bytes. When the length is limited to 400 bytes, the false positive rate spikes up for some configurations. Our hypothesis is that this is because the benign samples have a relatively short length. Therefore, we will pay more attention to the results of reading 500 bytes or fewer and analyse each parameter individually.

n is the number of bytes (the n-gram size) taken in each time step. As shown in Table 5, we experimented with 1-, 3-, 5-, 7-, and 9-grams. For brevity, we omitted n = 2, 4, 6, 8, because the difference in results is not significant. As predicted earlier, 1-grams were least effective, and the detection rates were around 50%. As for the high-order n-grams, the detection rates are not much different, but the false positive rates are: 5-grams and 7-grams provide better false positive rates (2.51%) even when Blatta reads only the first 400 bytes. 7-grams give a lower false positive rate (8.92%) when reading the first 300 bytes, yet it is still too high for a real-life situation. Having a higher n means more information is considered in a time step; this may lead not only to a better performance but also to overfitting.

As the default value of n is five, we experimented with strides of one to five. Thus, it can be observed how the model reacts depending on how much the n-grams overlap. It is apparent that nonoverlapping n-grams provide lower false positives, with around a 1–3% decrease in detection rates. Stride values of two and three perform the worst, and they missed quite a few malicious traffic samples.

The dictionary size plays an important role in this experiment. Having too many n-grams leads to overfitting, as the model would have to memorise too many of them that may barely occur. We found that a dictionary size of 2000 has the highest detection rate and lowest false positive rate. Reducing the size to 1000 made the detection rate drop by about 50%, even when the model read the full-length messages.

Without the embedding layer, Blatta would have used a one-hot vector as big as the dictionary size for the input in a time step. Therefore, the embedding dimension has a similar effect to the dictionary size. Having it too big leads to overfitting, and having it too small could mean too much information is lost due to the compression. Our experiments with dimensions of 16, 32, or 64 give similar detection rates, differing by less than 2%. An embedding dimension of 64 gives the lowest false positive rate when reading 300 bytes.

Changing the recurrent layer does not seem to make much difference. LSTM has a minor improvement over GRU. We argue that preferring one over the other would not bring a big improvement other than in training speed. Adding more hidden layers does not improve the performance. On the other hand, it has a negative impact on the detection speed, as shown in Table 6.

After analysing this set of experiments, we ran another experiment with a configuration based on the best performing parameters previously explained. The configuration is n = 5, stride = 5, dictionary size = 2000, embedding dimension = 64, and an LSTM layer. The model then has a detection rate of 97.57% with 1.93% false positives by reading only the first 400 bytes. This result shows that Blatta maintains high accuracy while only reading 35.21% of the length of application layer messages in the dataset. This optimal set of parameters is then used in further analysis.

5.3. Detection Speed. In the IDS area, detection speed is another metric worth looking into, apart from accuracy-related metrics. Time is of the essence in detection, and the earlier we detect malicious traffic, the faster we can react. However, detection speed is affected by many factors, such as the hardware or other applications/services running at the same time as the experiment. Therefore, in this section, we analyse the difference in execution time between reading the full and partial payloads.

We first calculated the execution time of each record in the testing set, then divided the number of bytes processed by the execution time to obtain the detection speed in kilobytes per second (KBps). Eventually, the detection speed of all records was averaged. The result is shown in Table 6.

As shown in Table 6, reducing the processed bytes to 700, about half the size of an IP packet, increased the detection speed by approximately two times (from an average of 83.66 KBps to 164.86 KBps). Evidently, the trend keeps rising as the number of bytes is reduced. If we take the number-of-bytes limit from the previous section, which is 400 bytes, Blatta can provide about a three-times increase in detection speed, or 221.7 KBps on average. We are aware that this number seems small compared to the transmission speed of a link in the network, which can reach 1 Gbps (128 MBps). However, we argue that there are other factors which limit our proposed method from performing faster, such as the hardware used in the experiment and the programming language used to implement the approach. Given that the approach runs in a better environment, the detection speed will increase as well.
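The detection speed figures above follow the per-record computation described at the start of this subsection; a minimal helper (ours, for illustration only) would look like this:

    def average_detection_speed(records):
        """records: iterable of (bytes_processed, execution_time_in_seconds) pairs.
        Returns the mean detection speed in KBps."""
        speeds = [(nbytes / 1024.0) / seconds for nbytes, seconds in records if seconds > 0]
        return sum(speeds) / len(speeds)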

Table 3: Numbers of benign and malicious samples used in the experiments.

Set | Obtained from | Number of samples | Class
Training set | BlattaSploit | 3406 | Malicious
Training set | UNSW-JAN | 3406 | Benign
Testing set | BlattaSploit | 2276 | Malicious
Testing set | UNSW-FEB | 10855 | Benign
Testing set | UNSW-FEB | 10855 | Malicious

Table 4: Average message length of application layer messages in the testing set.

Set | Average message length (bytes)
BlattaSploit | 2318.93
UNSW benign samples | 593.25
UNSW exploit samples | 1437.36
UNSW worm samples | 848


5.4. Comparison with Previous Works. We compare Blatta's results with other related previous works. PAYL [7], OCPAD [12], Decanter [15], and the autoencoder model [23] were chosen due to their code availability, and all can be tested against the UNSW-NB15 dataset. PAYL and OCPAD read an IP packet at a time, while Decanter and [23] reconstruct TCP segments and process whole application layer messages. None of them provides early detection, but to show that Blatta also offers improvements in detecting malicious traffic, we compare the detection and false positive rates of those works with Blatta.

We evaluated all methods with exploit and worm data in UNSW-NB15, as those match our threat model and the dataset is already publicly available; thus, the results are comparable. The results are shown in Table 7.

In general, Blatta has the highest detection rate, albeit it also comes with the cost of increased false positives. Although the false positives might make this approach unacceptable in real life, Blatta is still a significant step towards early prediction, a problem that has not been explored by similar previous works. This early prediction approach enables system administrators to react faster, thus reducing the harm to protected systems.

In the previous experiments, as shown in Section 5.2, Blatta needs to read 400 bytes (on average 35.21% of the application layer message size) to achieve a 97.57% detection rate.

Table 5: Experiment results of using various parameter combinations and various lengths of input to the model. For each row, the first group of values gives the detection rate (%) and the second group gives the false positive rate (%) when reading all bytes and the first 700, 600, 500, 400, 300, and 200 bytes of each message.

Parameter | Value | Detection rate (%): All / 700 / 600 / 500 / 400 / 300 / 200 | False positive rate (%): All / 700 / 600 / 500 / 400 / 300 / 200
n | 1 | 47.22 / 48.69 / 49.86 / 50.65 / 51.77 / 54.99 / 65.32 | 1.18 / 1.19 / 1.21 / 1.21 / 71.31 / 7.87 / 89.43
n | 3 | 99.87 / 99.51 / 99.77 / 99.1 / 99.59 / 98.93 / 91.07 | 2.51 / 2.51 / 2.51 / 2.51 / 72.61 / 10.29 / 20.51
n | 5 | 99.87 / 99.55 / 99.78 / 99.57 / 99.29 / 98.91 / 88.75 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 7.26 / 11.08
n | 7 | 99.86 / 99.47 / 99.59 / 99.37 / 99.19 / 98.53 / 97.08 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 8.92 / 80.92
n | 9 | 99.81 / 99.59 / 99.62 / 99.57 / 99.23 / 98.16 / 88.93 | 2.51 / 2.51 / 2.51 / 2.51 / 7.26 / 74.16 / 9.06
Stride | 1 | 99.87 / 99.55 / 99.78 / 99.57 / 99.29 / 98.91 / 88.75 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 7.26 / 11.08
Stride | 2 | 73.39 / 74.11 / 74.01 / 74.45 / 74.69 / 74.62 / 77.82 | 1.81 / 1.81 / 1.81 / 1.81 / 71.92 / 72.46 / 19.86
Stride | 3 | 82.51 / 82.54 / 83.07 / 83.12 / 83.25 / 83.5 / 85.75 | 1.5 / 1.49 / 1.5 / 1.51 / 71.62 / 75.47 / 89.63
Stride | 4 | 99.6 / 99.19 / 99.26 / 99.28 / 98.61 / 98.55 / 98.37 | 1.93 / 1.93 / 1.93 / 1.93 / 1.93 / 74.09 / 10.5
Stride | 5 | 99.73 / 98.95 / 98.88 / 98.65 / 98 / 95.77 / 88.29 | 1.93 / 1.92 / 1.93 / 1.93 / 1.93 / 54.16 / 90.02
Dictionary size | 1000 | 47.78 / 49.5 / 50.36 / 50.79 / 51.8 / 54.83 / 54.68 | 1.21 / 1.21 / 1.22 / 1.22 / 71.33 / 79.47 / 89.42
Dictionary size | 2000 | 99.87 / 99.55 / 99.78 / 99.57 / 99.29 / 98.91 / 88.75 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 7.26 / 11.08
Dictionary size | 5000 | 99.87 / 99.37 / 99.75 / 99.79 / 99.62 / 99.69 / 99.66 | 2.51 / 2.51 / 2.51 / 2.51 / 72.61 / 10.03 / 90.61
Dictionary size | 10000 | 99.86 / 99.44 / 99.74 / 99.55 / 99.44 / 98.55 / 98.33 | 2.51 / 2.51 / 2.51 / 2.51 / 72.61 / 79.06 / 90.15
Dictionary size | 20000 | 99.84 / 99.81 / 99.69 / 99.24 / 99.21 / 99.43 / 98.91 | 2.51 / 2.51 / 2.51 / 2.51 / 72.61 / 80.46 / 89.64
Embedding dimension | 16 | 99.89 / 99.65 / 99.7 / 99.67 / 99.22 / 99.09 / 98.81 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 76.77 / 80.94
Embedding dimension | 32 | 99.87 / 99.55 / 99.78 / 99.57 / 99.29 / 98.91 / 88.75 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 7.26 / 11.08
Embedding dimension | 64 | 99.87 / 99.2 / 99.41 / 99.09 / 98.61 / 96.76 / 85.52 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 4.51 / 89.85
Embedding dimension | 128 | 99.84 / 99.33 / 99.6 / 99.35 / 98.99 / 97.69 / 86.78 | 2.51 / 2.51 / 2.51 / 2.51 / 7.26 / 4.27 / 10.88
Embedding dimension | 256 | 99.88 / 99.76 / 99.8 / 99.22 / 99.38 / 98.64 / 90.34 | 2.51 / 2.51 / 2.51 / 2.51 / 7.26 / 80.79 / 9.06
Recurrent layer | LSTM | 99.87 / 99.55 / 99.78 / 99.57 / 99.29 / 98.91 / 88.75 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 7.26 / 11.08
Recurrent layer | GRU | 99.88 / 99.35 / 99.48 / 99.35 / 99.06 / 97.94 / 86.22 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 78.95 / 8.48
No. of layers | 1 | 99.87 / 99.55 / 99.78 / 99.57 / 99.29 / 98.91 / 88.75 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 7.26 / 11.08
No. of layers | 2 | 99.86 / 99.46 / 99.46 / 99.38 / 99.2 / 99.72 / 88.65 | 2.51 / 2.51 / 2.51 / 2.51 / 72.59 / 78.78 / 20.29
No. of layers | 3 | 99.84 / 99.38 / 99.68 / 99.1 / 99.18 / 98.16 / 87.35 | 2.51 / 2.51 / 2.51 / 2.51 / 2.51 / 74.94 / 10.83

Bold values show the parameter value for each set of experiments which gives the highest detection rate and lowest false positive rate.

Table 6: The effect of reducing the number of bytes on the detection speed.

No. of bytes | 1 LSTM layer | 2 LSTM layers | 3 LSTM layers
All | 83.66 ± 0.238327 | 55.14 ± 0.004801 | 36.98 ± 0.011428
700 | 164.86 ± 0.022857 | 107.04 ± 0.022001 | 73.5 ± 0.044694
600 | 181.6 ± 0.020556 | 119.7 ± 0.024792 | 82.1 ± 0.049584
500 | 204.32 ± 0.02352 | 136.5 ± 0.036668 | 93.76 ± 0.061855
400 | 221.7 ± 0.032205 | 149.4 ± 0.037701 | 103.02 ± 0.065417
300 | 240.76 ± 0.022857 | 163.68 ± 0.036352 | 113.18 ± 0.083477
200 | 262.72 ± 0.030616 | 181.38 ± 0.020927 | 126.88 ± 0.063024

The values are average (mean) detection speeds in KBps with 95% confidence intervals calculated from multiple experiments. The detection speed increased significantly (about three times faster than reading the whole message), allowing early prediction of malicious traffic.


It reads fewer bytes than PAYL, OCPAD, Decanter, and [23] while keeping the accuracy high.

5.5. Visualisation. To investigate how Blatta performed the detection, we took samples of both benign and malicious traffic and observed the input and output. We were particularly interested in the relation of n-grams that are not stored in the dictionary (unknown n-grams) to the decision. Those n-grams either did not exist in the training set or were not common enough to be included in the dictionary.

In Figure 5, we visualise three samples of traffic taken from different sets, the BlattaSploit and UNSW-NB15 datasets. The first part of each sample shows the n-grams that did not exist in the dictionary; the yellow highlighted parts show those n-grams. The second part shows the output of the recurrent layer for each time step. The darker the red highlight, the closer the probability of the traffic being malicious is to one in that time step.

As shown in Figure 5 (detection samples), malicious samples tend to have more unknown n-grams. It is evident that the existence of these unknown n-grams increases the probability of the traffic being malicious. As an example, the first five bytes of the samples have a probability of around 0.5 of being malicious, and the probability then moves closer to one when an unknown n-gram is detected.

Similar behaviour also exists in the benign sample. The probability is quite low because there are many known n-grams. Despite the existence of unknown n-grams in the benign sample, the end result shows that the traffic is benign. Furthermore, most of the time the probability of the traffic being malicious stays below 0.5.

6. Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet to tackle exploit attacks. There are evasion techniques which could be employed by adversaries to evade detection. These techniques open possibilities for future work. Therefore, this section discusses such evasion techniques and why they have not been covered by our current method.

Since our proposed method works by analysing application layer messages, it is safe to disregard evasion techniques at the transport or network layer level, e.g., IP fragmentation, TCP delayed sending, and TCP segment fragmentation. They should be handled by the underlying tool that reconstructs the TCP session. Furthermore, those evasion techniques can also be handled by a Snort preprocessor.

Two possible evasion techniques are compression and/or encryption. Both compression and encryption change the bytes' values from the original and make the malicious code harder to detect by Blatta or any previous work [7, 12, 23] on payload-based detection. Metasploit has a collection of evasion techniques, which include compression. The compression evasion technique only works on HTTP and utilises gzip, and it only compresses HTTP responses, not HTTP requests. As all HTTP-based exploits in UNSW-NB15 and BlattaSploit have their exploit code in the request, no data are available to analyse the performance if the adversary uses compression. However, gzip-compressed data could still be detected because they always start with the magic number 1f 8b, and the decompression can be done in a streaming manner, which Blatta can accommodate. There is also no need to decompress the whole data, since Blatta works well with partial input.
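As a hedged illustration of this point, the sketch below checks for the gzip magic number and decompresses the stream incrementally with Python's zlib, stopping once enough bytes for the classifier have been recovered; it is our own example, not part of Blatta's released code.

import zlib

GZIP_MAGIC = b"\x1f\x8b"

def head_of_payload(payload: bytes, limit: int = 400) -> bytes:
    """Return the first `limit` bytes to analyse, transparently handling gzip content."""
    if not payload.startswith(GZIP_MAGIC):
        return payload[:limit]
    # wbits=47 (32 + 15) lets zlib auto-detect the gzip header and decompress incrementally
    decompressor = zlib.decompressobj(wbits=47)
    out = b""
    for i in range(0, len(payload), 1024):          # feed the compressed stream chunk by chunk
        out += decompressor.decompress(payload[i:i + 1024])
        if len(out) >= limit:                       # stop early: no need to decompress everything
            break
    return out[:limit]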

Encryption is possibly the biggest obstacle for payload-based NIDS; none of the previous works in our literature review (see Table 1) have addressed this challenge. There are other studies which deal with payload analysis in encrypted traffic [46, 47]. However, these studies focus on classifying which application generates the network traffic instead of detecting exploit attacks; thus, they are not directly relevant to our research.

On its own, Blatta is not able to detect exploits hiding in encrypted traffic. However, Blatta's model can be exported and incorporated with application layer firewalls, such as Shadow Daemon [48]. Shadow Daemon is commonly installed on a web server and intercepts HTTP requests before they are processed by the web server software. It detects attacks based on its signature database. Since it is extensible and reads the same data as Blatta (i.e., application layer messages), it is possible to use Blatta's RNN-based model to extend the capability of Shadow Daemon beyond rule-based detection. More importantly, this approach would enable Blatta to deal with encrypted traffic, making it applicable in real-life situations.

Attackers could also place the exploit code closer to the end of the application layer message, hoping that the attack would not be detected as Blatta reads merely the first few bytes. However, exploits using this kind of evasion technique would still be detected, since the technique needs padding to push the exploit code to the end of the message. The padding itself will be detected as a sign of a malicious attempt, as it is most likely to be a byte sequence which rarely exists in benign traffic.

Table 7: Comparison to previous works using the UNSW-NB15 dataset as the testing set.

Method | Detection rate (%): exploits in UNSW-NB15 | Detection rate (%): worms in UNSW-NB15 | FPR (%)
Blatta | 99.04 | 100 | 1.93
PAYL | 87.12 | 26.49 | 0.05
OCPAD | 10.53 | 4.11 | 0
Decanter | 67.93 | 90.14 | 0.03
Autoencoder | 47.51 | 81.12 | 0.99


7. Conclusion and Future Work

This paper presents Blatta, an early prediction system for exploit attacks which can detect exploit traffic by reading only 400 bytes from the application layer message. First, Blatta builds a dictionary containing the most common n-grams in the training set. Then, it is trained over benign and malicious sequences of n-grams to classify exploit traffic. Lastly, it continuously reads a small chunk of an application layer message and predicts whether the message will be a malicious one.

Decreasing the number of bytes taken from application layer messages only has a minor effect on Blatta's detection rate. Therefore, it does not need to read the whole application layer message, unlike previous related works, to detect exploit traffic, creating a step change in the ability of system administrators to detect attacks early and to block them before the exploit damages the system. Extensive evaluation on the new exploit traffic dataset has clearly demonstrated the effectiveness of Blatta.

For future work, we would like to train a model that can recognise malicious behaviour based on messages exchanged between clients and a server, since in this paper we only consider messages from clients to a server and not the other way around. Detecting attacks in encrypted traffic while retaining the early prediction capability could be another future research direction. It also remains a question whether the approaches to mobile traffic classification [46, 47] would be applicable to the field of exploit detection.

Data Availability

The compressed file containing the dataset is available at https://bit.ly/blattasploit.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 5: Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step. It shows how the proposed system observes and analyses the traffic. Yellow blocks show unknown n-grams. Red blocks show the probability of the traffic being malicious when reading an n-gram at that point.


Acknowledgments

This work was supported by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan/LPDP) under the grant number PRJ-968/LPDP.3/2016 of the LPDP.

References

[1] McAfee, "McAfee Labs threat report: August 2019," Report, McAfee, Santa Clara, CA, USA, 2019.
[2] E. DB, "Offensive security's exploit database archive," EDB, Bedford, MA, USA, 2010.
[3] T.-H. Cheng, Y.-D. Lin, Y.-C. Lai, and P.-C. Lin, "Evasion techniques: sneaking through your intrusion detection/prevention systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1011–1020, 2012.
[4] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of LISA '99: 13th Systems Administration Conference, vol. 99, pp. 229–238, Seattle, Washington, USA, November 1999.
[5] R. Sommer and V. Paxson, "Outside the closed world: on using machine learning for network intrusion detection," in Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 305–316, Berkeley/Oakland, CA, USA, May 2010.
[6] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353–375, 2011.
[7] K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," Lecture Notes in Computer Science, Recent Advances in Intrusion Detection, in Proceedings of the 7th International Symposium, RAID 2004, pp. 203–222, Gothenburg, Sweden, September 2004.
[8] A. Jamdagni, Z. Tan, X. He, P. Nanda, and R. P. Liu, "RePIDS: a multi tier real-time payload-based intrusion detection system," Computer Networks, vol. 57, no. 3, pp. 811–824, 2013.
[9] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, "McPAD: a multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864–881, 2009.
[10] D. Ariu, R. Tronci, and G. Giacinto, "HMMPayl: an intrusion detection system based on hidden Markov models," Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.
[11] A. Oza, K. Ross, R. M. Low, and M. Stamp, "HTTP attack detection using n-gram analysis," Computers & Security, vol. 45, pp. 242–254, 2014.
[12] M. Swarnkar and N. Hubballi, "OCPAD: one class Naive Bayes classifier for payload based anomaly detection," Expert Systems with Applications, vol. 64, pp. 330–339, 2016.
[13] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in Proceedings of the 25th USENIX Security Symposium, pp. 807–822, Austin, TX, USA, August 2016.
[14] H. Zhang, D. Yao, N. Ramakrishnan, and Z. Zhang, "Causality reasoning about network events for detecting stealthy malware activities," Computers & Security, vol. 58, pp. 180–198, 2016.
[15] R. Bortolameotti, T. van Ede, M. Caselli et al., "DECANTeR: DEteCtion of anomalous outbound HTTP traffic by passive application fingerprinting," in Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 373–386, Orlando, FL, USA, December 2017.
[16] D. Golait and N. Hubballi, "Detecting anomalous behavior in VoIP systems: a discrete event system modeling," IEEE Transactions on Information Forensics and Security, vol. 12, no. 3, pp. 730–745, 2016.
[17] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, vol. 16, no. 5, pp. 475–490, 2017.
[18] E. Min, J. Long, Q. Liu, J. Cui, and W. Chen, "TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest," Security and Communication Networks, vol. 2018, Article ID 4943509, 9 pages, 2018.
[19] X. Jin, B. Cui, D. Li, Z. Cheng, and C. Yin, "An improved payload-based anomaly detector for web applications," Journal of Network and Computer Applications, vol. 106, pp. 111–116, 2018.
[20] Y. Hao, Y. Sheng, and J. Wang, "Variant gated recurrent units with encoders to preprocess packets for payload-aware intrusion detection," IEEE Access, vol. 7, pp. 49985–49998, 2019.
[21] P. Schneider and K. Böttinger, "High-performance unsupervised anomaly detection for cyber-physical system networks," in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy (CPS-SPC '18), pp. 1–12, Barcelona, Spain, January 2018.
[22] T. Hamed, R. Dara, and S. C. Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol. 73, pp. 137–155, 2018.
[23] B. A. Pratomo, P. Burnap, and G. Theodorakopoulos, "Unsupervised approach for detecting low rate attacks on network traffic with autoencoder," in 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8, Glasgow, UK, June 2018.
[24] Z.-Q. Qin, X.-K. Ma, and Y.-J. Wang, "Attentional payload anomaly detector for web applications," in Proceedings of the International Conference on Neural Information Processing, pp. 588–599, Siem Reap, Cambodia, December 2018.
[25] H. Liu, B. Lang, M. Liu, and H. Yan, "CNN and RNN based payload classification methods for attack detection," Knowledge-Based Systems, vol. 163, pp. 332–341, 2019.
[26] Z. Zhao and G.-J. Ahn, "Using instruction sequence abstraction for shellcode detection and attribution," in Proceedings of the Conference on Communications and Network Security (CNS), pp. 323–331, National Harbor, MD, USA, October 2013.
[27] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, "Detecting unknown malicious code by applying classification techniques on opcode patterns," Security Informatics, vol. 1, no. 1, p. 1, 2012.
[28] X. Wang, C.-C. Pan, P. Liu, and S. Zhu, "SigFree: a signature-free buffer overflow attack blocker," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 65–79, 2010.
[29] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," Lecture Notes in Computer Science, in Proceedings of the International Workshop on Recent Advances in Intrusion Detection, pp. 226–248, Hamburg, Germany, September 2006.
[30] Rapid7, "Metasploit penetration testing framework," 2003, https://www.metasploit.com.
[31] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the Military Communications and Information Systems Conference (MilCIS), pp. 1–6, Canberra, ACT, Australia, November 2015.


[32] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.
[33] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using Snort," Tech. Report CSE-2007-1, vol. 1, no. 2007, p. 22, UC Davis Department of Computer Science, Davis, CA, USA, 2007.
[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, p. 2, Berkeley, CA, USA, October 2005.
[35] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[36] Bro, 2008, http://www.bro-ids.org.
[37] Argus, 2008, https://qosient.com/argus.
[38] Rapid7, "Metasploitable," 2012, https://information.rapid7.com/download-metasploitable-2017.html.
[39] S. L. Garfinkel and M. Shick, "Passive TCP reconstruction and forensic analysis with tcpflow," Technical Reports, Naval Postgraduate School, Monterey, CA, USA, 2013.
[40] K. Rieck and P. Laskov, "Language models for detection of unknown attacks in network traffic," Journal in Computer Virology, vol. 2, no. 4, pp. 243–256, 2007.
[41] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, Doha, Qatar, October 2014.
[42] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, https://arxiv.org/abs/1412.3555.
[45] PyTorch, "Negative log likelihood," 2016.
[46] M. Lotfollahi, M. J. Siavoshani, R. S. H. Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[47] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.
[48] H. Buchwald, "Shadow Daemon: a collection of tools to detect, record, and block attacks on web applications," Shadow Daemon Introduction, 2015, https://shadowd.zecure.org/overview/introduction.



Detecting exploits on the wire has challenges. Firstly, the vast amount of data must be processed without decreasing network throughput below acceptable levels; quality of service is still a priority. Secondly, there are various ways to encode exploit payloads [3], modifying the exploit payload to make it appear different yet still achieve the same goal. This technique makes it easy to evade any rule-based detection. Lastly, encrypted traffic is also a challenge: attackers may transmit the exploit over an encrypted protocol, e.g., HTTPS.

There are many ways to detect exploits in network traffic. Rule-based detection systems work by matching signatures of known attacks to the network traffic. Anything that matches a rule is deemed malicious. The most prevalent open-source intrusion detection system, Snort [4], has a rule that marks any traffic which contains the byte 0x90 as shellcode-related traffic. This rule is based on the knowledge that most x86-based shellcodes are preceded by a sequence of no-operation (NOP) instructions in which the bytes normally contain this value. However, this rule can easily be evaded by employing other NOP instructions, such as the "0x41 0x49" sequence. Apart from that, rule-based detection systems are susceptible to zero-day attacks for which no detection rule exists. Such systems are unable to detect these attacks until the rule database is updated with the new attack signature.

Machine learning (ML) algorithms are capable of classifying objects and artefacts based on features exhibited in data and can handle various modalities of input. ML has been successfully applied in many domains with a high success rate, such as image classification, natural language processing, speech recognition, and even intrusion detection systems. There has been much research on implementing machine learning to address network intrusion detection [5]. Researchers typically provide training examples of malicious and legitimate traffic to the ML algorithm, which can then be used to determine whether new (unseen) traffic is malicious. However, there are three key limitations in existing research. Firstly, most of the research uses old datasets, e.g., either KDD99 or DARPA99 [6]. The traffic in those datasets may not represent recent network traces, since network protocols have evolved over the years, as have the attacks. Secondly, many of the previous works focus solely on the header information of network packets and process packets individually [6]. Yet it is known that exploits may exhibit statistical attributes similar to legitimate traffic at the header level and use evasion techniques such as packet fragmentation to hide their existence [3]. Therefore, we argue that network payload features may capture exploits better, and this area is still actively expanding, as shown by the number of studies mentioned in Table 1. This argument brings us to the third limitation: existing methods that use payload features, i.e., byte frequencies or n-grams, usually involve reading the payload of whole application layer messages (see Section 2). The issue is that these messages can be lengthy and spread over multiple network packets. Reading the whole message before making a decision may lead to a delay in detecting the attack and gives the exploit time to execute before an alert is raised.

We therefore propose Blatta, an early exploit detection system which reads application layer messages and predicts whether these messages are likely to be malicious by reading only the first few bytes. This is the first work, to our knowledge, that provides early prediction of malicious application layer messages, thus detecting a potential attack earlier than other state-of-the-art approaches and enabling a form of early warning system. Blatta utilises a recurrent neural network- (RNN-) based model and n-grams to make the prediction. An RNN is a type of artificial neural network that takes a sequence-based vector as input (in this case, a sequence of n-grams from an application layer message) and considers the temporal behaviour of the sequence so that its elements are not treated independently. There has been a limited amount of research on payload-based intrusion detection which used an RNN or its further developments (i.e., long short-term memory [24, 25] and the gated recurrent unit [20]), but earlier research did not make predictive decisions early, as it only used 1-grams of full-length application layer messages as features, which lack contextual information, a key benefit of higher-order n-grams. Other work that does consider higher-order sequences of n-grams (e.g., [12, 29]) is also yet to develop methods that provide early-stage prediction, preferring methods that require the full payload as input to a classifier.

To evaluate the proposed system, we generated an exploit traffic dataset by running exploits in Metasploit [30] with various exploit payloads and encoders. An exploit payload is a piece of code that will be executed once the exploit has successfully infiltrated the system. An encoder is used to change how the exploit payload appears while keeping its functionality. Our dataset contains traffic from 5687 working exploits. Apart from that, we also used a more recent intrusion detection dataset, UNSW-NB15 [31], thus enabling our method to be compared with previous works.

To summarise, the key contributions of this paper are as follows:

(i) We proposed Blatta, an early prediction system for exploit traffic, using a novel approach of combining an RNN and high-order n-grams to make predictive decisions while still considering the temporal behaviour of a sequence of n-grams. Blatta is, to the best of our knowledge, the first to introduce early exploit detection. Blatta detects exploit payloads as they enter the protected network and is able to predict the exploit by reading 35.21% of application layer messages on average. Blatta thus enables actions to be taken to block the attack earlier and therefore reduce the harm to the system.

(ii) We generated a dataset of exploit traffic with various exploit payloads and encoders, resulting in 5687 unique connections. The exploit traffic in the dataset was ensured to contain actual exploit code, making the dataset closer to reality.

The rest of this paper is structured as follows. Section 2 gives a summary of previous works in this area.


Section 3 explains the datasets we used to test our proposed method. How Blatta works is explained in Section 4. Then, Section 5 describes our extensive experimentation with Blatta. We also discuss possible evasion techniques against our approach in Section 6. Finally, the paper concludes in Section 7.

2. Related Works

The earliest solutions for detecting exploit activities used pattern matching and regular expressions [4]. Rules are defined by system administrators and are then applied to the network traffic.

Table 1: Related works in exploit detection. Unlike previous works, Blatta does not have to read until the end of application layer messages to detect exploit traffic.

Paper | Features | Detection method | Dataset(s) | Learning type | Protocol(s) | Early prediction
PAYL [7] | Relative frequency count of each 1-gram | Based on statistical model and Mahalanobis distance | D, SG | U | HTTP, SMTP, SSH | No
RePIDS [8] | Mahalanobis distance map originated from relative frequency count of each 1-gram, filtered by PCA | Based on statistical model and Mahalanobis distance | D, M | U | HTTP | No
McPAD [9] | 2ν-grams | Multi one-class SVM classifier | D, M | U | HTTP | No
HMMPayl [10] | Byte sequences of the L7 payload | Ensemble of HMMs | D, M, DI | U | HTTP | No
Oza et al. [11] | Relative frequency count of each 1-gram | Based on statistical model | D, M, SG | U | HTTP | No
OCPAD [12] | High-order n-grams (n > 1) | Based on the occurrence probability of an n-gram in a packet | M, SG | U | HTTP | No
Bartos et al. [13] | Information from HTTP request headers and the lengths | SVM | SG | S | HTTP | No
Zhang et al. [14] | Packet header information and HTTP and DNS messages | Naive Bayes, Bayesian network, SVM | D, SG | S | DNS, HTTP | No
Decanter [15] | HTTP messages | Clustering | SG | U | HTTP | No
Golait and Hubballi [16] | Byte sequence of the L7 payload | Probabilistic counting deterministic timed automata | SG | U | SIP | No
Duessel et al. [17] | Contextual n-grams of the L7 payload | One-class SVM | SG | U | HTTP, RPC | No
Min et al. [18] | Words of the L7 payload | CNN and random forest | I | S | HTTP | No
Jin et al. [19] | 2ν-grams | Multi one-class SVM classifier | M | U | HTTP | No
Hao et al. [20] | Byte sequence of the L7 payload | Variant gated recurrent unit | I | S | HTTP | No
Schneider and Böttinger [21] | Byte sequence of the L7 payload | Stacked autoencoder | O | U | Modbus | No
Hamed et al. [22] | n-grams of base64-encoded payload | SVM | I | S | All protocols in the datasets | No
Pratomo et al. [23] | Byte frequency of application layer messages | Outlier detection with deep autoencoder | SW | U | HTTP, SMTP | No
Qin et al. [24] | Byte sequence of the L7 payload | Using a recurrent neural network | O | S | HTTP | No
Liu et al. [25] | Byte sequence of the L7 payload | Using a recurrent neural network with embedded vectors | D, O | S | HTTP | No
Zhao and Ahn [26] | Disassembled instructions of bytes in network traffic | Employing Markov chain-based model and SVM | SG | S | Not mentioned | No
Shabtai et al. [27] | n-grams of a file and n-grams of opcodes in a file, then calculated TF-IDF of those n-grams | Various ML algorithms, e.g., random forest, decision tree, Naive Bayes, and a few others | SG | S | File classification | No
SigFree [28] | Disassembled instructions of bytes in application layer payload | Analyses of instruction sequences to determine if they are code | SG | Non-ML | HTTP | No
Proposed approach | High-order n-grams of application layer messages | Uses a recurrent neural network to early predict exploit traffic | SW, SG | S | HTTP, FTP | Yes

D: DARPA99; M: McPAD attacks dataset [9]; I: ISCX 2012; SG: self-generated; DI: DIEE; SW: UNSW-NB15; O: others; U: unsupervised; S: supervised; non-ML: non-machine-learning approach.


When a matching pattern is found, the system raises an alert and possibly shows which part of the traffic matches the rule. The disadvantage of this approach is that the rules must be kept up to date. Any variation to the exploit payload, which can be achieved by using encoders, may defeat such detection.

ML approaches are capable of recognising previously seen instances of objects, and such methods aim to generalise to unseen objects. ML is able to learn patterns or behaviour from features present in training data and classify which class an unseen object belongs to. However, to work well, suitable hand-crafted features must be engineered for the ML algorithm to make an accurate prediction. Some previous works defined their feature sets by extracting information from the application layer message. In [14] and [15], the authors generated their features from the HTTP request URI and HTTP headers, i.e., host, constant header fields, size, user-agent, and language. The authors then clustered the legitimate traffic based on those features. Golait and Hubballi [16] tracked DNS and HTTP traffic and calculated pairwise features of two events to see their similarity. These features were obtained from the transport and application layers. One of their features is the semantic similarity between two HTTP requests.

Features derived from a specific protocol may capture specific behaviour of legitimate traffic. However, this feature extraction method has a drawback: a different set of features is needed for every application layer protocol that might be used in the network. Some research borrows feature extraction methods from natural language processing to obtain protocol-agnostic features using n-grams.

One of the first leading research efforts in payload analysis is PAYL [7]. It extracts 1-grams from all bytes of the payload as a representation of the network traffic and is trained over a set of those 1-grams. PAYL [7] measures the distance between new incoming traffic and the model with a simplified Mahalanobis distance. Similar to PAYL, Oza et al. [11] extract n-grams of HTTP traffic with various n values. They compared three different statistical models to detect anomalies/attacks. HMMPayl [10] is another work based on PAYL which uses hidden Markov models to detect anomalies. OCPAD [12] stores the n-grams of bytes in a probability tree and uses a one-class Naive Bayes classifier to detect malicious traffic. Hao et al. [20] proposed an autoencoder model to detect anomalous low-rate attacks on the network, and Schneider and Böttinger [21] used stacked autoencoders to detect attacks on industrial control systems. Hamed et al. [22] developed probabilistic counting deterministic timed automata which inspect byte values of application layer messages to identify attacks on VoIP applications. Pratomo et al. [23] extract words from the application layer message and detect web-based attacks with a combination of a convolutional neural network (CNN) and random forest. A common trait of all the aforementioned works is that they read all bytes in each packet or application layer message and do not decide until all bytes have been read, at which point it is possible the exploit has already infected the system and targeted the vulnerability.

Other research studies argue that exploit traffic is most likely to contain shellcode, a short sequence of machine code. Therefore, to get a better representation of exploit traffic, they performed static analysis on the network traffic. Qin et al. [24] disassemble byte sequences into opcode sequences and calculate probabilities of opcode transitions to detect shellcode presence. Liu et al. [25] utilise n-grams that comprise machine instructions instead of bytes. SigFree [28] detects buffer overflow attempts on Internet services by constructing an instruction flow graph for each request and analysing the graph to determine if the request contains machine instructions. Any network traffic that contains valid machine instructions is deemed malicious. However, none of these consider that an exploit may also contain a server-side scripting language or OS commands instead of shellcode.

In this paper, we propose Blatta, an exploit attack detection system that (i) provides early detection of exploits in network traffic, rather than waiting until the whole payload has been delivered, enabling proactive blocking of attacks, and (ii) is capable of detecting payloads that include malicious server-side scripting languages, machine instructions, and OS commands, enhancing the previous state-of-the-art approaches that only focus on shellcode. A summary of the proposed method and previous works is shown in Table 1.

Blatta utilises a recurrent neural network- (RNN-) based model which takes a sequence of n-grams from the network payload as input. There has been limited research on developing RNNs to analyse payload data [20, 24, 25]. All of these works feed individual bytes (1-grams) from the payload as the features to their RNN model. However, 1-grams do not carry information about the context of the payload string as a sequence of activities. Therefore, we model the payload as high-order n-grams, where n > 1, to capture more contextual information about the network payload. We directly compare 1-grams to higher-order n-grams in our experiments. Moreover, while related works such as [19], [17], and OCPAD [12] previously used high-order n-grams, they did not utilise a model capable of learning sequences of activities and thus were not capable of making early-stage predictions within a sequence. Our novel RNN-based model considers the long-term dependency of the sequence of n-grams.

An RNN-based model normally takes a sequence as input, processes each item sequentially, and outputs the decision after it has finished processing the last item of the sequence. The earlier works which used an RNN have this behaviour [24, 25], whereas Blatta, to be able to predict the exploit traffic early, takes the intermediate output of the RNN-based model instead of waiting for the full-length message to be processed. Our experiments show that this approach has little effect on accuracy and enables us to make earlier network attack predictions while retaining high accuracy and a low false positive rate.

3. Datasets and Threat Model

Several datasets have been used to evaluate network-based intrusion detection systems. DARPA released the KDD99 Cup, IDSEVAL 98, and IDSEVAL 99 datasets [32].


They have been widely used over time, despite some criticism that they do not model the attacks correctly [33]. The Lawrence Berkeley National Laboratory released their anonymised captured traffic [34] (LBNL05). More recently, Shiravi et al. released the ISCX12 dataset [35], in which they collected seven days' worth of traffic and provided the label for each connection in XML format. Moustafa and Slay published the UNSW-NB15 dataset [31], which had been generated by using the IXIA PerfectStorm tool to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours. This dataset provides PCAP files along with the preprocessed traffic obtained by Bro-IDS [36] and Argus [37].

Moustafa and Slay [31] captured the network traffic on two days, 22 January and 17 February 2015. For brevity, we refer to those two parts of UNSW-NB15 as UNSW-JAN and UNSW-FEB, respectively. Both days contain malicious and benign traffic. Therefore, there has to be an effort to separate them if we would like to use the raw information from the PCAP files, not the preprocessed information written in the CSV files. The advantage of this dataset over ISCX12 is that UNSW-NB15 contains information on the type of attacks, and thus we are able to select which kinds of attacks are needed, i.e., exploits and worms. However, after analysing this dataset in depth, we observed that the exploit traffic in the dataset is often barely distinguishable from the normal traffic, as some of it does not contain any exploit payload. Our explanation for this is that the exploit attempts may not have been successful; thus, they did not get to the point where the actual exploit payload was transmitted and recorded in the PCAP files. Therefore, we opted to generate our own exploit traffic dataset, the BlattaSploit dataset.

3.1. BlattaSploit Dataset. To develop an exploit traffic dataset, we set up two virtual machines that acted as vulnerable servers to be attacked. The first server was installed with Metasploitable 2 [38], a vulnerable OS designed to be infiltrated by Metasploit [30]. The second one was installed with Debian 5, vulnerable services, and WordPress 4.1.18. Both servers were set up in the victim subnet, while the attacker machine was placed in a different subnet. These subnets were connected with a router. The router was used to capture the traffic, as depicted in Figure 1. Although the network topology is less complex than what we would have in the real world, this setup still generates representative data, as the payloads are intact regardless of the number of hops the packets go through.

To launch exploits on the vulnerable servers, we utilised Metasploit, an exploitation tool which is normally used for penetration testing [30]. Metasploit has a collection of working exploits and is updated regularly with new exploits. There are around 9000 exploits for Linux- and Unix-based applications, and to make the dataset more diverse, we also employed different exploit payloads and encoders. An exploit payload is a piece of code which is intended to be executed on the remote machine, and an encoder is a technique to modify the appearance of particular exploit code to avoid signature-based detection. Each exploit in Metasploit has its own set of compatible exploit payloads and encoders. In this paper, we used all possible combinations of exploit payloads and encoders.

We then ran the exploits against the vulnerable servers and captured the traffic using tcpdump. Traffic generated by each exploit was stored in an individual PCAP file. By doing so, we know which specific type of exploit the packets belonged to. Timestamps are normally used to mark which packets belong to a class, but this information would not be reliable, since packets may not arrive in order, and if there is more than one source sending traffic at a time, their packets may get mixed up with other sources.

When analysing the PCAP files, we found out that not all exploits had run successfully. Some of them failed to send anything to the targeted servers, and some others did not send any malicious traffic, e.g., sending a login request only or sending requests for nonexistent files. This supported our earlier thoughts on why the UNSW-NB15 dataset was lacking exploit payload traffic. Therefore, we removed capture files that had no traffic or too little information to be distinguished from normal traffic. In the end, we produced 5687 PCAP files, which also represents the number of distinct combinations of exploits, exploit payloads, and encoders. Since we are interested in application layer messages, all PCAP files were preprocessed with tcpflow [39] to obtain the application layer message for each TCP connection. A summary of this dataset is shown in Table 2.

The next step for this exploit traffic dataset was to annotate the traffic. All samples can be considered malicious; however, we decided to make the dataset more detailed by adding the type of exploit payload contained inside the traffic and the location of the exploit payload. There are eight types of exploit payload in this dataset. They are written in JavaScript, PHP, Perl, Python, Ruby, shell script (e.g., Bash and ZSH), SQL, and byte code/opcode for shellcode-based exploits.

There are some cases where an exploit contains an exploit payload "wrapped" in another scripting language, for example, a Python script to perform a reverse shell connection which uses the Bash echo command at the beginning. For these cases, the type of the exploit payload is the one with the longest byte sequence; in this example, the type of the particular connection is Python.

It is also important to note that, whilst the vulnerable servers in our setup use a 7-year-old operating system, the payload carried by the exploit is identical to the payload a more recent exploit would have used. For example, both CVE-2019-9670 (disclosed in 2019) and CVE-2012-1495 (disclosed in 2012) can use generic/shell_bind_tcp as a payload, and the traffic generated by both exploits will still be similar. Therefore, we argue that our dataset still represents current attacks. Moreover, only successful exploit attacks are kept in the dataset, making the recorded traffic more realistic.

3.2. Threat Model. A threat model is a list of potential things that may affect protected systems. Having one means we can identify which aspects are the focus of our proposed approach.


Thus, in this case, we can potentially understand better what to look for in order to detect the malicious traffic and what the limitations are.

The proposed method focuses on detecting remote exploits by reading application layer messages from unencrypted network traffic, although the detection method of Blatta can be incorporated with application layer firewalls, i.e., web application firewalls. Therefore, we can still detect the exploit attempt before it reaches the protected application. In general, the types of attacks we consider here are as follows:

(1) Remote exploits to servers that send malicious scripting languages (e.g., PHP, Ruby, Python, or SQL), shellcode, or Bash scripts to maintain control of the server or gain access to it remotely. For example, the apache_continuum_cmd_exec exploit with a reverse shell payload will force the targeted server to open a connection to the attacking computer and provide shell access to the attacker. By focusing on the connections directed to servers, we can safely assume JavaScript code in the application layer message could also be malicious, since normally JavaScript code is sent from the server to the client, not the other way around.

(2) Exploit attacks that utilise one of the text-based protocols over TCP, i.e., HTTP and FTP. Text-based protocols tend to be more well-structured; therefore, we can apply a natural language processing based approach. The case would be similar to document classification.

(3) Other attacks that may utilise remote exploits are also considered, i.e., worms. Worms in the UNSW-NB15 dataset contain exploit code used to propagate themselves.

4. Methodology

Extracting features from application layer messages is the first step toward an early prediction method. We could use variable values in the message (e.g., HTTP header values, FTP commands, and SMTP header values), but it would require us to extract a different set of features for each application layer protocol. It is preferable to have a generic feature set which applies to various application layer protocols. Therefore, we propose a method using n-grams to model the application layer message, as they carry more information than a plain sequence of bytes. For instance, it would be difficult for a model to determine what a sequence of bytes "G", "E", "T", space, and so on means, as those individual bytes may appear anywhere in a different order. However, if the model has 3-grams of the sequence (i.e., "GET", "ET ", and so on), the model would learn something more meaningful, such as that the string could be an HTTP message. The advantage of using high-order n-grams, where n > 1, is also shown by Anagram [29], the method in [40], and OCPAD [12]. However, these works did not consider the temporal behaviour of a sequence of n-grams, as their models were not capable of doing so. Therefore, Blatta utilises a recurrent neural network (RNN) to analyse the sequence of n-grams obtained from an application layer message.

[Figure 1 diagram: Attacker VM (192.168.66.6) connected through a switch to a router, which connects through another switch to the Metasploitable VM (192.168.99.9) and the Debian VM (192.168.99.6).]

Figure 1: Network topology for generating exploit traffic. The attacker VM running Metasploit and the target VMs are placed in different networks connected by a router. This router is used to capture all traffic from these virtual machines.

Table 2: A summary of exploits captured in the BlattaSploit dataset.

Number of TCP connections | 5687
Protocols | HTTP (3857), FTP (6), SMTP (74), POP3 (93)
Payload types | JavaScript, shellcode, Perl, PHP, Python, Ruby, Bash, SQL

Note: The numbers next to the protocols are the numbers of connections using those application layer protocols.


An RNN takes a sequence of inputs and processes them sequentially in several time steps, enabling the model to learn the temporal behaviour of those inputs. In our case, the input to each time step is an n-gram, unlike earlier works which also utilised an RNN model but took a byte value as the input to each RNN time step [24, 25]. Moreover, those works took the output from the last time step to make the decision, while our novel approach produces classification outputs at intermediate intervals once the RNN model is already confident about the decision. We argue that this approach will enable the proposed system to predict whether a connection is malicious without reading the full length of application layer messages, therefore providing an early warning method.

In general, as shown in Figure 2, the training and detection processes of Blatta are as follows.

4.1. Training Stage. n-grams are extracted from application layer messages. The l most common n-grams are stored in a dictionary. This dictionary is used to encode an n-gram to an integer. The encoded n-grams are then fed into an RNN model, training the model to classify whether the traffic is legitimate or malicious.

4.2. Detection Stage. For each new incoming TCP connection directed to the server, we reconstruct the application layer message, obtain the full-length or partial bytes of it, and determine if the sequence belongs to malicious traffic.

4.3. Data Preprocessing. In well-structured documents, such as text-based application layer protocols, byte sequences can be a distinguishing feature that makes each message in its respective class differ from the others. Blatta takes the byte sequence of application layer messages from network traffic. The application layer messages need to be reconstructed, as a message may be split into multiple TCP segments and transmitted in an arbitrary order. We utilise tcpflow [39] to read PCAP files, reconstruct TCP segments, and obtain the application layer messages.
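A minimal sketch of this preprocessing step is shown below; it invokes tcpflow with its documented -r (read a capture file) and -o (output directory) options and reads back the per-connection files, though the exact invocation and file handling used by the authors are not stated in the paper.

import glob
import subprocess

def reconstruct_messages(pcap_path: str, out_dir: str = "flows"):
    """Use tcpflow to reassemble TCP streams, then read each flow as one byte sequence."""
    subprocess.run(["tcpflow", "-r", pcap_path, "-o", out_dir], check=True)
    messages = []
    # tcpflow names each per-connection file after the endpoint pair, e.g. "192.168.099.009.00080-..."
    for flow_file in sorted(glob.glob(f"{out_dir}/*-*")):
        with open(flow_file, "rb") as f:
            messages.append(f.read())
    return messages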

We then represent the byte sequence as a collection of n-grams taken with various stride values. An n-gram is a consecutive sequence of n items from a given sample; in this case, an n-gram is a consecutive series of bytes obtained from the application layer message. The stride is how many steps the sliding window takes when collecting n-grams. Figure 3 shows examples of various n-grams obtained with different values of n and stride.
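The following sketch mirrors the example in Figure 3: a window of n bytes slides over the message, moving `stride` bytes at each step. The function name is our own.

def extract_ngrams(message: bytes, n: int = 5, stride: int = 1):
    """Slide a window of n bytes over the message, moving `stride` bytes each step."""
    return [message[i:i + n] for i in range(0, len(message) - n + 1, stride)]

# extract_ngrams(b"HELLO_WORLD", n=3, stride=2) -> [b'HEL', b'LLO', b'O_W', b'WOR', b'RLD']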

We define the input to the classifier to be a set of integer-encoded n-grams. Let $X = \{x_1, x_2, x_3, \ldots, x_k\}$ be the integer-encoded n-grams collected from an application layer message as the input to the RNN model. We denote $k$ as the number of n-grams taken from each application layer message. Each n-gram is categorical data: a value of 1 is not necessarily smaller than a value of 50; they are simply different. Encoding n-grams with one-hot encoding is not a viable solution, as the resulting vector would be sparse and hard to model. Therefore, Blatta transforms the sequence of n-grams with an embedding technique. An embedding is essentially a lookup table that maps an item to a dense vector with a fixed size, embedded_dim. Using pretrained embedding vectors, e.g., GloVe [41], is common in natural language processing problems, but those pretrained embedding vectors were generated from a corpus of words, while our approach works with byte-level n-grams. Therefore, it is not possible to use the pretrained embedding vectors. Instead, we initialise the embedding vectors with random values, which will be updated by backpropagation during training so that n-grams which usually appear together will have vectors that are close to each other.

It is worth noting that the number of possible n-grams rises exponentially as n increases. If we considered all possible n-gram values, the model would overfit. Therefore, we limit the number of embedding vectors by building a dictionary of the most common n-grams in the training set. We define the dictionary size as $l$; it contains $l$ unique n-grams and a placeholder for other n-grams that do not exist in the dictionary. Thus, the embedding vectors have $l + 1$ entries. However, we would like the size of each embedded vector to be less than $l + 1$. Let $\varepsilon$ be the size of an embedded vector (embedded_dim). If $x_t$ represents an n-gram, the embedding layer transforms $X$ to $\hat{X}$. We denote $\hat{X} = \{\hat{x}_1, \ldots, \hat{x}_k\}$, where each $\hat{x}$ is a vector with the size of $\varepsilon$. The embedded vectors $\hat{X}$ are then passed to the recurrent layer.
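Building on the `extract_ngrams` sketch above, the following illustrates how the dictionary of the l most common n-grams and the integer encoding could be implemented; the function names and the choice of index 0 for unknown n-grams are our own assumptions.

from collections import Counter

def build_dictionary(training_messages, n=5, stride=1, dict_size=2000):
    """Keep the `dict_size` most common n-grams; index 0 is reserved for unknown n-grams."""
    counts = Counter()
    for msg in training_messages:
        counts.update(extract_ngrams(msg, n, stride))
    most_common = [gram for gram, _ in counts.most_common(dict_size)]
    return {gram: idx + 1 for idx, gram in enumerate(most_common)}  # indices 1..dict_size

def encode(message, dictionary, n=5, stride=1):
    """Integer-encode a message; unknown n-grams share the placeholder index 0."""
    return [dictionary.get(gram, 0) for gram in extract_ngrams(message, n, stride)]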

4.4. Training RNN-Based Classifier. Since the input to the classifier is sequential data, we opted to use a method that takes into account the sequential relationship of elements in the input vectors. Such methods capture the behaviour of a sequence better than processing those elements individually [42]. A recurrent neural network is an architecture of neural networks in which each layer takes time series data, processes them in several time steps, and utilises the output of the previous time step in the current step's calculation. We refer to these layers as recurrent layers. Each recurrent layer consists of recurrent units.

The vanilla RNN has a vanishing gradient problem, a situation where the recurrent model cannot be trained further because the value used to update the weights is too small; thus, there would be no point in training the model further. Therefore, long short-term memory (LSTM) [43] and the gated recurrent unit (GRU) [44] are employed to avoid this situation. Both LSTM and GRU have cells/units that are connected recurrently to each other, replacing the usual recurrent units which existed in earlier RNNs. What makes their cells different is the existence of gates. These gates decide whether to pass certain information coming from the previous cell (i.e., the input gate and forget gate in an LSTM unit and the update gate in a GRU) or going to the next unit (i.e., the output gate in an LSTM). Since LSTM has more gates than GRU, it requires more calculations and is thus computationally more expensive. Yet it is not conclusive whether one is better than the other [44]; thus, we use both types and compare the results. For brevity, we refer to both types as recurrent layers and their cells as recurrent units.


The recurrent layer takes a vector $\hat{x}_t$ for each time step $t$. In each time step, the recurrent unit outputs a hidden state $h_t$ with a dimension of $|h_t|$. The hidden state is then passed to the next recurrent unit. The last hidden state, $h_k$, becomes the output of the recurrent layer, which will be passed on to the next layer.

Once we obtain $h_k$, the output of the recurrent layer, the next step is to map the vector to the benign or malicious class. Mapping $h_k$ to those classes requires us to use a linear transformation and a softmax activation unit.

A linear transformation transforms $h_k$ into a new vector $L$ using (1), where $W$ is the trained weight and $b$ is the trained bias. After that, we transform $L$ to obtain $Y = \{y_i \mid 0 \le i < 2\}$, the log probabilities of the input belonging to the classes, with LogSoftmax, as described in (2). The output of LogSoftmax is the index of the element that has the largest probability in $Y$. All these forward steps are depicted in Figure 4.

$$L = W \cdot h_k + b, \quad L = \{l_i \mid 0 \le i < 2\}, \qquad (1)$$

$$Y = \underset{0 \le i < 2}{\arg\max}\; \log\!\left(\frac{\exp(l_i)}{\sum_{i=0}^{2} \exp(l_i)}\right). \qquad (2)$$
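A PyTorch sketch of the classifier described by equations (1) and (2) is shown below (embedding layer, LSTM recurrent layer, linear transformation, and LogSoftmax). The class name and the hidden size of 128 are our own assumptions, and the code uses the current PyTorch API rather than the 0.2.0 release used in the paper.

import torch
import torch.nn as nn

class BlattaLikeClassifier(nn.Module):
    """Embedding -> LSTM -> linear -> LogSoftmax, following equations (1) and (2)."""
    def __init__(self, dict_size=2000, embedding_dim=32, hidden_size=128, num_layers=1):
        super().__init__()
        self.embedding = nn.Embedding(dict_size + 1, embedding_dim)      # +1 for unknown n-grams
        self.rnn = nn.LSTM(embedding_dim, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, 2)                          # benign vs. malicious
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def forward(self, encoded_ngrams):
        embedded = self.embedding(encoded_ngrams)      # (batch, k, embedding_dim)
        outputs, _ = self.rnn(embedded)                # hidden state for every time step
        h_k = outputs[:, -1, :]                        # last hidden state, as in equation (1)
        return self.log_softmax(self.linear(h_k))      # log probabilities of the two classes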

In the training stage, after feeding a batch of training data to the model and obtaining the output, the next step is to evaluate the accuracy of our model. To measure our model's performance during the training stage, we need to calculate a loss value, which represents how far our model's output is from the ground truth. Since this approach is a binary classification problem, we use the negative log likelihood [45] as the loss function. Then, the losses are backpropagated to update the weights, biases, and embedding vectors.
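Continuing the sketch above, one training step with the negative log likelihood loss and the SGD optimiser (learning rate 0.1, as stated in Section 5.2) might look as follows; the batching and tensor preparation are left out.

import torch
import torch.nn as nn

model = BlattaLikeClassifier()                              # defined in the previous sketch
criterion = nn.NLLLoss()                                    # negative log likelihood over log probabilities
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(batch_ngrams, batch_labels):
    """One optimisation step: forward pass, loss, backpropagation, weight update."""
    optimizer.zero_grad()
    log_probs = model(batch_ngrams)            # (batch, 2) log probabilities
    loss = criterion(log_probs, batch_labels)  # labels: 0 = benign, 1 = malicious
    loss.backward()                            # gradients also flow into the embedding vectors
    optimizer.step()
    return loss.item()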

4.5. Detecting Exploits. The process of detecting exploits is essentially similar to the training process. Application layer messages are extracted, n-grams are acquired from these messages and encoded using the dictionary that was built during the training process, and the encoded n-grams are then fed into the RNN model, which outputs the probabilities of these messages being malicious. When the probability of an application layer message being malicious is higher than 0.5, the message is deemed malicious and an alert is raised.

The main difference between this process and the training stage is the point at which Blatta stops processing inputs and makes the decision. Blatta takes the intermediate output of the RNN model, hence requiring fewer inputs and disregarding the need to wait for the full-length message. We will show in our experiments that the decision taken by using the intermediate output and reading fewer bytes is close to that of reading the full-length message, giving the proposed approach the ability of early prediction.
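A sketch of this early decision step is shown below, reusing the `encode` function and the model from the earlier sketches; the 400-byte limit and the 0.5 threshold follow the text, while the function name is ours.

def predict_early(model, message, dictionary, byte_limit=400, threshold=0.5):
    """Classify a connection from only the first `byte_limit` bytes of its application layer message."""
    encoded = encode(message[:byte_limit], dictionary)
    if not encoded:                                        # message shorter than one n-gram
        return False
    x = torch.tensor([encoded], dtype=torch.long)          # batch of one sequence
    with torch.no_grad():
        log_probs = model(x)
    p_malicious = log_probs.exp()[0, 1].item()             # probability of the malicious class
    return p_malicious > threshold                         # raise an alert if above 0.5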

5. Experiments and Results

In this section, we evaluate Blatta and present evidence of its effectiveness in predicting exploits in network traffic. Blatta is implemented with Python 3.5.2 and the PyTorch 0.2.0 library. All experiments were run on a PC with a Core i7 3.40 GHz CPU, 16 GB of RAM, an NVIDIA GeForce GT 730, NVIDIA CUDA 9.0, and CUDNN 7.

The best practice for evaluating a machine learning approach is to have separate training and testing sets. As the name implies, the training set is used to train the model.

[Figure 2 diagram: network packets → TCP reconstruction → application layer messages (i.e., HTTP requests, SMTP commands) → n-grams → most common n-grams in the training set → RNN-based model → legitimate or malicious.]

Figure 2: Architecture overview of the proposed method. Application layer messages are extracted from captured traffic using tcpflow [39], and n-grams are obtained from those messages. They will then be used to build a dictionary of most common n-grams and train the RNN-based model (i.e., LSTM and GRU). The trained model outputs a prediction of whether the traffic is malicious or benign.

[Figure 3 diagram: 3-grams extracted from the string "HELLO_WORLD" with stride values of 1, 2, and 3.]

Figure 3: An example of n-grams of bytes taken with various stride values.


The testing set is for evaluating the model's performance. We split the BlattaSploit dataset in a 60:40 ratio into training and testing sets of malicious samples. The division was carefully made to include diverse types of exploit payloads. As samples of benign traffic to train the model, we obtained the same number of HTTP and FTP connections as the malicious samples from UNSW-JAN. Having a balanced set of both classes is important in supervised learning.

We measure our model's performance by using samples of malicious and benign traffic. Malicious samples are obtained from 40% of the BlattaSploit dataset and the exploit and worm samples in the UNSW-FEB set (10855 samples). As for the benign samples, we took the same number (10855) of benign HTTP and FTP connections in UNSW-FEB. We used the UNSW dataset to compare our proposed approach's performance with previous works.

In summary, the details of the training and testing sets used in the experiments are shown in Table 3.

We evaluated the classifier model by counting the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). They are then used to calculate the detection rate (DR) and false positive rate (FPR), which are the metrics we use to measure our proposed system's performance.

DR measures the ability of the proposed system to correctly detect malicious traffic. The value of DR should be as close as possible to 100%; it shows how well our system is able to detect exploit traffic. The formula to calculate DR is shown in (3). We denote TP as the number of detected malicious connections and FN as the number of undetected malicious connections in the testing set:

$$\mathrm{DR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \qquad (3)$$

FPR is the percentage of benign samples that are classified as malicious. We would like this metric to be as low as possible. A high FPR means many false alarms are raised, rendering the system less trustworthy and useless. We calculate this metric using (4). We denote FP as the number of false positives detected and N as the number of benign samples:

$$\mathrm{FPR} = \frac{\mathrm{FP}}{N}. \qquad (4)$$
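Equations (3) and (4) translate directly into code, for example:

def detection_rate(tp: int, fn: int) -> float:
    """Equation (3): fraction of malicious connections that were detected."""
    return tp / (tp + fn)

def false_positive_rate(fp: int, n_benign: int) -> float:
    """Equation (4): fraction of benign samples flagged as malicious."""
    return fp / n_benign

# For instance, the reported 97.57% detection rate and 1.93% FPR correspond to
# detection_rate(tp, fn) ≈ 0.9757 and false_positive_rate(fp, n_benign) ≈ 0.0193.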

51 Data Analysis Before discussing about the results it ispreferable to analyse the data first to make sure that theresults are valid and the conclusion taken is on point Blattaaims to detect exploit traffic by reading the first few bytes ofthe application layer message erefore it is important toknow how many bytes are there in the application layermessages in our dataset Hence we can be sure that Blattareads a number of bytes fewer than the full length of theapplication layer message

Table 4 shows the average length of application layer messages in our testing set. The benign samples have an average message length of 593.25 bytes, lower than any other set. Therefore, deciding after reading fewer bytes than at least that number implies our proposed method can predict malicious traffic earlier, thus providing an improvement over previous works.

5.2. Exploring Parameters. Blatta includes parameters that must be selected in advance. Intuitively, these parameters affect the model's performance, so we analysed their effect and selected the best model to be compared later to previous works. The parameters we analysed are n, stride, dictionary size, the embedding vector dimension, the recurrent layer type, and the number of hidden layers. When analysing each of these parameters, we used different values for that parameter and set the others to their default values (i.e., n = 5, stride = 1, dictionary size = 2000, embedding dimension = 32, recurrent layer = LSTM, and number of hidden layers = 1). These default values were selected based on a preliminary experiment which had given the best result. Apart from the modifiable parameters, we set the optimiser to stochastic gradient descent with a learning rate of 0.1, as using other optimisers did not necessarily increase or decrease the performance in our preliminary experiment. The number of epochs is fixed to five, as the loss value did not decrease further and adding more training iterations did not give significant improvement.
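To make the roles of these parameters concrete, the following is a minimal PyTorch sketch of a classifier wired with the default values above; it is an illustration rather than the authors' released implementation, and the hidden state size (128 here) is our assumption, since it is not listed among the defaults.

    import torch
    import torch.nn as nn

    class EarlyRNNClassifier(nn.Module):
        def __init__(self, dict_size=2000, embedding_dim=32, hidden_dim=128, num_layers=1):
            super().__init__()
            # Index 0 is reserved for n-grams that are not in the dictionary.
            self.embedding = nn.Embedding(dict_size + 1, embedding_dim)
            self.rnn = nn.LSTM(embedding_dim, hidden_dim,
                               num_layers=num_layers, batch_first=True)
            self.classify = nn.Sequential(nn.Linear(hidden_dim, 2), nn.LogSoftmax(dim=-1))

        def forward(self, encoded_ngrams):
            # encoded_ngrams: (batch, k) integer-encoded n-grams, one message per row
            embedded = self.embedding(encoded_ngrams)      # (batch, k, embedding_dim)
            hidden_states, _ = self.rnn(embedded)          # (batch, k, hidden_dim)
            # Log probabilities for every time step; the last step is used when
            # training, and an intermediate step can be read off for early prediction.
            return self.classify(hidden_states)            # (batch, k, 2)

    model = EarlyRNNClassifier()
    loss_fn = nn.NLLLoss()                                 # negative log likelihood
    optimiser = torch.optim.SGD(model.parameters(), lr=0.1)

During training, the loss for each batch would be taken from the output at the last time step (e.g., loss_fn(model(batch)[:, -1, :], labels)), matching the negative log likelihood loss and the five-epoch SGD setup described above.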

As can be seen in Table 5, we experimented with variable lengths of input to see how much difference it makes when the number of bytes read is reduced. It is noted that some

Figure 4: A detailed view of the classifier. n-grams are extracted from the input application layer message and integer-encoded; they pass through an embedding layer and a recurrent layer with LSTM/GRU units, followed by a softmax unit, to classify whether the connection is malicious or legitimate. A decision can be acquired earlier without waiting for the last n-grams to be processed; the embedding and recurrent layers are used in both the training and detection phases, while the early decision applies to the detection phase only.


messages are shorter than the limit. In this case, all bytes of the messages are indeed taken.

In general, reading the full length of application layer messages mostly gives more than a 99% detection rate with a 2.51% false positive rate. This performance holds, with minor variation, when the length of input messages is reduced down to 500 bytes. When the length is limited to 400 bytes, the false positive rate spikes up for some configurations. Our hypothesis is that this is because benign samples have a relatively short length. Therefore, we will pay more attention to the results of reading 500 bytes or fewer and analyse each parameter individually.

n is the number of bytes (the n-gram size) taken in each time step. As shown in Table 5, we experimented with 1-, 3-, 5-, 7-, and 9-grams. For brevity, we omitted n = 2, 4, 6, 8 because the difference in results is not significant. As predicted earlier, 1-grams were the least effective, and the detection rates were around 50%. As for the high-order n-grams, the detection rates are not much different, but the false positive rates are. 5-grams and 7-grams provide better false positive rates (2.51%) even when Blatta reads only the first 400 bytes; 7-grams give a lower false positive rate (8.92%) when reading the first 300 bytes, yet it is still too high for real-life situations. Having a higher n means more information is considered in a time step; this may lead not only to better performance but also to overfitting.

As the default value of n is five, we experimented with strides of one to five. Thus, it can be observed how the model reacts depending on how much the n-grams overlap. It is apparent that nonoverlapping n-grams provide lower false positives with around a 1–3% decrease in detection rates. Stride values of two and three perform the worst, and they missed quite a few malicious connections.

The dictionary size plays an important role in this experiment. Having too many n-grams leads to overfitting, as the model would have to memorise too many n-grams that may barely occur. We found that a dictionary size of 2000 has the highest detection rate and lowest false positive rate. Reducing the size to 1000 made the detection rate drop by about 50%, even when the model read the full-length messages.

Without an embedding layer, Blatta would have used a one-hot vector as big as the dictionary size as the input in each time step. Therefore, the embedding dimension has a similar effect to the dictionary size. Having it too big leads to overfitting, and having it too small could mean too much information is lost due to the compression. Our experiments with dimensions of 16, 32, or 64 give similar detection rates, differing by less than 2%. An embedding dimension of 64 has the least false positives when reading 300 bytes.

Changing the recurrent layer type does not seem to make much difference; LSTM has a minor improvement over GRU. We argue that preferring one over the other would not bring a big improvement other than in training speed. Adding more hidden layers does not improve the performance. On the other hand, it has a negative impact on the detection speed, as shown in Table 6.

After analysing this set of experiments, we ran another experiment with a configuration based on the best performing parameters explained previously. The configuration is n = 5, stride = 5, dictionary size = 2000, embedding dimension = 64, and an LSTM layer. The model then has a detection rate of 97.57% with 1.93% false positives by reading only the first 400 bytes. This result shows that Blatta maintains high accuracy while only reading 35.21% of the length of the application layer messages in the dataset. This optimal set of parameters is then used in further analysis.
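For illustration, the sketch below shows how an application layer message would be encoded under this optimal configuration: 5-grams taken with a stride of 5, looked up in a dictionary of the 2000 most common n-grams from the training set, with index 0 reserved for unknown n-grams and the input truncated to the first 400 bytes. The helper names are ours, not Blatta's public API.

    from collections import Counter

    def extract_ngrams(message: bytes, n: int = 5, stride: int = 5):
        # Slide a window of n bytes over the message, advancing `stride` bytes per step.
        return [message[i:i + n] for i in range(0, len(message) - n + 1, stride)]

    def build_dictionary(training_messages, n=5, stride=5, size=2000):
        # Keep the `size` most common n-grams observed in the training set.
        counts = Counter()
        for msg in training_messages:
            counts.update(extract_ngrams(msg, n, stride))
        return {gram: idx + 1 for idx, (gram, _) in enumerate(counts.most_common(size))}

    def encode(message, dictionary, n=5, stride=5, max_bytes=400):
        # Read only the first `max_bytes` bytes; unknown n-grams map to index 0.
        return [dictionary.get(gram, 0) for gram in extract_ngrams(message[:max_bytes], n, stride)]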

5.3. Detection Speed. In the IDS area, detection speed is another metric worth looking into, apart from accuracy-related metrics. Time is of the essence in detection, and the earlier we detect malicious traffic, the faster we can react. However, detection speed is affected by many factors, such as the hardware or other applications/services running at the same time as the experiment. Therefore, in this section, we analyse the difference in execution time between reading the full and partial payload.

We first calculated the execution time of each record in the testing set and then divided the number of bytes processed by the execution time to obtain the detection speed in kilobytes/second (KBps). Eventually, the detection speed over all records was averaged. The result is shown in Table 6.
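A calculation of the following form reproduces that procedure; this is a sketch with our own timing harness and record format, not the code used to produce Table 6.

    import statistics
    import time

    def average_detection_speed(messages, predict, byte_limit=None):
        # messages: application layer messages (bytes) of the testing set records
        speeds = []
        for message in messages:
            data = message if byte_limit is None else message[:byte_limit]
            start = time.perf_counter()
            predict(data)                                   # classify this record
            elapsed = time.perf_counter() - start
            speeds.append(len(data) / 1024.0 / elapsed)     # kilobytes per second
        mean = statistics.mean(speeds)
        # 95% confidence interval of the mean, assuming approximately normal sample means
        ci95 = 1.96 * statistics.stdev(speeds) / (len(speeds) ** 0.5)
        return mean, ci95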

As shown in Table 6, reducing the processed bytes to 700, about half the size of an IP packet, increased the detection speed by approximately two times (from an average of 83.66 KBps to 164.86 KBps). Evidently, the trend keeps rising as the number of bytes is reduced. If we take the byte limit from the previous section, which is 400 bytes, Blatta provides about a threefold increase in detection speed, or 221.7 KBps on average. We are aware that this number seems small compared to the transmission speed of a network link, which can reach 1 Gbps (128 MBps). However, we argue that there are other factors which limit our proposed method from performing faster, such as the hardware used in the experiment and the programming language used to implement the approach. If the approach runs in a better environment, the detection speed will increase as well.

Table 3: Numbers of benign and malicious samples used in the experiments.

Set            Obtained from   Num. of samples   Class
Training set   BlattaSploit    3406              Malicious
               UNSW-JAN        3406              Benign
Testing set    BlattaSploit    2276              Malicious
               UNSW-FEB        10855             Benign
               UNSW-FEB        10855             Malicious

Table 4: Average message length of application layer messages in the testing set.

Set                    Average message length (bytes)
BlattaSploit           2318.93
UNSW benign samples    593.25
UNSW exploit samples   1437.36
UNSW worm samples      848


5.4. Comparison with Previous Works. We compare Blatta's results with related previous works. PAYL [7], OCPAD [12], Decanter [15], and the autoencoder model [23] were chosen due to their code availability, and all of them can be tested against the UNSW-NB15 dataset. PAYL and OCPAD read an IP packet at a time, while Decanter and [23] reconstruct TCP segments and process the whole application layer message. None of them provides early detection, but to show that Blatta also offers improvements in detecting malicious traffic, we compare the detection and false positive rates of those works with Blatta.

We evaluated all methods with the exploit and worm data in UNSW-NB15, as those match our threat model and the dataset is already publicly available; thus, the results are comparable. The results are shown in Table 7.

In general, Blatta has the highest detection rate, albeit at the cost of increased false positives. Although the false positives might make this approach unacceptable in real life, Blatta is still a significant step towards early prediction, a problem that has not been explored by similar previous works. This early prediction approach enables system administrators to react faster, thus reducing the harm to protected systems.

In the previous experiments, as shown in Section 5.2, Blatta needs to read only 400 bytes (on average, 35.21% of the application layer message size) to achieve a 97.57% detection

Table 5: Experiment results of using various parameter combinations and various lengths of input to the model.

Parameter              Detection rate (%)                                     False positive rate (%)
                       All    700    600    500    400    300    200         All    700    600    500    400    300    200
n
  1                  47.22  48.69  49.86  50.65  51.77  54.99  65.32        1.18   1.19   1.21   1.21  71.31   7.87  89.43
  3                  99.87  99.51  99.77  99.1   99.59  98.93  91.07        2.51   2.51   2.51   2.51  72.61  10.29  20.51
  5                  99.87  99.55  99.78  99.57  99.29  98.91  88.75        2.51   2.51   2.51   2.51   2.51   7.26  11.08
  7                  99.86  99.47  99.59  99.37  99.19  98.53  97.08        2.51   2.51   2.51   2.51   2.51   8.92  80.92
  9                  99.81  99.59  99.62  99.57  99.23  98.16  88.93        2.51   2.51   2.51   2.51   7.26  74.16   9.06
Stride
  1                  99.87  99.55  99.78  99.57  99.29  98.91  88.75        2.51   2.51   2.51   2.51   2.51   7.26  11.08
  2                  73.39  74.11  74.01  74.45  74.69  74.62  77.82        1.81   1.81   1.81   1.81  71.92  72.46  19.86
  3                  82.51  82.54  83.07  83.12  83.25  83.5   85.75        1.5    1.49   1.5    1.51  71.62  75.47  89.63
  4                  99.6   99.19  99.26  99.28  98.61  98.55  98.37        1.93   1.93   1.93   1.93   1.93  74.09  10.5
  5                  99.73  98.95  98.88  98.65  98     95.77  88.29        1.93   1.92   1.93   1.93   1.93  54.16  90.02
Dictionary size
  1000               47.78  49.5   50.36  50.79  51.8   54.83  54.68        1.21   1.21   1.22   1.22  71.33  79.47  89.42
  2000               99.87  99.55  99.78  99.57  99.29  98.91  88.75        2.51   2.51   2.51   2.51   2.51   7.26  11.08
  5000               99.87  99.37  99.75  99.79  99.62  99.69  99.66        2.51   2.51   2.51   2.51  72.61  10.03  90.61
  10000              99.86  99.44  99.74  99.55  99.44  98.55  98.33        2.51   2.51   2.51   2.51  72.61  79.06  90.15
  20000              99.84  99.81  99.69  99.24  99.21  99.43  98.91        2.51   2.51   2.51   2.51  72.61  80.46  89.64
Embedding dimension
  16                 99.89  99.65  99.7   99.67  99.22  99.09  98.81        2.51   2.51   2.51   2.51   2.51  76.77  80.94
  32                 99.87  99.55  99.78  99.57  99.29  98.91  88.75        2.51   2.51   2.51   2.51   2.51   7.26  11.08
  64                 99.87  99.2   99.41  99.09  98.61  96.76  85.52        2.51   2.51   2.51   2.51   2.51   4.51  89.85
  128                99.84  99.33  99.6   99.35  98.99  97.69  86.78        2.51   2.51   2.51   2.51   7.26   4.27  10.88
  256                99.88  99.76  99.8   99.22  99.38  98.64  90.34        2.51   2.51   2.51   2.51   7.26  80.79   9.06
Recurrent layer
  LSTM               99.87  99.55  99.78  99.57  99.29  98.91  88.75        2.51   2.51   2.51   2.51   2.51   7.26  11.08
  GRU                99.88  99.35  99.48  99.35  99.06  97.94  86.22        2.51   2.51   2.51   2.51   2.51  78.95   8.48
No. of layers
  1                  99.87  99.55  99.78  99.57  99.29  98.91  88.75        2.51   2.51   2.51   2.51   2.51   7.26  11.08
  2                  99.86  99.46  99.46  99.38  99.2   99.72  88.65        2.51   2.51   2.51   2.51  72.59  78.78  20.29
  3                  99.84  99.38  99.68  99.1   99.18  98.16  87.35        2.51   2.51   2.51   2.51   2.51  74.94  10.83

Bold values show the parameter value for each set of experiments which gives the highest detection rate and lowest false positive rate.

Table 6: The effect of reducing the number of bytes on the detection speed.

No. of bytes      1 LSTM layer           2 LSTM layers          3 LSTM layers
All               83.66 ± 0.238327       55.14 ± 0.004801       36.98 ± 0.011428
700               164.86 ± 0.022857      107.04 ± 0.022001      73.5 ± 0.044694
600               181.6 ± 0.020556       119.7 ± 0.024792       82.1 ± 0.049584
500               204.32 ± 0.02352       136.5 ± 0.036668       93.76 ± 0.061855
400               221.7 ± 0.032205       149.4 ± 0.037701       103.02 ± 0.065417
300               240.76 ± 0.022857      163.68 ± 0.036352      113.18 ± 0.083477
200               262.72 ± 0.030616      181.38 ± 0.020927      126.88 ± 0.063024

The values are the average (mean) detection speed in KBps with a 95% confidence interval, calculated from multiple experiments. The detection speed increased significantly (about three times faster than reading the whole message), allowing early prediction of malicious traffic.


rate. It reads fewer bytes than PAYL, OCPAD, Decanter, and [23] while keeping the accuracy high.

5.5. Visualisation. To investigate how Blatta performs the detection, we took samples of both benign and malicious traffic and observed the input and output. We were particularly interested in the relation of n-grams that are not stored in the dictionary (unknown n-grams) to the decision. Those n-grams either did not exist in the training set or were not common enough to be included in the dictionary.

In Figure 5, we visualise three samples of traffic taken from different sets, the BlattaSploit and UNSW-NB15 datasets. The first part of each sample shows n-grams that did not exist in the dictionary; the yellow highlighted parts show those n-grams. The second part shows the output of the recurrent layer for each time step. The darker the red highlight, the closer the probability of the traffic being malicious is to one in that time step.

As shown in Figure 5 (detection samples), malicious samples tend to have more unknown n-grams. It is evident that the existence of these unknown n-grams increases the probability of the traffic being malicious. As an example, the first few bytes of the samples have around a 0.5 probability of being malicious, and the probability then goes up closer to one when an unknown n-gram is detected.

Similar behaviour also exists in the benign sample. The probability is quite low because there are many known n-grams. Despite the existence of unknown n-grams in the benign sample, the end result shows that the traffic is benign. Furthermore, most of the time, the probability of the traffic being malicious stays below 0.5.
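The per-time-step outputs visualised here are the same quantities the detection stage thresholds (at 0.5) to reach its verdict. Assuming a classifier that returns per-time-step log probabilities, such as the sketch shown earlier, the early decision could look like the following illustration (not the released code):

    import torch

    def early_predict(model, encoded_ngrams, threshold=0.5):
        # encoded_ngrams: integer-encoded n-grams of the first bytes (e.g., 400) of one message
        inputs = torch.tensor(encoded_ngrams, dtype=torch.long).unsqueeze(0)
        with torch.no_grad():
            log_probs = model(inputs)                 # (1, time_steps, 2)
            p_malicious = log_probs.exp()[0, :, 1]    # probability of "malicious" per time step
        # The decision comes from the intermediate output at the last processed time
        # step, without waiting for the remainder of the message to arrive.
        verdict = "malicious" if p_malicious[-1].item() > threshold else "benign"
        return verdict, p_malicious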

6. Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet to tackle exploit attacks. There are evasion techniques which could be employed by adversaries to evade detection. These techniques open possibilities for future work. Therefore, this section discusses such evasion techniques and why they have not been covered by our current method.

Since our proposed method works by analysing application layer messages, it is safe to disregard evasion techniques at the transport or network layer level, e.g., IP fragmentation, TCP delayed sending, and TCP segment fragmentation. They should be handled by the underlying tool that reconstructs the TCP session. Furthermore, those evasion techniques can also be avoided by Snort's preprocessor.

Two possible evasion techniques are compression and/or encryption. Both compression and encryption change the bytes' values from the original and make the malicious code harder to detect by Blatta or any previous work on payload-based detection [7, 12, 23]. Metasploit has a collection of evasion techniques, which include compression. The compression evasion technique only works on HTTP and utilises gzip, and it only compresses HTTP responses, not HTTP requests. All HTTP-based exploits in UNSW-NB15 and BlattaSploit have their exploit code in the request, thus no data are available to analyse the performance if the adversary uses compression. However, gzip-compressed data could still be detected because they always start with the magic number 1f 8b, and the decompression can be done in a streaming manner, which Blatta can accommodate. There is also no need to decompress the whole data, since Blatta works well with partial input.
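For instance, the gzip case could be handled incrementally with Python's standard zlib module, recognising the 1f 8b magic number and decompressing chunk by chunk so that only the first decompressed bytes need to be passed to the detector. This is a sketch of the idea, not part of Blatta's implementation:

    import zlib

    GZIP_MAGIC = b"\x1f\x8b"

    def stream_bytes(chunks):
        """Yield plaintext bytes from a stream of chunks, decompressing on the fly
        if the stream starts with the gzip magic number."""
        decompressor = None
        is_gzip = None
        for chunk in chunks:
            if is_gzip is None:                            # the first chunk decides
                is_gzip = chunk.startswith(GZIP_MAGIC)
                if is_gzip:
                    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
                    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
            yield decompressor.decompress(chunk) if is_gzip else chunk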

Encryption is possibly the biggest obstacle for payload-based NIDS; none of the previous works in our literature review (see Table 1) have addressed this challenge. There are other studies which deal with payload analysis in encrypted traffic [46, 47]. However, these studies focus on classifying which application generates the network traffic instead of detecting exploit attacks; thus, they are not directly relevant to our research.

On its own, Blatta is not able to detect exploits hiding in encrypted traffic. However, Blatta's model can be exported and incorporated with application layer firewalls, such as Shadow Daemon [48]. Shadow Daemon is commonly installed on a web server and intercepts HTTP requests before they are processed by the web server software. It detects attacks based on its signature database. Since it is extensible and reads the same data as Blatta (i.e., application layer messages), it is possible to use Blatta's RNN-based model to extend the capability of Shadow Daemon beyond rule-based detection. More importantly, this approach would enable Blatta to deal with encrypted traffic, making it applicable in real-life situations.

Attackers could place the exploit code closer to the end of the application layer message, hoping that the attack would not be detected because Blatta reads merely the first few bytes. However, exploits using this kind of evasion technique would still be detected, since the technique needs padding to push the exploit code to the end of the message. The padding itself will be detected as a sign of malicious attempts, as it is most likely to be a byte sequence which rarely exists in benign traffic.

Table 7: Comparison to previous works using the UNSW-NB15 dataset as the testing set.

Method         Detection rate (%)                            FPR (%)
               Exploits in UNSW-NB15   Worms in UNSW-NB15
Blatta         99.04                   100                   1.93
PAYL           87.12                   26.49                 0.05
OCPAD          10.53                   4.11                  0
Decanter       67.93                   90.14                 0.03
Autoencoder    47.51                   81.12                 0.99


7. Conclusion and Future Work

This paper presents Blatta, an early prediction system for exploit attacks which can detect exploit traffic by reading only 400 bytes from the application layer message. First, Blatta builds a dictionary containing the most common n-grams in the training set. Then, it is trained over benign and malicious sequences of n-grams to classify exploit traffic. Lastly, it continuously reads small chunks of an application layer message and predicts whether the message is malicious.

Decreasing the number of bytes taken from application layer messages has only a minor effect on Blatta's detection rate. Therefore, unlike previous related works, it does not need to read the whole application layer message to detect exploit traffic, creating a step change in the ability of system administrators to detect attacks early and to block them before the exploit damages the system. Extensive evaluation on the new exploit traffic dataset has clearly demonstrated the effectiveness of Blatta.

For future study, we would like to train a model that can recognise malicious behaviour based on messages exchanged between clients and a server, since in this paper we only consider messages from clients to a server and not the other way around. Detecting attacks in encrypted traffic while retaining the early prediction capability could be another future research direction. It also remains a question whether the approaches in mobile traffic classification [46, 47] would be applicable to the field of exploit detection.

Data Availability

The compressed file containing the dataset is available at https://bit.ly/blattasploit.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 5: Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step. It shows how the proposed system observes and analyses the traffic. Yellow blocks show unknown n-grams. Red blocks show the probability of the traffic being malicious when reading an n-gram at that point.


Acknowledgments

This work was supported by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan/LPDP) under grant number PRJ-968/LPDP.3/2016 of the LPDP.

References

[1] McAfee, "McAfee labs threat report: August 2019," Report, McAfee, Santa Clara, CA, USA, 2019.
[2] E-DB, "Offensive security's exploit database archive," EDB, Bedford, MA, USA, 2010.
[3] T.-H. Cheng, Y.-D. Lin, Y.-C. Lai, and P.-C. Lin, "Evasion techniques: sneaking through your intrusion detection/prevention systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1011–1020, 2012.
[4] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of LISA '99: 13th Systems Administration Conference, vol. 99, pp. 229–238, Seattle, Washington, USA, November 1999.
[5] R. Sommer and V. Paxson, "Outside the closed world: on using machine learning for network intrusion detection," in Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 305–316, Berkeley/Oakland, CA, USA, May 2010.
[6] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353–375, 2011.
[7] K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," Lecture Notes in Computer Science: Recent Advances in Intrusion Detection, in Proceedings of the 7th International Symposium, RAID 2004, pp. 203–222, Gothenburg, Sweden, September 2004.
[8] A. Jamdagni, Z. Tan, X. He, P. Nanda, and R. P. Liu, "RePIDS: a multi tier real-time payload-based intrusion detection system," Computer Networks, vol. 57, no. 3, pp. 811–824, 2013.
[9] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, "McPAD: a multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864–881, 2009.
[10] D. Ariu, R. Tronci, and G. Giacinto, "HMMPayl: an intrusion detection system based on hidden Markov models," Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.
[11] A. Oza, K. Ross, R. M. Low, and M. Stamp, "HTTP attack detection using n-gram analysis," Computers & Security, vol. 45, pp. 242–254, 2014.
[12] M. Swarnkar and N. Hubballi, "OCPAD: one class Naive Bayes classifier for payload based anomaly detection," Expert Systems with Applications, vol. 64, pp. 330–339, 2016.
[13] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in Proceedings of the 25th USENIX Security Symposium, pp. 807–822, Austin, TX, USA, August 2016.
[14] H. Zhang, D. Yao, N. Ramakrishnan, and Z. Zhang, "Causality reasoning about network events for detecting stealthy malware activities," Computers & Security, vol. 58, pp. 180–198, 2016.
[15] R. Bortolameotti, T. van Ede, M. Caselli et al., "DECANTeR: DEteCtion of anomalous outbound HTTP traffic by passive application fingerprinting," in Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 373–386, Orlando, FL, USA, December 2017.
[16] D. Golait and N. Hubballi, "Detecting anomalous behavior in VoIP systems: a discrete event system modeling," IEEE Transactions on Information Forensics and Security, vol. 12, no. 3, pp. 730–745, 2016.
[17] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, vol. 16, no. 5, pp. 475–490, 2017.
[18] E. Min, J. Long, Q. Liu, J. Cui, and W. Chen, "TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest," Security and Communication Networks, vol. 2018, Article ID 4943509, 9 pages, 2018.
[19] X. Jin, B. Cui, D. Li, Z. Cheng, and C. Yin, "An improved payload-based anomaly detector for web applications," Journal of Network and Computer Applications, vol. 106, pp. 111–116, 2018.
[20] Y. Hao, Y. Sheng, and J. Wang, "Variant gated recurrent units with encoders to preprocess packets for payload-aware intrusion detection," IEEE Access, vol. 7, pp. 49985–49998, 2019.
[21] P. Schneider and K. Bottinger, "High-performance unsupervised anomaly detection for cyber-physical system networks," in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy (CPS-SPC '18), pp. 1–12, Barcelona, Spain, January 2018.
[22] T. Hamed, R. Dara, and S. C. Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol. 73, pp. 137–155, 2018.
[23] B. A. Pratomo, P. Burnap, and G. Theodorakopoulos, "Unsupervised approach for detecting low rate attacks on network traffic with autoencoder," in Proceedings of the 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8, Glasgow, UK, June 2018.
[24] Z.-Q. Qin, X.-K. Ma, and Y.-J. Wang, "Attentional payload anomaly detector for web applications," in Proceedings of the International Conference on Neural Information Processing, pp. 588–599, Siem Reap, Cambodia, December 2018.
[25] H. Liu, B. Lang, M. Liu, and H. Yan, "CNN and RNN based payload classification methods for attack detection," Knowledge-Based Systems, vol. 163, pp. 332–341, 2019.
[26] Z. Zhao and G.-J. Ahn, "Using instruction sequence abstraction for shellcode detection and attribution," in Proceedings of the Conference on Communications and Network Security (CNS), pp. 323–331, National Harbor, MD, USA, October 2013.
[27] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, "Detecting unknown malicious code by applying classification techniques on opcode patterns," Security Informatics, vol. 1, no. 1, p. 1, 2012.
[28] X. Wang, C.-C. Pan, P. Liu, and S. Zhu, "SigFree: a signature-free buffer overflow attack blocker," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 65–79, 2010.
[29] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," Lecture Notes in Computer Science, in Proceedings of the International Workshop on Recent Advances in Intrusion Detection, pp. 226–248, Hamburg, Germany, September 2006.
[30] Rapid7, "Metasploit: penetration testing framework," 2003, https://www.metasploit.com.
[31] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the Military Communications and Information Systems Conference (MilCIS), pp. 1–6, Canberra, ACT, Australia, November 2015.
[32] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.
[33] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using Snort," Tech. Report CSE-2007-1, vol. 1, no. 2007, p. 22, UC Davis Department of Computer Science, Davis, CA, USA, 2007.
[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, p. 2, Berkeley, CA, USA, October 2005.
[35] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[36] Bro, 2008, http://www.bro-ids.org.
[37] Argus, 2008, https://qosient.com/argus.
[38] Rapid7, "Metasploitable," 2012, https://information.rapid7.com/download-metasploitable-2017.html.
[39] S. L. Garfinkel and M. Shick, "Passive TCP reconstruction and forensic analysis with tcpflow," Technical Reports, Naval Postgraduate School, Monterey, CA, USA, 2013.
[40] K. Rieck and P. Laskov, "Language models for detection of unknown attacks in network traffic," Journal in Computer Virology, vol. 2, no. 4, pp. 243–256, 2007.
[41] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, Doha, Qatar, October 2014.
[42] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, https://arxiv.org/abs/1412.3555.
[45] PyTorch, "Negative log likelihood," 2016.
[46] M. Lotfollahi, M. J. Siavoshani, R. S. H. Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[47] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescape, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.
[48] H. Buchwald, "Shadow Daemon: a collection of tools to detect, record, and block attacks on web applications," Shadow Daemon Introduction, 2015, https://shadowd.zecure.org/overview/introduction.



5. Experiments and Results

In this section, we evaluate Blatta and present evidence of its effectiveness in predicting exploits in network traffic. Blatta is implemented with Python 3.5.2 and the PyTorch 0.2.0 library. All experiments were run on a PC with a 3.40 GHz Core i7, 16 GB of RAM, an NVIDIA GeForce GT 730, NVIDIA CUDA 9.0, and cuDNN 7.

The best practice for evaluating a machine learning approach is to have separate training and testing sets. As the name implies, the training set is used to train the model, and the

Figure 2: Architecture overview of the proposed method. Application layer messages (i.e., HTTP requests, SMTP commands) are extracted from captured traffic using tcpflow [39]. n-grams are obtained from those messages. They are then used to build a dictionary of the most common n-grams and to train the RNN-based model (i.e., LSTM and GRU). The trained model outputs a prediction of whether the traffic is malicious or legitimate.

Figure 3: An example of n-grams of bytes taken with various stride values (3-grams of the string "HELLO_WORLD" with stride values of 1, 2, and 3).


testing set is for evaluating the model's performance. We split the BlattaSploit dataset in a 60 : 40 ratio into training and testing sets of malicious samples. The division was carefully made to include diverse types of exploit payloads. As samples of benign traffic to train the model, we obtained the same number of HTTP and FTP connections as the malicious samples from UNSW-JAN. Having a balanced set of both classes is important in supervised learning.

We measure our model's performance by using samples of malicious and benign traffic. Malicious samples are obtained from 40% of the BlattaSploit dataset and the exploit and worm samples in the UNSW-FEB set (10,855 samples). As for the benign samples, we took the same number (10,855) of benign HTTP and FTP connections in UNSW-FEB. We used the UNSW dataset to compare our proposed approach's performance with previous works.

In summary, the details of the training and testing sets used in the experiments are shown in Table 3.

We evaluated the classifier model by counting the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). They are then used to calculate the detection rate (DR) and false positive rate (FPR), which are the metrics used to measure our proposed system's performance.

DR measures the ability of the proposed system to correctly detect malicious traffic. The value of DR should be as close as possible to 100%. It shows how well our system is able to detect exploit traffic. The formula to calculate DR is shown in (3). We denote TP as the number of detected malicious connections and FN as the number of undetected malicious connections in the testing set.

\[ \mathrm{DR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \tag{3} \]

FPR is the percentage of benign samples that are classified as malicious. We would like this metric to be as low as possible. A high FPR means many false alarms are raised, rendering the system less trustworthy and useless. We calculate this metric using (4). We denote FP as the number of false positives detected and N as the number of benign samples.

\[ \mathrm{FPR} = \frac{\mathrm{FP}}{N} \tag{4} \]
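Both metrics are straightforward to compute once the confusion-matrix counts are known. The sketch below uses purely hypothetical counts for illustration.

```python
def detection_rate(tp, fn):
    """DR = TP / (TP + FN), expressed as a percentage."""
    return 100.0 * tp / (tp + fn)

def false_positive_rate(fp, n_benign):
    """FPR = FP / N, expressed as a percentage."""
    return 100.0 * fp / n_benign

# Hypothetical example: 9,757 of 10,000 malicious connections detected,
# 193 of 10,000 benign connections flagged.
print(detection_rate(tp=9757, fn=243))               # 97.57
print(false_positive_rate(fp=193, n_benign=10000))   # 1.93
```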

5.1. Data Analysis. Before discussing the results, it is preferable to analyse the data first to make sure that the results are valid and the conclusions drawn are on point. Blatta aims to detect exploit traffic by reading the first few bytes of the application layer message. Therefore, it is important to know how many bytes there are in the application layer messages in our dataset. Hence, we can be sure that Blatta reads a number of bytes fewer than the full length of the application layer message.

Table 4 shows the average length of application layer messages in our testing set. The benign samples have an average message length of 593.25, lower than any other set. Therefore, making a decision after reading fewer bytes than at least that number implies our proposed method can predict malicious traffic earlier, thus providing an improvement over previous works.

5.2. Exploring Parameters. Blatta includes parameters that must be selected in advance. Intuitively, these parameters affect the model's performance, so we analysed their effect on the model's performance and selected the best model to be compared later to previous works. The parameters we analysed are the recurrent layer type, n, stride, dictionary size, and the embedding vector dimension. When analysing each of these parameters, we used different values for that parameter and set the others to their default values (i.e., n = 5, stride = 1, dictionary size = 2000, embedding dimension = 32, recurrent layer = LSTM, and number of hidden layers = 1). These default values were selected based on a preliminary experiment which had given the best result. Apart from the modifiable parameters, we set the optimiser to stochastic gradient descent with a learning rate of 0.1, as using other optimisers did not necessarily increase or decrease the performance in our preliminary experiment. The number of epochs is fixed to five, as the loss value did not decrease further and adding more training iterations did not give significant improvement.
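The parameter exploration described here amounts to a one-at-a-time sweep around the default configuration. The sketch below is illustrative; `train_and_evaluate` is a hypothetical placeholder for training the model with a given configuration and returning its detection and false positive rates.

```python
# Default configuration from the preliminary experiment; each run below varies one
# parameter at a time while the others keep these values.
DEFAULTS = {
    "n": 5,
    "stride": 1,
    "dict_size": 2000,
    "embedded_dim": 32,
    "rnn_type": "lstm",
    "num_layers": 1,
}

SEARCH_SPACE = {
    "n": [1, 3, 5, 7, 9],
    "stride": [1, 2, 3, 4, 5],
    "dict_size": [1000, 2000, 5000, 10000, 20000],
    "embedded_dim": [16, 32, 64, 128, 256],
    "rnn_type": ["lstm", "gru"],
    "num_layers": [1, 2, 3],
}

def explore(train_and_evaluate):
    """`train_and_evaluate(config) -> (dr, fpr)` trains a model and returns its metrics."""
    results = {}
    for name, values in SEARCH_SPACE.items():
        for value in values:
            config = dict(DEFAULTS, **{name: value})
            results[(name, value)] = train_and_evaluate(config)
    return results
```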

As can be seen in Table 5, we experimented with variable lengths of input to see how much difference it makes when the number of bytes read is reduced. It is noted that some

Figure 4: A detailed view of the classifier. n-grams are extracted from the input application layer message and integer encoded, passed through the embedding layer and the recurrent layer with LSTM/GRU units, and then classified by the softmax unit as legitimate or malicious. The decision can be acquired early, without waiting for the last n-grams to be processed.


messages are shorter than the limit. In this case, all bytes of those messages are indeed taken.

In general, reading the full length of application layer messages mostly gives more than a 99% detection rate with a 2.51% false positive rate. This performance remains stable, with minor variation, when the length of input messages is reduced down to 500 bytes. When the length is limited to 400 bytes, the false positive rate spikes up for some configurations. Our hypothesis is that this is because benign samples have a relatively short length. Therefore, we pay more attention to the results of reading 500 bytes or fewer and analyse each parameter individually.

n is the number of bytes (n-grams) taken in each time step. As shown in Table 5, we experimented with 1-, 3-, 5-, 7-, and 9-grams. For brevity, we omitted n = 2, 4, 6, 8 because the differences in results are not significant. As predicted earlier, 1-grams were the least effective, and their detection rates were around 50%. As for the high-order n-grams, the detection rates are not much different, but the false positive rates are. 5-grams and 7-grams provide better false positive rates (2.51%) even when Blatta reads only the first 400 bytes. 7-grams give a lower false positive rate (8.92%) when reading the first 300 bytes, yet it is still too high for real-life situations. Having a higher n means more information is considered in a time step; this may lead not only to better performance but also to overfitting.

As the default value of n is five, we experimented with strides of one to five. Thus, it can be observed how the model reacts depending on how much the n-grams overlap. It is apparent that nonoverlapping n-grams provide lower false positives with around a 1–3% decrease in detection rates. Stride values of two and three perform the worst, and they missed quite a lot of malicious traffic.
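The following sketch shows how n-grams of bytes can be collected with a given n and stride (cf. Figure 3); the example request string is purely illustrative.

```python
def extract_ngrams(message, n=5, stride=1):
    """Collect consecutive n-grams of bytes, moving the window `stride` bytes at a time."""
    return [message[i:i + n] for i in range(0, len(message) - n + 1, stride)]

# Nonoverlapping 5-grams (n = 5, stride = 5) of an illustrative HTTP request line:
extract_ngrams(b"GET /index.html HTTP/1.1", n=5, stride=5)
# [b'GET /', b'index', b'.html', b' HTTP']
```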

The dictionary size plays an important role in this experiment. Having too many n-grams leads to overfitting, as the model would have to memorise too many n-grams that may barely occur. We found that a dictionary size of 2000 gives the highest detection rate and lowest false positive rate. Reducing the size to 1000 made the detection rate drop by about 50%, even when the model read the full-length messages.
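Building the dictionary of the most common n-grams and encoding messages with it can be sketched as follows, reusing `extract_ngrams` from the sketch above; reserving index 0 as the placeholder for unknown n-grams is an illustrative convention, not necessarily the authors' exact choice.

```python
from collections import Counter

def build_dictionary(training_messages, n=5, stride=1, dict_size=2000):
    """Map the `dict_size` most common n-grams in the training set to integer codes.

    Index 0 is reserved as the placeholder for any n-gram not in the dictionary.
    """
    counts = Counter()
    for message in training_messages:
        counts.update(extract_ngrams(message, n=n, stride=stride))
    return {ngram: idx + 1 for idx, (ngram, _) in enumerate(counts.most_common(dict_size))}

def encode(message, dictionary, n=5, stride=1):
    """Integer-encode a message; unknown n-grams fall back to the placeholder index 0."""
    return [dictionary.get(gram, 0) for gram in extract_ngrams(message, n=n, stride=stride)]
```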

Without the embedding layer, Blatta would have used a one-hot vector as big as the dictionary size for the input in a time step. Therefore, the embedding dimension has a similar effect to the dictionary size. Having it too big leads to overfitting, and having it too small could mean too much information is lost due to the compression. Our experiments with dimensions of 16, 32, or 64 give similar detection rates, differing by less than 2%. An embedding dimension of 64 gives the lowest false positive rate when reading 300 bytes.

Changing the recurrent layer type does not seem to make much difference; LSTM has a minor improvement over GRU. We argue that preferring one over the other would not make a big improvement other than in training speed. Adding more hidden layers does not improve the performance. On the other hand, it has a negative impact on the detection speed, as shown in Table 6.

After analysing this set of experiments, we ran another experiment with a configuration based on the best performing parameters explained previously. The configuration is n = 5, stride = 5, dictionary size = 2000, embedding dimension = 64, and an LSTM layer. The model then has a detection rate of 97.57% with 1.93% false positives by reading only the first 400 bytes. This result shows that Blatta maintains high accuracy while reading only 35.21% of the length of the application layer messages in the dataset. This optimal set of parameters is then used in further analysis.

5.3. Detection Speed. In the IDS area, detection speed is another metric worth looking into, apart from accuracy-related metrics. Time is of the essence in detection, and the earlier we detect malicious traffic, the faster we can react. However, detection speed is affected by many factors, such as the hardware or other applications/services running at the same time as the experiment. Therefore, in this section, we analyse the difference in execution time between reading the full and partial payload.

We first calculated the execution time for each record in the testing set, then divided the number of bytes processed by the execution time to obtain the detection speed in kilobytes/second (KBps). Eventually, the detection speed over all records was averaged. The result is shown in Table 6.
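A sketch of this measurement procedure is given below; it reuses the hypothetical `encode` and `early_predict` helpers from the earlier sketches and times each record individually before averaging.

```python
import time

def average_detection_speed(model, dictionary, records, max_bytes=None):
    """Average per-record detection speed in kilobytes per second (KBps).

    `records` are raw application layer messages (bytes); only the first
    `max_bytes` of each record are processed when a limit is given.
    """
    speeds = []
    for message in records:
        data = message if max_bytes is None else message[:max_bytes]
        start = time.perf_counter()
        early_predict(model, encode(data, dictionary))    # timed inference on the (partial) payload
        elapsed = time.perf_counter() - start
        speeds.append(len(data) / 1024 / elapsed)         # kilobytes processed per second
    return sum(speeds) / len(speeds)
```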

As shown in Table 6, reducing the processed bytes to 700, about half the size of an IP packet, increased the detection speed by approximately two times (from an average of 83.66 KBps to 164.86 KBps). Evidently, the trend keeps rising as the number of bytes is reduced. If we take the byte limit from the previous section, which is 400 bytes, Blatta provides about a three-fold increase in detection speed, or 221.7 KBps on average. We are aware that this number seems small compared to the transmission speed of a network link, which can reach 1 Gbps (128 MBps). However, we argue that there are other factors which limit our proposed method from performing faster, such as the hardware used in the experiment and the programming language used to implement the approach. Given a better environment to run in, the detection speed will increase as well.

Table 3: Numbers of benign and malicious samples used in the experiments.

Set            Obtained from   Num. of samples   Class
Training set   BlattaSploit    3406              Malicious
Training set   UNSW-JAN        3406              Benign
Testing set    BlattaSploit    2276              Malicious
Testing set    UNSW-FEB        10855             Benign
Testing set    UNSW-FEB        10855             Malicious

Table 4: Average message length of application layer messages in the testing set.

Set                    Average message length (bytes)
BlattaSploit           2318.93
UNSW benign samples    593.25
UNSW exploit samples   1437.36
UNSW worm samples      848


5.4. Comparison with Previous Works. We compare Blatta's results with other related previous works. PAYL [7], OCPAD [12], Decanter [15], and the autoencoder model [23] were chosen due to their code availability, and all can be tested against the UNSW-NB15 dataset. PAYL and OCPAD read an IP packet at a time, while Decanter and [23] reconstruct TCP segments and process the whole application layer message. None of them provides early detection, but to show that Blatta also offers improvements in detecting malicious traffic, we compare the detection and false positive rates of those works with Blatta.

We evaluated all methods with the exploit and worm data in UNSW-NB15, as those match our threat model and the dataset is already publicly available; thus, the results are comparable. The results are shown in Table 7.

In general, Blatta has the highest detection rate, albeit at the cost of increased false positives. Although the false positives might make this approach unacceptable in real life, Blatta is still a significant step towards early prediction, a problem that has not been explored by similar previous works. This early prediction approach enables system administrators to react faster, thus reducing the harm to protected systems.

In the previous experiments, as shown in Section 5.2, Blatta needs to read 400 bytes (on average, 35.21% of the application layer message size) to achieve a 97.57% detection

Table 5: Experiment results of using various parameter combinations and various lengths of input to the model. Bold values in the original table show, for each set of experiments, the parameter value which gives the highest detection rate and lowest false positive rate.

Detection rate (%) by number of bytes read:

Parameter             Value   All     700     600     500     400     300     200
n                     1       47.22   48.69   49.86   50.65   51.77   54.99   65.32
n                     3       99.87   99.51   99.77   99.10   99.59   98.93   91.07
n                     5       99.87   99.55   99.78   99.57   99.29   98.91   88.75
n                     7       99.86   99.47   99.59   99.37   99.19   98.53   97.08
n                     9       99.81   99.59   99.62   99.57   99.23   98.16   88.93
Stride                1       99.87   99.55   99.78   99.57   99.29   98.91   88.75
Stride                2       73.39   74.11   74.01   74.45   74.69   74.62   77.82
Stride                3       82.51   82.54   83.07   83.12   83.25   83.50   85.75
Stride                4       99.60   99.19   99.26   99.28   98.61   98.55   98.37
Stride                5       99.73   98.95   98.88   98.65   98.00   95.77   88.29
Dictionary size       1000    47.78   49.50   50.36   50.79   51.80   54.83   54.68
Dictionary size       2000    99.87   99.55   99.78   99.57   99.29   98.91   88.75
Dictionary size       5000    99.87   99.37   99.75   99.79   99.62   99.69   99.66
Dictionary size       10000   99.86   99.44   99.74   99.55   99.44   98.55   98.33
Dictionary size       20000   99.84   99.81   99.69   99.24   99.21   99.43   98.91
Embedding dimension   16      99.89   99.65   99.70   99.67   99.22   99.09   98.81
Embedding dimension   32      99.87   99.55   99.78   99.57   99.29   98.91   88.75
Embedding dimension   64      99.87   99.20   99.41   99.09   98.61   96.76   85.52
Embedding dimension   128     99.84   99.33   99.60   99.35   98.99   97.69   86.78
Embedding dimension   256     99.88   99.76   99.80   99.22   99.38   98.64   90.34
Recurrent layer       LSTM    99.87   99.55   99.78   99.57   99.29   98.91   88.75
Recurrent layer       GRU     99.88   99.35   99.48   99.35   99.06   97.94   86.22
No. of layers         1       99.87   99.55   99.78   99.57   99.29   98.91   88.75
No. of layers         2       99.86   99.46   99.46   99.38   99.20   99.72   88.65
No. of layers         3       99.84   99.38   99.68   99.10   99.18   98.16   87.35

False positive rate (%) by number of bytes read:

Parameter             Value   All    700    600    500    400     300     200
n                     1       1.18   1.19   1.21   1.21   71.31   7.87    89.43
n                     3       2.51   2.51   2.51   2.51   72.61   10.29   20.51
n                     5       2.51   2.51   2.51   2.51   2.51    7.26    11.08
n                     7       2.51   2.51   2.51   2.51   2.51    8.92    80.92
n                     9       2.51   2.51   2.51   2.51   7.26    74.16   9.06
Stride                1       2.51   2.51   2.51   2.51   2.51    7.26    11.08
Stride                2       1.81   1.81   1.81   1.81   71.92   72.46   19.86
Stride                3       1.50   1.49   1.50   1.51   71.62   75.47   89.63
Stride                4       1.93   1.93   1.93   1.93   1.93    74.09   1.05
Stride                5       1.93   1.92   1.93   1.93   1.93    54.16   90.02
Dictionary size       1000    1.21   1.21   1.22   1.22   71.33   79.47   89.42
Dictionary size       2000    2.51   2.51   2.51   2.51   2.51    7.26    11.08
Dictionary size       5000    2.51   2.51   2.51   2.51   72.61   10.03   90.61
Dictionary size       10000   2.51   2.51   2.51   2.51   72.61   79.06   90.15
Dictionary size       20000   2.51   2.51   2.51   2.51   72.61   80.46   89.64
Embedding dimension   16      2.51   2.51   2.51   2.51   2.51    76.77   80.94
Embedding dimension   32      2.51   2.51   2.51   2.51   2.51    7.26    11.08
Embedding dimension   64      2.51   2.51   2.51   2.51   2.51    4.51    89.85
Embedding dimension   128     2.51   2.51   2.51   2.51   7.26    4.27    10.88
Embedding dimension   256     2.51   2.51   2.51   2.51   7.26    80.79   9.06
Recurrent layer       LSTM    2.51   2.51   2.51   2.51   2.51    7.26    11.08
Recurrent layer       GRU     2.51   2.51   2.51   2.51   2.51    78.95   8.48
No. of layers         1       2.51   2.51   2.51   2.51   2.51    7.26    11.08
No. of layers         2       2.51   2.51   2.51   2.51   72.59   78.78   20.29
No. of layers         3       2.51   2.51   2.51   2.51   2.51    74.94   10.83

Table 6: The effect of reducing the number of bytes on the detection speed.

No. of bytes   1 LSTM layer        2 LSTM layers       3 LSTM layers
All            83.66 ± 0.238327    55.14 ± 0.004801    36.98 ± 0.011428
700            164.86 ± 0.022857   107.04 ± 0.022001   73.50 ± 0.044694
600            181.60 ± 0.020556   119.70 ± 0.024792   82.10 ± 0.049584
500            204.32 ± 0.023520   136.50 ± 0.036668   93.76 ± 0.061855
400            221.70 ± 0.032205   149.40 ± 0.037701   103.02 ± 0.065417
300            240.76 ± 0.022857   163.68 ± 0.036352   113.18 ± 0.083477
200            262.72 ± 0.030616   181.38 ± 0.020927   126.88 ± 0.063024

Values are the average (mean) detection speed in KBps with a 95% confidence interval, calculated from multiple experiments. The detection speed increased significantly (about three times faster than reading the whole message), allowing early prediction of malicious traffic.


rate. It reads fewer bytes than PAYL, OCPAD, Decanter, and [23] while keeping the accuracy high.

5.5. Visualisation. To investigate how Blatta performs the detection, we took samples of both benign and malicious traffic and observed the input and output. We were particularly interested in the relation of n-grams that are not stored in the dictionary (unknown n-grams) to the decision. Those n-grams either did not exist in the training set or were not common enough to be included in the dictionary.

In Figure 5, we visualise three samples of traffic taken from different sets, the BlattaSploit and UNSW-NB15 datasets. The first part of each sample shows n-grams that did not exist in the dictionary; the yellow highlighted parts show those n-grams. The second part shows the output of the recurrent layer for each time step. The darker the red highlight, the closer the probability of the traffic being malicious is to one in that time step.

As shown in Figure 5 (detection samples), malicious samples tend to have more unknown n-grams. It is evident that the existence of these unknown n-grams increases the probability of the traffic being malicious. As an example, the first five bytes of the samples have around a 0.5 probability of being malicious, and the probability then rises closer to one when an unknown n-gram is detected.

Similar behaviour also exists in the benign sample. The probability is quite low because there are many known n-grams. Despite the existence of unknown n-grams in the benign sample, the end result shows that the traffic is benign. Furthermore, most of the time, the probability of the traffic being malicious stays below 0.5.
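A sketch of how such a per-time-step trace can be produced is shown below; it assumes the `ExploitClassifier` layout and the dictionary convention (placeholder index 0 for unknown n-grams) from the earlier sketches.

```python
import torch

@torch.no_grad()
def per_step_trace(model, encoded_ngrams):
    """Record, for every time step, whether the n-gram is unknown (placeholder code 0)
    and the intermediate probability of the traffic being malicious."""
    state, trace = None, []
    for code in encoded_ngrams:
        x = model.embedding(torch.tensor([[code]]))
        output, state = model.rnn(x, state)
        log_probs = torch.log_softmax(model.linear(output[:, -1, :]), dim=1)
        trace.append((code == 0, log_probs[0, 1].exp().item()))
    return trace
```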

6. Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet to tackle exploit attacks. There are evasion techniques which could be employed by adversaries to evade detection. These techniques open possibilities for future work. Therefore, this section discusses such evasion techniques and why they are not covered by our current method.

Since our proposed method works by analysing application layer messages, it is safe to disregard evasion techniques at the transport or network layer level, e.g., IP fragmentation, TCP delayed sending, and TCP segment fragmentation. They should be handled by the underlying tool that reconstructs the TCP session. Furthermore, those evasion techniques can also be avoided by a Snort preprocessor.

Two possible evasion techniques are compression and/or encryption. Both compression and encryption change the bytes' values from the original and make the malicious code harder to detect by Blatta or by any previous work [7, 12, 23] on payload-based detection. Metasploit has a collection of evasion techniques which include compression. The compression evasion technique only works on HTTP and utilises gzip, and it only compresses HTTP responses, not HTTP requests. As all HTTP-based exploits in UNSW-NB15 and BlattaSploit have their exploit code in the request, no data are available to analyse the performance if the adversary uses compression. However, gzip-compressed data could still be detected because they always start with the magic number 1f 8b, and the decompression can be done in a streaming manner, which Blatta can do. There is also no need to decompress the whole data, since Blatta works well with partial input.
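For instance, the gzip magic number can be checked and the payload decompressed incrementally with Python's standard zlib module, so partial input can still be fed to the model; this is a minimal sketch, not part of Blatta's implementation.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"

def looks_gzipped(payload):
    """gzip streams always start with the magic number 1f 8b."""
    return payload[:2] == GZIP_MAGIC

def stream_decompress(chunks):
    """Decompress gzip data incrementally, so partial input can be analysed as it arrives."""
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)   # 16 + MAX_WBITS selects gzip framing
    for chunk in chunks:
        yield decompressor.decompress(chunk)
```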

Encryption is possibly the biggest obstacle for payload-based NIDS; none of the previous works in our literature review (see Table 1) have addressed this challenge. There are other studies which deal with payload analysis in encrypted traffic [46, 47]. However, these studies focus on classifying which application generates the network traffic instead of detecting exploit attacks; thus, they are not directly relevant to our research.

On its own, Blatta is not able to detect exploits hiding in encrypted traffic. However, Blatta's model can be exported and incorporated with application layer firewalls, such as ShadowDaemon [48]. ShadowDaemon is commonly installed on a web server and intercepts HTTP requests before they are processed by the web server software. It detects attacks based on its signature database. Since it is extensible and reads the same data as Blatta (i.e., application layer messages), it is possible to use Blatta's RNN-based model to extend the capability of ShadowDaemon beyond rule-based detection. More importantly, this approach would enable Blatta to deal with encrypted traffic, making it applicable in real-life situations.

Attackers could place the exploit code closer to the end of the application layer message, hoping that the attack would not be detected because Blatta reads merely the first few bytes. However, exploits with this kind of evasion technique would still be detected, since the technique needs padding to push the exploit code to the end of the message. The padding itself will be detected as a sign of a malicious attempt, as it is most likely to be a byte sequence which rarely exists in benign traffic.

Table 7: Comparison to previous works using the UNSW-NB15 dataset as the testing set.

Method        DR, exploits in UNSW-NB15 (%)   DR, worms in UNSW-NB15 (%)   FPR (%)
Blatta        99.04                           100                          1.93
PAYL          87.12                           26.49                        0.05
OCPAD         10.53                           4.11                         0
Decanter      67.93                           90.14                        0.03
Autoencoder   47.51                           81.12                        0.99


7. Conclusion and Future Work

This paper presents Blatta, an early prediction system for exploit attacks which can detect exploit traffic by reading only 400 bytes from the application layer message. First, Blatta builds a dictionary containing the most common n-grams in the training set. Then, it is trained over benign and malicious sequences of n-grams to classify exploit traffic. Lastly, it continuously reads a small chunk of an application layer message and predicts whether the message is a malicious one.

Decreasing the number of bytes taken from application layer messages has only a minor effect on Blatta's detection rate. Therefore, it does not need to read the whole application layer message, unlike previous related works, to detect exploit traffic, creating a step change in the ability of system administrators to detect attacks early and to block them before the exploit damages the system. Extensive evaluation on the new exploit traffic dataset has clearly demonstrated the effectiveness of Blatta.

For future work, we would like to train a model that can recognise malicious behaviour based on messages exchanged between clients and a server, since in this paper we only consider messages from clients to a server and not the other way around. Detecting attacks on encrypted traffic while retaining the early prediction capability could be another future research direction. It also remains a question whether the approaches to mobile traffic classification [46, 47] would be applicable to the field of exploit detection.

Data Availability

The compressed file containing the dataset is available at https://bit.ly/blattasploit.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 5: Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step. It shows how the proposed system observes and analyses the traffic. Yellow blocks show unknown n-grams. Red blocks show the probability of the traffic being malicious when reading an n-gram at that point.


Acknowledgments

This work was supported by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan/LPDP) under the grant number PRJ-968LPDP32016 of the LPDP.

References

[1] McAfee, "McAfee labs threat report, August 2019," Report, McAfee, Santa Clara, CA, USA, 2019.

[2] Exploit-DB, "Offensive Security's exploit database archive," EDB, Bedford, MA, USA, 2010.

[3] T.-H. Cheng, Y.-D. Lin, Y.-C. Lai, and P.-C. Lin, "Evasion techniques: sneaking through your intrusion detection/prevention systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1011–1020, 2012.

[4] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of LISA '99: 13th Systems Administration Conference, vol. 99, pp. 229–238, Seattle, Washington, USA, November 1999.

[5] R. Sommer and V. Paxson, "Outside the closed world: on using machine learning for network intrusion detection," in Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 305–316, Berkeley/Oakland, CA, USA, May 2010.

[6] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353–375, 2011.

[7] K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," Lecture Notes in Computer Science: Recent Advances in Intrusion Detection, in Proceedings of the 7th International Symposium, RAID 2004, pp. 203–222, Gothenburg, Sweden, September 2004.

[8] A. Jamdagni, Z. Tan, X. He, P. Nanda, and R. P. Liu, "RePIDS: a multi tier real-time payload-based intrusion detection system," Computer Networks, vol. 57, no. 3, pp. 811–824, 2013.

[9] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, "McPAD: a multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864–881, 2009.

[10] D. Ariu, R. Tronci, and G. Giacinto, "HMMPayl: an intrusion detection system based on hidden Markov models," Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.

[11] A. Oza, K. Ross, R. M. Low, and M. Stamp, "HTTP attack detection using n-gram analysis," Computers & Security, vol. 45, pp. 242–254, 2014.

[12] M. Swarnkar and N. Hubballi, "OCPAD: one class Naive Bayes classifier for payload based anomaly detection," Expert Systems with Applications, vol. 64, pp. 330–339, 2016.

[13] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in Proceedings of the 25th USENIX Security Symposium, pp. 807–822, Austin, TX, USA, August 2016.

[14] H. Zhang, D. Yao, N. Ramakrishnan, and Z. Zhang, "Causality reasoning about network events for detecting stealthy malware activities," Computers & Security, vol. 58, pp. 180–198, 2016.

[15] R. Bortolameotti, T. van Ede, M. Caselli et al., "DECANTeR: DEteCtion of anomalous outbound HTTP traffic by passive application fingerprinting," in Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 373–386, Orlando, FL, USA, December 2017.

[16] D. Golait and N. Hubballi, "Detecting anomalous behavior in VoIP systems: a discrete event system modeling," IEEE Transactions on Information Forensics and Security, vol. 12, no. 3, pp. 730–745, 2016.

[17] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, vol. 16, no. 5, pp. 475–490, 2017.

[18] E. Min, J. Long, Q. Liu, J. Cui, and W. Chen, "TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest," Security and Communication Networks, vol. 2018, Article ID 4943509, 9 pages, 2018.

[19] X. Jin, B. Cui, D. Li, Z. Cheng, and C. Yin, "An improved payload-based anomaly detector for web applications," Journal of Network and Computer Applications, vol. 106, pp. 111–116, 2018.

[20] Y. Hao, Y. Sheng, and J. Wang, "Variant gated recurrent units with encoders to preprocess packets for payload-aware intrusion detection," IEEE Access, vol. 7, pp. 49985–49998, 2019.

[21] P. Schneider and K. Böttinger, "High-performance unsupervised anomaly detection for cyber-physical system networks," in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy (CPS-SPC '18), pp. 1–12, Barcelona, Spain, January 2018.

[22] T. Hamed, R. Dara, and S. C. Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol. 73, pp. 137–155, 2018.

[23] B. A. Pratomo, P. Burnap, and G. Theodorakopoulos, "Unsupervised approach for detecting low rate attacks on network traffic with autoencoder," in 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8, Glasgow, UK, June 2018.

[24] Z.-Q. Qin, X.-K. Ma, and Y.-J. Wang, "Attentional payload anomaly detector for web applications," in Proceedings of the International Conference on Neural Information Processing, pp. 588–599, Siem Reap, Cambodia, December 2018.

[25] H. Liu, B. Lang, M. Liu, and H. Yan, "CNN and RNN based payload classification methods for attack detection," Knowledge-Based Systems, vol. 163, pp. 332–341, 2019.

[26] Z. Zhao and G.-J. Ahn, "Using instruction sequence abstraction for shellcode detection and attribution," in Proceedings of the Conference on Communications and Network Security (CNS), pp. 323–331, National Harbor, MD, USA, October 2013.

[27] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, "Detecting unknown malicious code by applying classification techniques on opcode patterns," Security Informatics, vol. 1, no. 1, p. 1, 2012.

[28] X. Wang, C.-C. Pan, P. Liu, and S. Zhu, "SigFree: a signature-free buffer overflow attack blocker," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 65–79, 2010.

[29] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," Lecture Notes in Computer Science, in Proceedings of the International Workshop on Recent Advances in Intrusion Detection, pp. 226–248, Hamburg, Germany, September 2006.

[30] Rapid7, "Metasploit: penetration testing framework," 2003, https://www.metasploit.com.

[31] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the Military Communications and Information Systems Conference (MilCIS), pp. 1–6, Canberra, ACT, Australia, November 2015.

[32] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.

[33] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using Snort," vol. 1, no. 2007, Tech. Report CSE-2007-1, p. 22, UC Davis Department of Computer Science, Davis, CA, USA, 2007.

[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, p. 2, Berkeley, CA, USA, October 2005.

[35] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.

[36] Bro IDS, 2008, http://www.bro-ids.org.

[37] Argus, 2008, https://qosient.com/argus.

[38] Rapid7, "Metasploitable," 2012, https://information.rapid7.com/download-metasploitable-2017.html.

[39] S. L. Garfinkel and M. Shick, "Passive TCP reconstruction and forensic analysis with tcpflow," Technical Reports, Naval Postgraduate School, Monterey, CA, USA, 2013.

[40] K. Rieck and P. Laskov, "Language models for detection of unknown attacks in network traffic," Journal in Computer Virology, vol. 2, no. 4, pp. 243–256, 2007.

[41] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, Doha, Qatar, October 2014.

[42] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, https://arxiv.org/abs/1412.3555.

[45] PyTorch, "Negative log likelihood," 2016.

[46] M. Lotfollahi, M. J. Siavoshani, R. S. H. Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.

[47] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.

[48] H. Buchwald, "Shadow Daemon: a collection of tools to detect, record and block attacks on web applications," Shadow Daemon Introduction, 2015, https://shadowd.zecure.org/overview/introduction.


e vanilla RNN has a vanishing gradient problem asituation where the recurrent model cannot be furthertrained because the value to update the weight is too smallthus there would be no point of training the model furthererefore long short-term memory (LSTM) [43] and gatedrecurrent unit (GRU) [44] are employed to avoid this sit-uation Both LSTM and GRU have cellsunits that areconnected recurrently to each other replacing the usualrecurrent units which existed in earlier RNNs What makestheir cells different is the existence of gates ese gatesdecide whether to pass certain information coming from theprevious cell (ie input gate and forget gate in LSTM unitand update gate in GRU) or going to the next unit (ieoutput gate in LSTM) Since LSTM has more gates thanGRU it requires more calculations thus computationallymore expensive Yet it is not conclusive whether one isbetter than the other [44] thus we use both types andcompare the results For brevity we will refer to both types asrecurrent layers and their cells as recurrent units

Security and Communication Networks 7

e recurrent layer takes a vector 1113954xt for each time step tIn each time step the recurrent unit outputs hidden state ht

with a dimension of |ht| e hidden state is then passed tothe next recurrent unit e last hidden state hk becomes theoutput of the recurrent layer which will be passed onto thenext layer

Once we obtain hk the output of the recurrent layer thenext step is to map the vector to benign or malicious classMapping hk to those classes requires us to use lineartransformation and softmax activation unit

A linear transformation transforms hk into a new vectorL using (1) whereW is the trained weight and b is the trainedbias After that we transform L to obtain Y yi | 0le ilt 21113864 1113865the log probabilities of the input file belonging to the classeswith LogSoftmax as described in (2) e output of Log-Softmax is the index of an element that has the largestprobability in Y All these forward steps are depicted inFigure 4

L Wlowast hk + b

L li1113868111386811138681113868 0le ilt 21113966 1113967

(1)

Y argmax0leilt2

logexp li( 1113857

11139362i0 exp li( 1113857

1113888 1113889 (2)

In the training stage after feeding a batch of trainingdata to the model and obtaining the output the next step is

to evaluate the accuracy of our model To measure ourmodelrsquos performance during the training stage we need tocalculate a loss value which represents how far our modelrsquosoutput is from the ground truth Since this approach is abinary classification problem we use negative log likelihood[45] as the loss functionen the losses are backpropagatedto update weights biases and the embedding vectors

45 Detecting Exploits e process of detecting exploits isessentially similar to the training process Application layermessages are extracted n-grams are acquired from thesemessages and encoded using the dictionary that was builtduring the training process e encoded n-grams are thenfed into the RNN model that will output probabilities ofthese messages being malicious When the probability of anapplication layer message being malicious is higher than 05the message is deemed malicious and an alert is raised

The main difference between this process and the training stage is the point at which Blatta stops processing inputs and makes its decision. Blatta takes the intermediate output of the RNN model, hence requiring fewer inputs and removing the need to wait for the full-length message. We will show in our experiments that the decision taken by using the intermediate output and reading fewer bytes is close to that of reading the full-length message, giving the proposed approach the ability of early prediction.
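The early-prediction loop can be sketched as follows, again using the classifier sketched earlier: one n-gram is fed per time step, the intermediate output is inspected after each step, and processing stops as soon as the malicious probability crosses 0.5. The byte-limit handling is omitted for brevity, and the function name is ours.

```python
import torch

def detect_early(model, encoded_ngrams, threshold=0.5):
    """Feed one encoded n-gram per time step and inspect the intermediate output,
    stopping as soon as the malicious probability exceeds the threshold.
    Returns (is_malicious, last_step_processed)."""
    model.eval()
    hidden = None
    p_malicious = 0.0
    with torch.no_grad():
        for t, ngram_id in enumerate(encoded_ngrams):
            x = model.embedding(torch.tensor([[ngram_id]]))           # (1, 1, embedding_dim)
            out, hidden = model.rnn(x, hidden)                        # carry the hidden state forward
            log_probs = model.log_softmax(model.linear(out[:, -1, :]))
            p_malicious = log_probs.exp()[0, 1].item()                # class 1 assumed malicious
            if p_malicious > threshold:
                return True, t                                        # alert raised early, at step t
    return p_malicious > threshold, len(encoded_ngrams) - 1
```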

5. Experiments and Results

In this section, we evaluate Blatta and present evidence of its effectiveness in predicting exploits in network traffic. Blatta is implemented with Python 3.5.2 and the PyTorch 0.2.0 library. All experiments were run on a PC with a Core i7 3.40 GHz CPU, 16 GB of RAM, an NVIDIA GeForce GT 730, NVIDIA CUDA 9.0, and cuDNN 7.

The best practice for evaluating a machine learning approach is to have separate training and testing sets. As the names imply, the training set is used to train the model, and the testing set is used to evaluate the model's performance.

Figure 2: Architecture overview of the proposed method. Application layer messages (i.e., HTTP requests, SMTP commands) are extracted from captured traffic using tcpflow [39]. n-grams are obtained from those messages; they are then used to build a dictionary of the most common n-grams and to train the RNN-based model (i.e., LSTM or GRU). The trained model outputs a prediction of whether the traffic is malicious or benign.

Figure 3: An example of n-grams of bytes taken with various stride values (the string "HELLO_WORLD" split into 3-grams with stride 1, 2, and 3).
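A small helper reproducing the n-gram extraction with stride shown in Figure 3 might look like this (a sketch; the function name is ours, and the dictionary-encoding step that follows it is omitted):

```python
def extract_ngrams(message: bytes, n: int = 3, stride: int = 1):
    """Collect consecutive n-grams of bytes with a sliding window,
    mirroring the example in Figure 3."""
    return [message[i:i + n] for i in range(0, len(message) - n + 1, stride)]

# extract_ngrams(b"HELLO_WORLD", 3, 1) -> [b'HEL', b'ELL', b'LLO', b'LO_', ...]
# extract_ngrams(b"HELLO_WORLD", 3, 2) -> [b'HEL', b'LLO', b'O_W', b'WOR', b'RLD']
# extract_ngrams(b"HELLO_WORLD", 3, 3) -> [b'HEL', b'LO_', b'WOR']
```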


We split the BlattaSploit dataset in a 60 : 40 ratio into training and testing sets of malicious samples. The division was carefully made to include diverse types of exploit payloads. As samples of benign traffic to train the model, we obtained the same number of HTTP and FTP connections as the malicious samples from UNSW-JAN. Having a balanced set of both classes is important in supervised learning.

We measure our model's performance by using samples of malicious and benign traffic. Malicious samples are obtained from 40% of the BlattaSploit dataset and from the exploit and worm samples in the UNSW-FEB set (10,855 samples). As for the benign samples, we took the same number (10,855) of benign HTTP and FTP connections from UNSW-FEB. We used the UNSW dataset to compare our proposed approach's performance with previous works.

In summary, the details of the training and testing sets used in the experiments are shown in Table 3.

We evaluated the classifier model by counting the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). They are then used to calculate the detection rate (DR) and false positive rate (FPR), which are the metrics used to measure our proposed system's performance.

DR measures the ability of the proposed system to correctly detect malicious traffic. The value of DR should be as close as possible to 100%, as it shows how well our system is able to detect exploit traffic. The formula to calculate DR is shown in (3). We denote TP as the number of detected malicious connections and FN as the number of undetected malicious connections in the testing set.

$$\mathrm{DR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{3}$$

FPR is the percentage of benign samples that are classified as malicious. We would like this metric to be as low as possible; a high FPR means many false alarms are raised, rendering the system less trustworthy and less useful. We calculate this metric using (4). We denote FP as the number of false positives detected and N as the number of benign samples.

$$\mathrm{FPR} = \frac{\mathrm{FP}}{N}. \tag{4}$$
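Expressed in code, the two metrics in (3) and (4) are straightforward; the helper names below are illustrative.

```python
def detection_rate(tp: int, fn: int) -> float:
    """DR = TP / (TP + FN), as in (3)."""
    return tp / (tp + fn)

def false_positive_rate(fp: int, n_benign: int) -> float:
    """FPR = FP / N, as in (4), with N the number of benign samples."""
    return fp / n_benign

# Both return fractions; multiply by 100 to obtain percentages.
```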

5.1. Data Analysis. Before discussing the results, it is preferable to analyse the data first to make sure that the results are valid and the conclusions drawn are sound. Blatta aims to detect exploit traffic by reading the first few bytes of the application layer message. Therefore, it is important to know how many bytes there are in the application layer messages in our dataset, so that we can be sure Blatta reads fewer bytes than the full length of the application layer message.

Table 4 shows the average length of application layer messages in our testing set. The benign samples have an average message length of 593.25, lower than any other set. Therefore, deciding after reading fewer bytes than that number implies our proposed method can predict malicious traffic earlier, thus providing an improvement over previous works.

5.2. Exploring Parameters. Blatta includes parameters that must be selected in advance. Intuitively, these parameters affect model performance, so we analysed their effect on the model's performance and selected the best model to be compared later with previous works. The parameters we analysed are the recurrent layer type, n, stride, dictionary size, and the embedding vector dimension. When analysing each of these parameters, we use different values for that parameter and set the others to their default values (i.e., n = 5, stride = 1, dictionary size = 2000, embedding dimension = 32, recurrent layer = LSTM, and number of hidden layers = 1). These default values were selected based on a preliminary experiment which had given the best results. Apart from the modifiable parameters, we set the optimiser to stochastic gradient descent with a learning rate of 0.1, as using other optimisers did not necessarily increase or decrease the performance in our preliminary experiment. The number of epochs is fixed to five, as the loss value did not decrease further and adding more training iterations did not give significant improvement.
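For reference, the default configuration described above can be summarised as follows (the dictionary keys are illustrative names, not identifiers from the authors' code):

```python
# Default values used in Section 5.2 when varying one parameter at a time.
DEFAULT_PARAMS = {
    "n": 5,                   # n-gram size
    "stride": 1,              # sliding-window step
    "dictionary_size": 2000,  # l most common n-grams
    "embedding_dim": 32,
    "recurrent_layer": "LSTM",
    "num_hidden_layers": 1,
    "epochs": 5,
    "optimiser": "SGD",
    "learning_rate": 0.1,
}
```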

As can be seen in Table 5, we experimented with variable lengths of input to see how much difference it makes when the number of bytes read is reduced.

Figure 4: A detailed view of the classifier. n-grams are extracted from the input application layer message, encoded as integers, passed through the embedding layer, and then used to train an RNN model (with LSTM/GRU units) to classify whether the connection is malicious or benign. The decision can be acquired earlier, without waiting for the last n-grams to be processed.


It is noted that some messages are shorter than the limit; in this case, all bytes of those messages are taken.

In general, reading the full length of application layer messages mostly gives more than a 99% detection rate with a 2.51% false positive rate. This performance holds, with minor variation, when the length of input messages is reduced down to 500 bytes. When the length is limited to 400 bytes, the false positive rate spikes for some configurations. Our hypothesis is that this is because benign samples have a relatively short length. Therefore, we pay more attention to the results of reading 500 bytes or fewer and analyse each parameter individually.

n is the number of bytes (the n-gram size) taken in each time step. As shown in Table 5, we experimented with 1-, 3-, 5-, 7-, and 9-grams. For brevity, we omitted n = 2, 4, 6, 8 because the difference in results is not significant. As predicted earlier, 1-grams were least effective, and the detection rates were around 50%. As for the higher-order n-grams, the detection rates are not much different, but the false positive rates are: 5-grams and 7-grams provide better false positive rates (2.51%) even when Blatta reads only the first 400 bytes. 7-grams give a lower false positive rate (8.92%) when reading the first 300 bytes, yet this is still too high for real-life situations. Having a higher n means more information is considered in a time step; this may lead not only to better performance but also to overfitting.

As the default value of n is five, we experimented with strides of one to five. Thus, it can be observed how the model reacts depending on how much the n-grams overlap. It is apparent that nonoverlapping n-grams provide lower false positives, with around a 1–3% decrease in detection rates. Stride values of two and three perform the worst, missing quite a lot of malicious traffic.

The dictionary size plays an important role in this experiment. Having too many n-grams leads to overfitting, as the model would have to memorise too many n-grams that may barely occur. We found that a dictionary size of 2,000 has the highest detection rate and lowest false positive rate. Reducing the size to 1,000 made the detection rate drop by about 50%, even when the model read the full-length messages.

Without the embedding layer, Blatta would have used a one-hot vector as big as the dictionary size as the input for each time step. Therefore, the embedding dimension has a similar effect to the dictionary size: having it too big leads to overfitting, and too small could mean too much information is lost due to the compression. Our experiments with dimensions of 16, 32, and 64 give similar detection rates, differing by less than 2%. An embedding dimension of 64 gives the lowest false positive rate when reading 300 bytes.

Changing the recurrent layer type does not seem to make much difference; LSTM has a minor improvement over GRU. We argue that preferring one over the other would not bring a big improvement other than in training speed. Adding more hidden layers does not improve performance; on the other hand, it has a negative impact on detection speed, as shown in Table 6.

After analysing this set of experiments, we ran another experiment with a configuration based on the best-performing parameters previously explained. The configuration is n = 5, stride = 5, dictionary size = 2,000, embedding dimension = 64, and an LSTM layer. The model then has a detection rate of 97.57% with 1.93% false positives by reading only the first 400 bytes. This result shows that Blatta maintains high accuracy while reading only 35.21% of the length of application layer messages in the dataset. This optimal set of parameters is used in further analysis.

5.3. Detection Speed. In the IDS area, detection speed is another metric worth looking into, apart from accuracy-related metrics. Time is of the essence in detection; the earlier we detect malicious traffic, the faster we can react. However, detection speed is affected by many factors, such as the hardware or other applications/services running at the same time as the experiment. Therefore, in this section we analyse the difference in execution time between reading the full and partial payloads.

We first calculated the execution time for each record in the testing set, then divided the number of bytes processed by the execution time to obtain the detection speed in kilobytes per second (KBps). Finally, the detection speed over all records was averaged. The result is shown in Table 6.
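A sketch of this measurement procedure, assuming the early-detection function from Section 4.5 and a list of (encoded n-grams, bytes read) pairs, is shown below; it illustrates the calculation only and is not the authors' benchmarking code.

```python
import time

def average_detection_speed_kbps(model, samples):
    """Per-record detection speed in KB/s (bytes processed / execution time),
    averaged over all records. `samples` is assumed to be a list of
    (encoded_ngrams, n_bytes) pairs."""
    speeds = []
    for encoded_ngrams, n_bytes in samples:
        start = time.perf_counter()
        detect_early(model, encoded_ngrams)           # early-prediction sketch from Section 4.5
        elapsed = time.perf_counter() - start
        speeds.append((n_bytes / 1024.0) / elapsed)   # kilobytes per second
    return sum(speeds) / len(speeds)
```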

As shown in Table 6, reducing the processed bytes to 700, about half the size of an IP packet, increased the detection speed by approximately two times (from an average of 83.66 KBps to 164.86 KBps). Evidently, the trend keeps rising as the number of bytes is reduced. If we take the byte limit from the previous section, which is 400 bytes, Blatta provides about a threefold increase in detection speed, or 221.7 KBps on average. We are aware that this number seems small compared to the transmission speed of a network link, which can reach 1 Gbps (128 MBps). However, we argue that there are other factors which limit our proposed method from performing faster, such as the hardware used in the experiment and the programming language used to implement the approach. If the approach runs in a better environment, the detection speed will increase as well.

Table 3: Numbers of benign and malicious samples used in the experiments.

Set            Obtained from   No. of samples   Class
Training set   BlattaSploit    3,406            Malicious
Training set   UNSW-JAN        3,406            Benign
Testing set    BlattaSploit    2,276            Malicious
Testing set    UNSW-FEB        10,855           Benign
Testing set    UNSW-FEB        10,855           Malicious

Table 4: Average message length of application layer messages in the testing set.

Set                     Average message length (bytes)
BlattaSploit            2318.93
UNSW benign samples     593.25
UNSW exploit samples    1437.36
UNSW worm samples       848


5.4. Comparison with Previous Works. We compare Blatta's results with related previous works. PAYL [7], OCPAD [12], Decanter [15], and the autoencoder model [23] were chosen due to their code availability, and all can be tested against the UNSW-NB15 dataset. PAYL and OCPAD read an IP packet at a time, while Decanter and [23] reconstruct TCP segments and process the whole application layer message. None of them provides early detection, but to show that Blatta also offers improvements in detecting malicious traffic, we compare the detection and false positive rates of those works with Blatta's.

We evaluated all methods with the exploit and worm data in UNSW-NB15, as those match our threat model and the dataset is already publicly available; thus, the results are comparable. The results are shown in Table 7.

In general, Blatta has the highest detection rate, albeit at the cost of increased false positives. Although the false positives might make this approach unacceptable in real life, Blatta is still a significant step towards early prediction, a problem that has not been explored by similar previous works. This early prediction approach enables system administrators to react faster, thus reducing the harm to protected systems.

In the previous experiments, as shown in Section 5.2, Blatta needs to read 400 bytes (on average, 35.21% of the application layer message size) to achieve a 97.57% detection rate.

Table 5: Experiment results of using various parameter combinations and various lengths of input to the model.

Detection rate (%) by number of bytes read:

Parameter             Value   All     700     600     500     400     300     200
n                     1       47.22   48.69   49.86   50.65   51.77   54.99   65.32
n                     3       99.87   99.51   99.77   99.1    99.59   98.93   91.07
n                     5       99.87   99.55   99.78   99.57   99.29   98.91   88.75
n                     7       99.86   99.47   99.59   99.37   99.19   98.53   97.08
n                     9       99.81   99.59   99.62   99.57   99.23   98.16   88.93
Stride                1       99.87   99.55   99.78   99.57   99.29   98.91   88.75
Stride                2       73.39   74.11   74.01   74.45   74.69   74.62   77.82
Stride                3       82.51   82.54   83.07   83.12   83.25   83.5    85.75
Stride                4       99.6    99.19   99.26   99.28   98.61   98.55   98.37
Stride                5       99.73   98.95   98.88   98.65   98      95.77   88.29
Dictionary size       1000    47.78   49.5    50.36   50.79   51.8    54.83   54.68
Dictionary size       2000    99.87   99.55   99.78   99.57   99.29   98.91   88.75
Dictionary size       5000    99.87   99.37   99.75   99.79   99.62   99.69   99.66
Dictionary size       10000   99.86   99.44   99.74   99.55   99.44   98.55   98.33
Dictionary size       20000   99.84   99.81   99.69   99.24   99.21   99.43   98.91
Embedding dimension   16      99.89   99.65   99.7    99.67   99.22   99.09   98.81
Embedding dimension   32      99.87   99.55   99.78   99.57   99.29   98.91   88.75
Embedding dimension   64      99.87   99.2    99.41   99.09   98.61   96.76   85.52
Embedding dimension   128     99.84   99.33   99.6    99.35   98.99   97.69   86.78
Embedding dimension   256     99.88   99.76   99.8    99.22   99.38   98.64   90.34
Recurrent layer       LSTM    99.87   99.55   99.78   99.57   99.29   98.91   88.75
Recurrent layer       GRU     99.88   99.35   99.48   99.35   99.06   97.94   86.22
No. of layers         1       99.87   99.55   99.78   99.57   99.29   98.91   88.75
No. of layers         2       99.86   99.46   99.46   99.38   99.2    99.72   88.65
No. of layers         3       99.84   99.38   99.68   99.1    99.18   98.16   87.35

False positive rate (%) by number of bytes read:

Parameter             Value   All    700    600    500    400     300     200
n                     1       1.18   1.19   1.21   1.21   71.31   7.87    89.43
n                     3       2.51   2.51   2.51   2.51   72.61   10.29   20.51
n                     5       2.51   2.51   2.51   2.51   2.51    7.26    11.08
n                     7       2.51   2.51   2.51   2.51   2.51    8.92    80.92
n                     9       2.51   2.51   2.51   2.51   7.26    74.16   9.06
Stride                1       2.51   2.51   2.51   2.51   2.51    7.26    11.08
Stride                2       1.81   1.81   1.81   1.81   71.92   72.46   19.86
Stride                3       1.5    1.49   1.5    1.51   71.62   75.47   89.63
Stride                4       1.93   1.93   1.93   1.93   1.93    74.09   10.5
Stride                5       1.93   1.92   1.93   1.93   1.93    54.16   90.02
Dictionary size       1000    1.21   1.21   1.22   1.22   71.33   79.47   89.42
Dictionary size       2000    2.51   2.51   2.51   2.51   2.51    7.26    11.08
Dictionary size       5000    2.51   2.51   2.51   2.51   72.61   10.03   90.61
Dictionary size       10000   2.51   2.51   2.51   2.51   72.61   79.06   90.15
Dictionary size       20000   2.51   2.51   2.51   2.51   72.61   80.46   89.64
Embedding dimension   16      2.51   2.51   2.51   2.51   2.51    76.77   80.94
Embedding dimension   32      2.51   2.51   2.51   2.51   2.51    7.26    11.08
Embedding dimension   64      2.51   2.51   2.51   2.51   2.51    4.51    89.85
Embedding dimension   128     2.51   2.51   2.51   2.51   7.26    42.7    10.88
Embedding dimension   256     2.51   2.51   2.51   2.51   7.26    80.79   9.06
Recurrent layer       LSTM    2.51   2.51   2.51   2.51   2.51    7.26    11.08
Recurrent layer       GRU     2.51   2.51   2.51   2.51   2.51    78.95   8.48
No. of layers         1       2.51   2.51   2.51   2.51   2.51    7.26    11.08
No. of layers         2       2.51   2.51   2.51   2.51   72.59   78.78   20.29
No. of layers         3       2.51   2.51   2.51   2.51   2.51    74.94   10.83

Bold values in the original table show the parameter value, for each set of experiments, which gives the highest detection rate and lowest false positive rate.

Table 6: The effect of reducing the number of bytes on the detection speed.

No. of bytes   1 LSTM layer          2 LSTM layers         3 LSTM layers
All            83.66 ± 0.238327      55.14 ± 0.004801      36.98 ± 0.011428
700            164.86 ± 0.022857     107.04 ± 0.022001     73.5 ± 0.044694
600            181.6 ± 0.020556      119.7 ± 0.024792      82.1 ± 0.049584
500            204.32 ± 0.02352      136.5 ± 0.036668      93.76 ± 0.061855
400            221.7 ± 0.032205      149.4 ± 0.037701      103.02 ± 0.065417
300            240.76 ± 0.022857     163.68 ± 0.036352     113.18 ± 0.083477
200            262.72 ± 0.030616     181.38 ± 0.020927     126.88 ± 0.063024

The values are average (mean) detection speeds in KBps with 95% confidence intervals, calculated from multiple experiments. The detection speed increased significantly (about three times faster than reading the whole message), allowing early prediction of malicious traffic.


It reads fewer bytes than PAYL, OCPAD, Decanter, and [23] while keeping the accuracy high.

5.5. Visualisation. To investigate how Blatta performs the detection, we took samples of both benign and malicious traffic and observed the input and output. We were particularly interested in the relation between n-grams that are not stored in the dictionary (unknown n-grams) and the decision. Those n-grams either did not exist in the training set or were not common enough to be included in the dictionary.

In Figure 5, we visualise three samples of traffic taken from different sets, the BlattaSploit and UNSW-NB15 datasets. The first part of each sample shows n-grams that did not exist in the dictionary; the yellow highlighted parts show those n-grams. The second part shows the output of the recurrent layer for each time step. The darker the red highlight, the closer the probability of the traffic being malicious is to one in that time step.

As shown in Figure 5, malicious samples tend to have more unknown n-grams. It is evident that the existence of these unknown n-grams increases the probability of the traffic being malicious. As an example, the first five bytes of the samples have a probability of around 0.5 of being malicious; the probability then rises closer to one when an unknown n-gram is detected.

Similar behaviour also exists in the benign sample. The probability is quite low because there are many known n-grams. Despite the existence of unknown n-grams in the benign sample, the end result shows that the traffic is benign. Furthermore, most of the time the probability of the traffic being malicious is also below 0.5.

6. Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet for tackling exploit attacks. There are evasion techniques which could be employed by adversaries to evade detection. These techniques open possibilities for future work. Therefore, this section discusses such evasion techniques and why they are not covered by our current method.

Since our proposed method works by analysing application layer messages, it is safe to disregard evasion techniques at the transport or network layer level, e.g., IP fragmentation, TCP delayed sending, and TCP segment fragmentation. They should be handled by the underlying tool that reconstructs the TCP session. Furthermore, those evasion techniques can also be handled by a Snort preprocessor.

Two possible evasion techniques are compression and/or encryption. Both change the byte values from the original and make the malicious code harder to detect by Blatta or any previous work on payload-based detection [7, 12, 23]. Metasploit has a collection of evasion techniques, which include compression. The compression evasion technique only works on HTTP and utilises gzip, and it only compresses HTTP responses, not HTTP requests, while all HTTP-based exploits in UNSW-NB15 and BlattaSploit have their exploit code in the request; thus, no data are available to analyse the performance if the adversary uses compression. However, gzip-compressed data could still be detected because they always start with the magic number 1f 8b, and the decompression can be done in a streaming manner, which Blatta supports. There is also no need to decompress the whole data, since Blatta works well with partial input.
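As an illustration of this point, gzip data can be recognised by its two-byte magic number and decompressed incrementally with Python's zlib, so partial output can be passed to the classifier as it arrives. The following sketch assumes the caller supplies the HTTP body as an iterable of byte chunks:

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"

def streaming_decompress(chunks):
    """Detect gzip-compressed data by its magic number 1f 8b and decompress it
    chunk by chunk, so partial input can still be fed to the classifier.
    `chunks` is assumed to be an iterable of byte strings."""
    decompressor = None
    for i, chunk in enumerate(chunks):
        if i == 0 and chunk.startswith(GZIP_MAGIC):
            # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
            decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
        if decompressor is not None:
            yield decompressor.decompress(chunk)   # partial output per chunk
        else:
            yield chunk                            # not gzip: pass through unchanged
```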

Encryption is possibly the biggest obstacle for payload-based NIDS; none of the previous works in our literature review (see Table 1) have addressed this challenge. There are other studies which deal with payload analysis in encrypted traffic [46, 47]. However, these studies focus on classifying which application generates the network traffic instead of detecting exploit attacks; thus, they are not directly relevant to our research.

On its own, Blatta is not able to detect exploits hiding in encrypted traffic. However, Blatta's model can be exported and incorporated into application layer firewalls such as Shadow Daemon [48]. Shadow Daemon is commonly installed on a web server and intercepts HTTP requests before they are processed by the web server software. It detects attacks based on its signature database. Since it is extensible and reads the same data as Blatta (i.e., application layer messages), it is possible to use Blatta's RNN-based model to extend the capability of Shadow Daemon beyond rule-based detection. More importantly, this approach would enable Blatta to deal with encrypted traffic, making it applicable in real-life situations.

Attackers could place the exploit code closer to the end of the application layer message, hoping that the attack would not be detected since Blatta reads merely the first few bytes. However, exploits using this kind of evasion technique would still be detected, since the technique needs padding to push the exploit code to the end of the message. The padding itself will be detected as a sign of a malicious attempt, as it is most likely to be a byte sequence that rarely exists in benign traffic.

Table 7: Comparison to previous works using the UNSW-NB15 dataset as the testing set.

Method        Detection rate (%), exploits   Detection rate (%), worms   FPR (%)
Blatta        99.04                          100                         1.93
PAYL          87.12                          26.49                       0.05
OCPAD         10.53                          4.11                        0
Decanter      67.93                          90.14                       0.03
Autoencoder   47.51                          81.12                       0.99


7. Conclusion and Future Work

This paper presents Blatta, an early prediction system for exploit attacks which can detect exploit traffic by reading only 400 bytes from the application layer message. First, Blatta builds a dictionary containing the most common n-grams in the training set. Then, it is trained over benign and malicious sequences of n-grams to classify exploit traffic. Lastly, it continuously reads small chunks of an application layer message and predicts whether the message is malicious.

Decreasing the number of bytes taken from application layer messages has only a minor effect on Blatta's detection rate. Therefore, unlike previous related works, it does not need to read the whole application layer message to detect exploit traffic, creating a step change in the ability of system administrators to detect attacks early and to block them before the exploit damages the system. Extensive evaluation on the new exploit traffic dataset has clearly demonstrated the effectiveness of Blatta.

For future study, we would like to train a model that can recognise malicious behaviour based on messages exchanged between clients and a server, since in this paper we only consider messages from clients to a server and not the other way around. Detecting attacks in encrypted traffic while retaining the early prediction capability could be another future research direction. It also remains a question whether the approaches used in mobile traffic classification [46, 47] would be applicable to the field of exploit detection.

Data Availability

The compressed file containing the dataset is available at https://bit.ly/blattasploit.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 5: Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step. It shows how the proposed system observes and analyses the traffic. Yellow blocks show unknown n-grams. Red blocks show the probability of the traffic being malicious when reading an n-gram at that point.


Acknowledgments

This work was supported by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan/LPDP) under grant number PRJ-968/LPDP.3/2016.

References

[1] McAfee, "McAfee Labs threat report, August 2019," Report, McAfee, Santa Clara, CA, USA, 2019.
[2] Exploit-DB, "Offensive Security's exploit database archive," EDB, Bedford, MA, USA, 2010.
[3] T.-H. Cheng, Y.-D. Lin, Y.-C. Lai, and P.-C. Lin, "Evasion techniques: sneaking through your intrusion detection/prevention systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1011–1020, 2012.
[4] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of LISA '99: 13th Systems Administration Conference, vol. 99, pp. 229–238, Seattle, Washington, USA, November 1999.
[5] R. Sommer and V. Paxson, "Outside the closed world: on using machine learning for network intrusion detection," in Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 305–316, Berkeley/Oakland, CA, USA, May 2010.
[6] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353–375, 2011.
[7] K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," in Recent Advances in Intrusion Detection, Lecture Notes in Computer Science, Proceedings of the 7th International Symposium, RAID 2004, pp. 203–222, Gothenburg, Sweden, September 2004.
[8] A. Jamdagni, Z. Tan, X. He, P. Nanda, and R. P. Liu, "RePIDS: a multi tier real-time payload-based intrusion detection system," Computer Networks, vol. 57, no. 3, pp. 811–824, 2013.
[9] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, "McPAD: a multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864–881, 2009.
[10] D. Ariu, R. Tronci, and G. Giacinto, "HMMPayl: an intrusion detection system based on hidden Markov models," Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.
[11] A. Oza, K. Ross, R. M. Low, and M. Stamp, "HTTP attack detection using n-gram analysis," Computers & Security, vol. 45, pp. 242–254, 2014.
[12] M. Swarnkar and N. Hubballi, "OCPAD: one class Naive Bayes classifier for payload based anomaly detection," Expert Systems with Applications, vol. 64, pp. 330–339, 2016.
[13] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in Proceedings of the 25th USENIX Security Symposium, pp. 807–822, Austin, TX, USA, August 2016.
[14] H. Zhang, D. Yao, N. Ramakrishnan, and Z. Zhang, "Causality reasoning about network events for detecting stealthy malware activities," Computers & Security, vol. 58, pp. 180–198, 2016.
[15] R. Bortolameotti, T. van Ede, M. Caselli et al., "DECANTeR: DEteCtion of anomalous outbound HTTP traffic by passive application fingerprinting," in Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 373–386, Orlando, FL, USA, December 2017.
[16] D. Golait and N. Hubballi, "Detecting anomalous behavior in VoIP systems: a discrete event system modeling," IEEE Transactions on Information Forensics and Security, vol. 12, no. 3, pp. 730–745, 2016.
[17] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, vol. 16, no. 5, pp. 475–490, 2017.
[18] E. Min, J. Long, Q. Liu, J. Cui, and W. Chen, "TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest," Security and Communication Networks, vol. 2018, Article ID 4943509, 9 pages, 2018.
[19] X. Jin, B. Cui, D. Li, Z. Cheng, and C. Yin, "An improved payload-based anomaly detector for web applications," Journal of Network and Computer Applications, vol. 106, pp. 111–116, 2018.
[20] Y. Hao, Y. Sheng, and J. Wang, "Variant gated recurrent units with encoders to preprocess packets for payload-aware intrusion detection," IEEE Access, vol. 7, pp. 49985–49998, 2019.
[21] P. Schneider and K. Böttinger, "High-performance unsupervised anomaly detection for cyber-physical system networks," in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy (CPS-SPC '18), pp. 1–12, Barcelona, Spain, January 2018.
[22] T. Hamed, R. Dara, and S. C. Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol. 73, pp. 137–155, 2018.
[23] B. A. Pratomo, P. Burnap, and G. Theodorakopoulos, "Unsupervised approach for detecting low rate attacks on network traffic with autoencoder," in Proceedings of the 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8, Glasgow, UK, June 2018.
[24] Z.-Q. Qin, X.-K. Ma, and Y.-J. Wang, "Attentional payload anomaly detector for web applications," in Proceedings of the International Conference on Neural Information Processing, pp. 588–599, Siem Reap, Cambodia, December 2018.
[25] H. Liu, B. Lang, M. Liu, and H. Yan, "CNN and RNN based payload classification methods for attack detection," Knowledge-Based Systems, vol. 163, pp. 332–341, 2019.
[26] Z. Zhao and G.-J. Ahn, "Using instruction sequence abstraction for shellcode detection and attribution," in Proceedings of the Conference on Communications and Network Security (CNS), pp. 323–331, National Harbor, MD, USA, October 2013.
[27] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, "Detecting unknown malicious code by applying classification techniques on opcode patterns," Security Informatics, vol. 1, no. 1, p. 1, 2012.
[28] X. Wang, C.-C. Pan, P. Liu, and S. Zhu, "SigFree: a signature-free buffer overflow attack blocker," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 65–79, 2010.
[29] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," in Proceedings of the International Workshop on Recent Advances in Intrusion Detection, Lecture Notes in Computer Science, pp. 226–248, Hamburg, Germany, September 2006.
[30] Rapid7, "Metasploit penetration testing framework," 2003, https://www.metasploit.com.
[31] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the Military Communications and Information Systems Conference (MilCIS), pp. 1–6, Canberra, ACT, Australia, November 2015.
[32] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.
[33] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using Snort," Tech. Report CSE-2007-1, vol. 1, no. 2007, p. 22, UC Davis Department of Computer Science, Davis, CA, USA, 2007.
[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, p. 2, Berkeley, CA, USA, October 2005.
[35] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[36] Bro, 2008, http://www.bro-ids.org.
[37] Argus, 2008, https://qosient.com/argus.
[38] Rapid7, "Metasploitable," 2012, https://information.rapid7.com/download-metasploitable-2017.html.
[39] S. L. Garfinkel and M. Shick, "Passive TCP reconstruction and forensic analysis with tcpflow," Technical Reports, Naval Postgraduate School, Monterey, CA, USA, 2013.
[40] K. Rieck and P. Laskov, "Language models for detection of unknown attacks in network traffic," Journal in Computer Virology, vol. 2, no. 4, pp. 243–256, 2007.
[41] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, Doha, Qatar, October 2014.
[42] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, https://arxiv.org/abs/1412.3555.
[45] PyTorch, "Negative log likelihood," 2016.
[46] M. Lotfollahi, M. J. Siavoshani, R. S. H. Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[47] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.
[48] H. Buchwald, "Shadow Daemon: a collection of tools to detect, record, and block attacks on web applications," Shadow Daemon Introduction, 2015, https://shadowd.zecure.org/overview/introduction.


Page 5: BLATTA:EarlyExploitDetectiononNetworkTrafficwith ...downloads.hindawi.com/journals/scn/2020/8826038.pdfAs the length varies, longer messages will take more time to analyse, during

have been widely used over time despite some criticism thatit does not model the attacks correctly [33] e LawrenceBerkeley National Laboratory released their anonymisedcaptured traffic [34] (LBNL05) More recently Shiravi et alreleased the ISCX12 dataset [35] in which they collectedseven-day worth of traffic and provided the label for eachconnection in XML format Moustafa and Slay published theUNSW-NB15 dataset [31] which had been generated byusing the IXIA PerfectStorm tool for generating a hybrid ofreal modern normal activities and synthetic contemporaryattack behaviours is dataset provides PCAP files alongwith the preprocessed traffic obtained by Bro-IDS [36] andArgus [37]

Moustafa and Slay [31] captured the network traffic intwo days on 22 January and 17 February 2015 For brevitywe refer to those two parts of UNSW-NB15 as UNSW-JANand UNSW-FEB respectively Both days contain maliciousand benign traffic erefore there has to be an effort toseparate them if we would like to use the raw informationfrom the PCAP files not the preprocessed informationwritten in the CSV files e advantage of this dataset overISCX12 is that UNSW-NB15 contains information on thetype of attacks and thus we are able to select which kind ofattacks are needed ie exploits and worms However afteranalysing this dataset in depth we observed that the exploittraffic in the dataset is often barely distinguishable from thenormal traffic as some of them do not contain any exploitpayload Our explanation for this is that exploit attemptsmay not have been successful thus they did not get to thepoint where the actual exploit payload was transmitted andrecorded in PCAP files erefore we opted to generate ourexploit traffic dataset the BlattaSploit dataset

31 BlattaSploit Dataset To develop an exploit trafficdataset we set up two virtual machines that acted as vul-nerable servers to be attacked e first server was installedwith Metasploitable 2 [38] a vulnerable OS designed to beinfiltrated by Metasploit [30] e second one was installedwith Debian 5 vulnerable services and WordPress 4118Both servers were set up in the victim subnet while theattacker machine was placed in a different subnet esesubnets were connected with a routere router was used tocapture the traffic as depicted in Figure 1 Although thenetwork topology is less complex than what we would havein the real-world this setup still generates representativedata as the payloads are intact regardless of the number ofhops the packets go through

To launch exploits on the vulnerable servers we utilisedMetasploit an exploitation tool which is normally used forpenetration testing [30] Metasploit has a collection ofworking exploits and is updated regularly with new exploitsere are around 9000 exploits for Linux- and Unix-basedapplications and to make the dataset more diverse we alsoemployed different exploit payloads and encoders An ex-ploit payload is a piece of code which is intended to beexecuted on the remote machine and an encoder is atechnique to modify the appearance of particular exploitcode to avoid signature-based detection Each exploit in

Metasploit has its own set of compatible exploit payloadsand encoders In this paper we used all possible combi-nations of exploit payloads and encoders

We then ran the exploits against the vulnerable serversand captured the traffic using tcpdump Traffic generated byeach exploit was stored in an individual PCAP file By doingso we know which specific type of exploit the packetsbelonged to Timestamps are normally used to mark whichpackets belong to a class but this information would not bereliable since packets may not come in order and if there ismore than one source sending traffic at a time their packetmay get mixed up with other sources

When analysing the PCAP files we found out that not allexploits had run successfully Some of them failed to sendanything to the targeted servers and some others did notsend any malicious traffic eg sending login request onlyand sending requests for nonexistent files is supportedour earlier thoughts on why the UNSW-NB15 dataset waslacking exploit payload traffic erefore we removedcapture files that had no traffic or too little information to bedistinguished from normal traffic In the end we produced5687 PCAP files which also represent the number of distinctsets of exploit exploit payloads and encoders Since we areinterested in application layer messages all PCAP files werepreprocessed with tcpflow [39] to obtain the applicationlayer message for each TCP connectione summary of thisdataset is shown in Table 2

e next step for this exploit traffic dataset was to an-notate the traffic All samples can be considered malicioushowever we decided to make the dataset more detailed byadding the type of exploit payload contained inside thetraffic and the location of the exploit payloadere are eighttypes of exploit payload in this dataset ey are written inJavaScript PHP Perl Python Ruby Shell script (eg Bashand ZSH) SQL and byte codeopcode for shellcode-basedexploits

ere are some cases where an exploit contains an ex-ploit payload ldquowrappedrdquo in another scripting language Forexample a Python script to do reverse shell connectionwhich uses the Bash echo command at the beginning Forthese cases the type of the exploit payload is the one with thelongest byte sequence In this case the type of the particularconnection is Python

It is also important to note that whilst the vulnerableservers in our setup use a 7-year-old operating system thepayload carried by the exploit was the identical payload to amore recent exploit would have used For example bothCVE-2019-9670 (disclosed in 2019) and CVE-2012-1495(disclosed in 2012) can use genericshell_bind_tcp as apayload e traffic generated by both exploits will still besimilar erefore we argue that our dataset still representthe current attacks Moreover only successful exploit attacksare kept in the dataset making the recorded traffic morerealistic

32reatModel A threat model is a list of potential thingsthat may affect protected systems Having one means we canidentify which part is the focus of our proposed approach

Security and Communication Networks 5

thus in this case we can potentially understand better whatto look for in order to detect the malicious traffic and whatthe limitations are

e proposed method focuses on detecting remote ex-ploits by reading application layer messages from theunencrypted network traffic although the detection methodof Blatta can be incorporated with application layer firewallsie web application firewalls erefore we can still detectthe exploit attempt before it reaches the protected appli-cation In general the type of attacks we consider here are asfollows

(1) Remote exploits to servers that send maliciousscripting languages (eg PHP Ruby Python orSQL) shellcode or Bash scripts to maintain controlto the server or gained access to it remotely Forexample the apache_continuum_cmd_exec exploitwith reverse shell payload will force the targetedserver to open a connection to the attacking com-puter and provide shell access to the attacker Byfocusing on the connections directed to servers wecan safely assume JavaScript code in the applicationlayer message could also be malicious since normallyJavaScript code is sent from the server to the clientnot the other way around

(2) Exploit attacks that utilise one of the text-basedprotocols over TCP ie HTTP and FTP Text-basedprotocols tend to be more well-structured thereforewe can apply natural language processing basedapproach e case would be similar to documentclassification

(3) Other attacks that may utilise remote exploits arealso considered ie worms Worms in the UNSW-NB15 dataset contain exploit code used to propagatethemselves

4 Methodology

Extracting features from application layer messages is thefirst step toward an early prediction method We could usevariable values in the message (eg HTTP header valuesFTP commands and SMTP header values) but it wouldrequire us to extract a different set of features from eachapplication layer protocol It is preferable to have a genericfeature set which applies to various application layer pro-tocolserefore we proposed amethod by using n-grams tomodel the application layer message as they carry moreinformation than a sequence of bytes For instance it wouldbe difficult for a model to determine what a sequence of bytesldquoGrdquo ldquoErdquo ldquoTrdquo space ldquordquo and so on means as those individualbytes may appear anywhere in a different order However ifthe model has 3-grams of the sequence (ie ldquoGETrdquo ldquoETrdquoldquoTrdquo and so on) the model would learn something moremeaningful such as that the string could be an HTTPmessage e advantage of using high-order n-grams wherengt 1 is also shown by Anagram [29] method in [40] andOCPAD [12] However these works did not consider thetemporal behaviour of a sequence of n-grams as their modelwas not capable of doing that erefore Blatta utilised arecurrent neural network (RNN) to analyse the sequence ofn-grams which were obtained from an application layermessage

Attacker VM192168666

Switch Router Switch

Metasploitable VM192168999

Debian VM192168996

Figure 1 Network topology for generating exploit traffic Attacker VM running Metasploit and target VMs are placed in different networkconnected by a router is router is used to capture all traffic from these virtual machines

Table 2 A summary of exploits captured in the BlattaSploit datasetNumber of TCP connections 5687Protocols HTTP (3857) FTP (6) SMTP (74) POP3 (93)Payload types JavaScript shellcode Perl PHP Python Ruby Bash SQLNote e numbers next to the protocols are the number of connections in the application layer protocols

6 Security and Communication Networks

An RNN takes a sequence of inputs and processes themsequentially in several time steps enabling the model tolearn the temporal behaviour of those inputs In this case theinput to each time step is an n-gram unlike earlier workswhich also utilised an RNN model but took a byte value asthe input to each RNN time step [24 25] Moreover theseworks took the output from the last time step to makedecision while our novel approach produces classificationoutputs at intermediate intervals as the RNN model is al-ready confident about the decision We argue that thisapproach will enable the proposed system to predict whethera connection is malicious without reading the full length ofapplication layer messages therefore providing an earlywarning method

In general as shown in Figure 2 the training and de-tection process of Blatta are as follows

41 Training Stage n-grams are extracted from applicationlayer messages l most common n-grams are stored in adictionaryis dictionary is used to encode an n-gram to aninteger e encoded n-grams are then fed into an RNNmodel training the model to classify whether the traffic islegitimate or malicious

42 Detection Stage For each new incoming TCP connec-tion directed to the server we reconstruct the applicationlayer message obtain a full-length or partial bytes of themand determine if the sequence belongs to malicious traffic

43 Data Preprocessing In well-structured documents suchas text-based application layer protocols byte sequences canbe a distinguishing feature that makes each message in theirrespective class differ from each other Blatta takes the bytesequence of application layer messages from network traffice application layer messages need to be reconstructed asthe message may be split into multiple TCP segments andtransmitted in an arbitrary order We utilise tcpflow [39] toread PCAP files reconstruct TCP segments and obtain theapplication layer messages

We then represent the byte sequence as a collection ofn-grams taken with various stride values An n-gram is aconsecutive sequence of n items from a given sample in thiscase an n-gram is a consecutive series of bytes obtained fromthe application layer message Stride is how many steps thesliding window takes when collecting n-grams Figure 3shows examples of various n-grams obtained with a dif-ferent value of n and stride

We define the input to the classifier to be a set of integerencoded n-grams Let X x1 x2 x3 xk1113864 1113865 be the integerencoded n-grams collected from an application layer mes-sage as the input to the RNN model We denote k as thenumber of n-grams taken from each application layermessage Each n-gram is categorical data It means a value of1 is not necessarily smaller than a value of 50 ey aresimply different Encoding n-grams with one-hot encodingis not a viable solution as the resulting vector would besparse and hard to model erefore Blatta transforms the

sequence of n-grams with embedding technique Embeddingis essentially a lookup table that maps an item to a densevector with a fixed size embedded dim Using pretrainedembedding vectors eg GloVe [41] is common in naturallanguage processing problems but these pretrained em-bedding vectors were generated from a corpus of wordsWhile our approach works with byte-level n-gramserefore it is not possible to use the pretrained embeddingvectors Instead we initialise the embedding vectors withrandom values which will be updated by backpropagationduring the training so that n-grams which usually appeartogether will have vectors that are close to each other

It is worth noting that the number of n-grams collectedraises exponentially as the n increases If we considered allpossible n-gram values the model would overfit ereforewe limit the number of embedding vectors by building adictionary of most common n-grams in the training set Wedefine the dictionary size as l in which it contains l uniquen-grams and a placeholder for other n-grams that do notexist in the dictionary us the embedding vectors havel + 1 entries However we would like the size of each em-bedded vector to be less than l + 1 Let ε be the size of anembedded vector (embedded dim) If xt represents ann-gram the embedding layer transforms X to 1113954X We denote1113954X 1113954x1 1113954xk1113864 1113865 where each 1113954x is a vector with the size of εe embedded vectors 1113954X are then passed to the recurrentlayer

44 Training RNN-Based Classifier Since the input to theclassifier is sequential data we opted to use a method thattakes into account the sequential relationship of elements inthe input vectors Such methods capture the behaviour of asequence better than processing those elements individually[42] A recurrent neural network is an architecture of neuralnetworks in which each layer takes time series data pro-cesses them in several time steps and utilises the output ofthe previous time step in the current step calculation Werefer to these layers as recurrent layers Each recurrent layerconsists of recurrent units

e vanilla RNN has a vanishing gradient problem asituation where the recurrent model cannot be furthertrained because the value to update the weight is too smallthus there would be no point of training the model furthererefore long short-term memory (LSTM) [43] and gatedrecurrent unit (GRU) [44] are employed to avoid this sit-uation Both LSTM and GRU have cellsunits that areconnected recurrently to each other replacing the usualrecurrent units which existed in earlier RNNs What makestheir cells different is the existence of gates ese gatesdecide whether to pass certain information coming from theprevious cell (ie input gate and forget gate in LSTM unitand update gate in GRU) or going to the next unit (ieoutput gate in LSTM) Since LSTM has more gates thanGRU it requires more calculations thus computationallymore expensive Yet it is not conclusive whether one isbetter than the other [44] thus we use both types andcompare the results For brevity we will refer to both types asrecurrent layers and their cells as recurrent units

Security and Communication Networks 7

e recurrent layer takes a vector 1113954xt for each time step tIn each time step the recurrent unit outputs hidden state ht

with a dimension of |ht| e hidden state is then passed tothe next recurrent unit e last hidden state hk becomes theoutput of the recurrent layer which will be passed onto thenext layer

Once we obtain hk the output of the recurrent layer thenext step is to map the vector to benign or malicious classMapping hk to those classes requires us to use lineartransformation and softmax activation unit

A linear transformation transforms hk into a new vectorL using (1) whereW is the trained weight and b is the trainedbias After that we transform L to obtain Y yi | 0le ilt 21113864 1113865the log probabilities of the input file belonging to the classeswith LogSoftmax as described in (2) e output of Log-Softmax is the index of an element that has the largestprobability in Y All these forward steps are depicted inFigure 4

L Wlowast hk + b

L li1113868111386811138681113868 0le ilt 21113966 1113967

(1)

Y argmax0leilt2

logexp li( 1113857

11139362i0 exp li( 1113857

1113888 1113889 (2)

In the training stage after feeding a batch of trainingdata to the model and obtaining the output the next step is

to evaluate the accuracy of our model To measure ourmodelrsquos performance during the training stage we need tocalculate a loss value which represents how far our modelrsquosoutput is from the ground truth Since this approach is abinary classification problem we use negative log likelihood[45] as the loss functionen the losses are backpropagatedto update weights biases and the embedding vectors

45 Detecting Exploits e process of detecting exploits isessentially similar to the training process Application layermessages are extracted n-grams are acquired from thesemessages and encoded using the dictionary that was builtduring the training process e encoded n-grams are thenfed into the RNN model that will output probabilities ofthese messages being malicious When the probability of anapplication layer message being malicious is higher than 05the message is deemed malicious and an alert is raised

emain difference in this process to the training stage isthe time when Blatta stops processing inputs and makes thedecision Blatta takes the intermediate output of the RNNmodel hence requiring fewer inputs and disregarding theneeds to wait for the full-length message to process We willshow in our experiment that the decision taken by usingintermediate output and reading fewer bytes is close toreading the full-length message giving the proposed ap-proach an ability of early prediction

5 Experiments and Results

In this section we evaluate Blatta and present evidence of itseffectiveness in predicting exploit in network traffic Blatta isimplemented with Python 352 and PyTorch 020 libraryAll experiments were run on a PC with Core i7 340GHz16GB of RAM NVIDIA GeForce GT 730 NVIDIA CUDA90 and CUDNN 7

e best practice for evaluating a machine learningapproach is to have separate training and testing set As thename implies training set is used to train the model and the

Network packets

TCP reconstruction

Applicationlayer messages

(ie HTTPrequests SMTP

commands)

Most common n-grams inthe training set

n-gramsRNN-based model

Legitimate

Malicious

Figure 2 Architecture overview of the proposed method Application layer messages are extracted from captured traffic using tcpflow [39]n-grams are obtained from those messages ey will then be used to build a dictionary of most common n-grams and train the RNN-basedmodel (ie LSTM and GRU) e trained model outputs a prediction whether the traffic is malicious or benign

String HELLO_WORLD

HEL

HEL

HEL

ELL

LLO

LO_

LLO

O_W

WOR

LO_

WOR RLD

O_Wn-grams n = 3 stride = 1

n-grams n = 3 stride = 2

n-grams n = 3 stride = 3

hellip

Figure 3 An example of n-grams of bytes taken with various stridevalues

8 Security and Communication Networks

testing set is for evaluating the modelrsquos performance Wesplit BlattaSploit dataset in a 60 40 ratio for training andtesting set as malicious samples e division was carefullytaken to include diverse type of exploit payloads As samplesof benign traffic to train the model we obtained the samenumber of HTTP and FTP connections as the malicioussamples from UNSW-JAN Having a balanced set of bothclasses is important in a supervised learning

We measure our modelrsquos performance by using samplesof malicious and benign traffic Malicious samples are ob-tained from 40 of BlattaSploit dataset exploit and wormsamples in UNSW-FEB set (10855 samples) As for thebenign samples we took the same number (10855) of benignHTTP and FTP connections in UNSW-FEB We used theUNSW dataset to compare our proposed approach per-formance with previous works

In summary the details of training and testing sets usedin the experiments are shown in Table 3

We evaluated the classifier model by counting thenumber of true positive (TP) true negative (TN) falsepositive (FP) and false negative (FN) ey are then used tocalculate detection rate (DR) and false positive rate (FPR)which are metrics to measure our proposed systemrsquosperformance

DR measures the ability of the proposed system tocorrectly detect malicious traffic e value of DR should beas close as possible to 100 It would show how well our

system is able to detect exploit traffic e formula to cal-culate DR is shown in (3) We denote TP as the number ofdetected malicious connections and FN as the number ofundetected malicious connections in the testing set

DR TP

TP + FN (3)

FPR is the percentage of benign samples that are clas-sified as malicious We would like to have this metric to be aslow as possible High FPR means many false alarms areraised rendering the system to be less trustworthy anduseless We calculate this metric using (4) We denote FP asthe number of false positives detected and N as the numberof benign samples

FPR FPN

(4)

51 Data Analysis Before discussing about the results it ispreferable to analyse the data first to make sure that theresults are valid and the conclusion taken is on point Blattaaims to detect exploit traffic by reading the first few bytes ofthe application layer message erefore it is important toknow how many bytes are there in the application layermessages in our dataset Hence we can be sure that Blattareads a number of bytes fewer than the full length of theapplication layer message

Table 4 shows the average length of application layermessages in our testing set e benign samples have anaverage message length of 59325 lower than any other setserefore deciding after reading fewer bytes than at leastthat number implies our proposed method can predictmalicious traffic earlier thus providing improvement overprevious works

52 Exploring Parameters Blatta includes parameters thatmust be selected in advance Intuitively these parametersaffect the model performance so we analysed the effect onmodelrsquos performance and selected the best model to becompared later to previous works e parameters weanalysed are recurrent layer types n stride dictionary sizethe embedding vector dimension and the recurrent layertype When analysing each of these parameters we usedifferent values for the parameter and set the others to theirdefault values (ie n 5 stride 1 dictionary size 2000embe dd ing dim 32 recurrent layer LSTM and numberof hidden layers 1) ese default values were selectedbased on the preliminary experiment which had given thebest result Apart from the modifiable parameters we set theoptimiser to stochastic gradient descent with learning rate01 as using other optimisers did not necessarily increase ordecrease the performance in our preliminary experimente number of epochs is fixed to five as the loss value did notdecrease further adding more training iteration did not givesignificant improvement

As can be seen in Table 5, we experimented with variable lengths of input to see how much difference it makes when the number of bytes read is reduced. It is noted that some messages are shorter than the limit; in this case, all bytes of those messages are taken.

Figure 4: A detailed view of the classifier. n-grams are extracted from the input application layer message, which are then used to train an RNN model to classify whether the connection is malicious or benign.

In general, reading the full length of application layer messages mostly gives more than a 99% detection rate with a 2.51% false positive rate. This performance holds, with minor variation, when the length of input messages is reduced down to 500 bytes. When the length is limited to 400 bytes, the false positive rate spikes up for some configurations. Our hypothesis is that this is due to benign samples having relatively short lengths. Therefore, we will pay more attention to the results of reading 500 bytes or fewer and analyse each parameter individually.

n is the number of bytes (n-grams) taken in each time step. As shown in Table 5, we experimented with 1, 3, 5, 7, and 9-grams. For brevity, we omitted n = 2, 4, 6, 8 because the difference in results is not significant. As predicted earlier, 1-gram was least effective and the detection rates were around 50%. As for the higher-order n-grams, the detection rates are not much different, but the false positive rates are. 5-gram and 7-gram provide better false positive rates (2.51%) even when Blatta reads only the first 400 bytes. 7-gram gives a lower false positive rate (8.92%) when reading the first 300 bytes, yet it is still too high for a real-life situation. Having a higher n means more information is considered in a time step; this may lead not only to better performance but also to overfitting.

As the default value of n is five, we experimented with strides of one to five. Thus, it can be observed how the model reacts depending on how much the n-grams overlap. It is apparent that nonoverlapping n-grams provide lower false positives with around a 1-3% decrease in detection rates. Stride values of two and three perform the worst, and they missed quite a few malicious connections.
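To make the roles of n and stride concrete, the helpers below (our own hypothetical code, not part of Blatta's released implementation) extract byte n-grams from a message and encode them against a dictionary of the most common n-grams, mapping unknown n-grams to a shared placeholder index:

```python
from collections import Counter

def extract_ngrams(message: bytes, n: int = 5, stride: int = 1):
    """Consecutive n-byte substrings, advancing `stride` bytes per time step."""
    return [message[i:i + n] for i in range(0, len(message) - n + 1, stride)]

def build_dictionary(messages, n=5, stride=1, size=2000):
    """Map the `size` most common n-grams in the training set to integer ids.
    Id 0 is reserved as the placeholder for out-of-dictionary n-grams."""
    counts = Counter()
    for msg in messages:
        counts.update(extract_ngrams(msg, n, stride))
    return {gram: i + 1 for i, (gram, _) in enumerate(counts.most_common(size))}

def encode(message: bytes, dictionary, n=5, stride=1):
    return [dictionary.get(gram, 0) for gram in extract_ngrams(message, n, stride)]

# extract_ngrams(b"HELLO_WORLD", n=3, stride=2)
# -> [b'HEL', b'LLO', b'O_W', b'WOR', b'RLD']   (cf. Figure 3)
```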

The dictionary size plays an important role in this experiment. Having too many n-grams leads to overfitting, as the model would have to memorise too many of them that may barely occur. We found that a dictionary size of 2000 has the highest detection rate and the lowest false positive rate. Reducing the size to 1000 made the detection rate drop by about 50%, even when the model read the full-length messages.

Without the embedding layer, Blatta would have used a one-hot vector as big as the dictionary size for the input in a time step. Therefore, the embedding dimension has the same effect as the dictionary size: having it too big leads to overfitting, and too small could mean too much information is lost due to the compression. Our experiments with a dimension of 16, 32, or 64 give similar detection rates, differing by less than 2%. An embedding dimension of 64 has the fewest false positives when reading 300 bytes.

Changing the recurrent layer does not seem to make much difference. LSTM has a minor improvement over GRU. We argue that preferring one over the other would not make a big improvement other than in training speed. Adding more hidden layers does not improve the performance. On the other hand, it has a negative impact on the detection speed, as shown in Table 6.

After analysing this set of experiments, we ran another experiment with a configuration based on the best performing parameters previously explained. The configuration is n = 5, stride = 5, dictionary size = 2000, embedding dimension = 64, and an LSTM layer. The model then has a detection rate of 97.57% with 1.93% false positives by reading only the first 400 bytes. This result shows that Blatta maintains high accuracy while reading only 35.21% of the length of the application layer messages in the dataset. This optimal set of parameters is then used in further analysis.
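The sketch below (ours, reusing the hypothetical model and encode helper from the earlier sketches) shows how this configuration translates into early prediction: the reconstructed message is truncated to the first 400 bytes, 5-grams with stride 5 are fed to the model, and an alert is raised as soon as the intermediate probability of the malicious class exceeds 0.5.

```python
import torch

MAX_BYTES = 400  # byte budget chosen from the experiments above

def predict_early(message: bytes, model, dictionary, threshold=0.5):
    """Return (is_malicious, bytes_read) using intermediate RNN outputs."""
    window = message[:MAX_BYTES]
    ids = encode(window, dictionary, n=5, stride=5)
    if not ids:
        return False, 0
    with torch.no_grad():
        log_probs = model(torch.tensor([ids]))   # shape (1, k, 2)
        p_malicious = log_probs[0, :, 1].exp()   # per-time-step probability
    for step, p in enumerate(p_malicious):
        bytes_read = step * 5 + 5                # this 5-gram ends here (stride 5)
        if p.item() > threshold:
            return True, min(bytes_read, len(window))
    return False, len(window)
```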

5.3. Detection Speed. In the IDS area, detection speed is another metric worth looking into, apart from accuracy-related metrics. Time is of the essence in detection, and the earlier we detect malicious traffic, the faster we can react. However, detection speed is affected by many factors, such as the hardware or other applications/services running at the same time as the experiment. Therefore, in this section, we analyse the difference in execution time between reading the full and partial payload.

We first calculated the execution time of each record in the testing set, then divided the number of bytes processed by the execution time to obtain the detection speed in kilobytes per second (KBps). Finally, the detection speed over all records was averaged. The result is shown in Table 6.
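A small sketch of this measurement (again our own code, using the hypothetical helper from the previous sketch) could look as follows:

```python
import time

def average_detection_speed(records, model, dictionary, max_bytes=400):
    """Mean detection speed in kilobytes per second over a set of messages."""
    speeds = []
    for message in records:
        window = message[:max_bytes]
        start = time.perf_counter()
        predict_early(window, model, dictionary)         # hypothetical helper above
        elapsed = time.perf_counter() - start
        speeds.append((len(window) / 1024.0) / elapsed)  # KB processed per second
    return sum(speeds) / len(speeds)
```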

As shown in Table 6, reducing the processed bytes to 700, about half the size of an IP packet, increased the detection speed by approximately two times (from an average of 83.66 KBps to 164.86 KBps). Evidently, the trend keeps rising as the number of bytes is reduced. If we take the byte limit from the previous section, which is 400 bytes, Blatta provides about a three-fold increase in detection speed, or 221.7 KBps on average. We are aware that this number seems small compared to the transmission speed of a link in the network, which can reach 1 Gbps (128 MBps). However, we argue that there are other factors which limit our proposed method from performing faster, such as the hardware used in the experiment and the programming language used to implement the approach. Given that the approach runs in a better environment, the detection speed will increase as well.

Table 3: Numbers of benign and malicious samples used in the experiments.

Set            Obtained from   Num. of samples   Class
Training set   BlattaSploit    3406              Malicious
               UNSW-JAN        3406              Benign
Testing set    BlattaSploit    2276              Malicious
               UNSW-FEB        10855             Benign
               UNSW-FEB        10855             Malicious

Table 4: Average message length of application layer messages in the testing set.

Set                     Average message length (bytes)
BlattaSploit            2318.93
UNSW benign samples     593.25
UNSW exploit samples    1437.36
UNSW worm samples       848


5.4. Comparison with Previous Works. We compare Blatta's results with other related previous works. PAYL [7], OCPAD [12], Decanter [15], and the autoencoder model [23] were chosen due to their code availability, and all of them can be tested against the UNSW-NB15 dataset. PAYL and OCPAD read an IP packet at a time, while Decanter and [23] reconstruct TCP segments and process the whole application layer message. None of them provides early detection, but to show that Blatta also offers improvements in detecting malicious traffic, we compare the detection and false positive rates of those works with Blatta's.

We evaluated all methods with the exploit and worm data in UNSW-NB15, as those match our threat model and the dataset is already publicly available; thus, the results are comparable. The results are shown in Table 7.

In general, Blatta has the highest detection rate, albeit at the cost of more false positives. Although the false positives might make this approach unacceptable in real life, Blatta is still a significant step towards early prediction, a problem that has not been explored by similar previous works. This early prediction approach enables system administrators to react faster, thus reducing the harm to protected systems.

In the previous experiments, as shown in Section 5.2, Blatta needs to read 400 bytes (on average 35.21% of the application layer message size) to achieve a 97.57% detection rate.

Table 5: Experiment results of using various parameter combinations and various lengths of input to the model. In the original table, bold values mark, for each parameter, the value giving the highest detection rate and lowest false positive rate.

Detection rate (%) by number of bytes read:

Parameter               All     700     600     500     400     300     200
n = 1                   47.22   48.69   49.86   50.65   51.77   54.99   65.32
n = 3                   99.87   99.51   99.77   99.1    99.59   98.93   91.07
n = 5                   99.87   99.55   99.78   99.57   99.29   98.91   88.75
n = 7                   99.86   99.47   99.59   99.37   99.19   98.53   97.08
n = 9                   99.81   99.59   99.62   99.57   99.23   98.16   88.93
Stride = 1              99.87   99.55   99.78   99.57   99.29   98.91   88.75
Stride = 2              73.39   74.11   74.01   74.45   74.69   74.62   77.82
Stride = 3              82.51   82.54   83.07   83.12   83.25   83.5    85.75
Stride = 4              99.6    99.19   99.26   99.28   98.61   98.55   98.37
Stride = 5              99.73   98.95   98.88   98.65   98      95.77   88.29
Dict. size = 1000       47.78   49.5    50.36   50.79   51.8    54.83   54.68
Dict. size = 2000       99.87   99.55   99.78   99.57   99.29   98.91   88.75
Dict. size = 5000       99.87   99.37   99.75   99.79   99.62   99.69   99.66
Dict. size = 10000      99.86   99.44   99.74   99.55   99.44   98.55   98.33
Dict. size = 20000      99.84   99.81   99.69   99.24   99.21   99.43   98.91
Emb. dim. = 16          99.89   99.65   99.7    99.67   99.22   99.09   98.81
Emb. dim. = 32          99.87   99.55   99.78   99.57   99.29   98.91   88.75
Emb. dim. = 64          99.87   99.2    99.41   99.09   98.61   96.76   85.52
Emb. dim. = 128         99.84   99.33   99.6    99.35   98.99   97.69   86.78
Emb. dim. = 256         99.88   99.76   99.8    99.22   99.38   98.64   90.34
Recurrent layer: LSTM   99.87   99.55   99.78   99.57   99.29   98.91   88.75
Recurrent layer: GRU    99.88   99.35   99.48   99.35   99.06   97.94   86.22
No. of layers = 1       99.87   99.55   99.78   99.57   99.29   98.91   88.75
No. of layers = 2       99.86   99.46   99.46   99.38   99.2    99.72   88.65
No. of layers = 3       99.84   99.38   99.68   99.1    99.18   98.16   87.35

False positive rate (%) by number of bytes read:

Parameter               All     700     600     500     400     300     200
n = 1                   1.18    1.19    1.21    1.21    71.31   78.7    89.43
n = 3                   2.51    2.51    2.51    2.51    72.61   10.29   20.51
n = 5                   2.51    2.51    2.51    2.51    2.51    7.26    11.08
n = 7                   2.51    2.51    2.51    2.51    2.51    8.92    80.92
n = 9                   2.51    2.51    2.51    2.51    7.26    74.16   90.6
Stride = 1              2.51    2.51    2.51    2.51    2.51    7.26    11.08
Stride = 2              1.81    1.81    1.81    1.81    71.92   72.46   19.86
Stride = 3              1.5     1.49    1.5     1.51    71.62   75.47   89.63
Stride = 4              1.93    1.93    1.93    1.93    1.93    74.09   10.5
Stride = 5              1.93    1.92    1.93    1.93    1.93    54.16   90.02
Dict. size = 1000       1.21    1.21    1.22    1.22    71.33   79.47   89.42
Dict. size = 2000       2.51    2.51    2.51    2.51    2.51    7.26    11.08
Dict. size = 5000       2.51    2.51    2.51    2.51    72.61   10.03   90.61
Dict. size = 10000      2.51    2.51    2.51    2.51    72.61   79.06   90.15
Dict. size = 20000      2.51    2.51    2.51    2.51    72.61   80.46   89.64
Emb. dim. = 16          2.51    2.51    2.51    2.51    2.51    76.77   80.94
Emb. dim. = 32          2.51    2.51    2.51    2.51    2.51    7.26    11.08
Emb. dim. = 64          2.51    2.51    2.51    2.51    2.51    4.51    89.85
Emb. dim. = 128         2.51    2.51    2.51    2.51    7.26    4.27    10.88
Emb. dim. = 256         2.51    2.51    2.51    2.51    7.26    80.79   9.06
Recurrent layer: LSTM   2.51    2.51    2.51    2.51    2.51    7.26    11.08
Recurrent layer: GRU    2.51    2.51    2.51    2.51    2.51    78.95   8.48
No. of layers = 1       2.51    2.51    2.51    2.51    2.51    7.26    11.08
No. of layers = 2       2.51    2.51    2.51    2.51    72.59   78.78   20.29
No. of layers = 3       2.51    2.51    2.51    2.51    2.51    74.94   10.83

Table 6: The effect of reducing the number of bytes on the detection speed.

No. of bytes   1 LSTM layer        2 LSTM layers       3 LSTM layers
All            83.66 ± 0.238327    55.14 ± 0.004801    36.98 ± 0.011428
700            164.86 ± 0.022857   107.04 ± 0.022001   73.5 ± 0.044694
600            181.6 ± 0.020556    119.7 ± 0.024792    82.1 ± 0.049584
500            204.32 ± 0.02352    136.5 ± 0.036668    93.76 ± 0.061855
400            221.7 ± 0.032205    149.4 ± 0.037701    103.02 ± 0.065417
300            240.76 ± 0.022857   163.68 ± 0.036352   113.18 ± 0.083477
200            262.72 ± 0.030616   181.38 ± 0.020927   126.88 ± 0.063024

The values are average (mean) detection speeds in KBps with 95% confidence intervals calculated from multiple experiments. The detection speed increased significantly (about three times faster than reading the whole message), allowing early prediction of malicious traffic.


It reads fewer bytes than PAYL, OCPAD, Decanter, and [23] while keeping the accuracy high.

5.5. Visualisation. To investigate how Blatta performs the detection, we took samples of both benign and malicious traffic and observed the input and output. We were particularly interested in the relation of n-grams that are not stored in the dictionary (unknown n-grams) to the decision. Those n-grams either did not exist in the training set or were not common enough to be included in the dictionary.

In Figure 5, we visualise three samples of traffic taken from different sets, the BlattaSploit and UNSW-NB15 datasets. The first part of each sample shows n-grams that did not exist in the dictionary; the yellow highlighted parts show those n-grams. The second part shows the output of the recurrent layer for each time step. The darker the red highlight, the closer the probability of the traffic being malicious is to one in that time step.

As shown in Figure 5 (detection samples), malicious samples tend to have more unknown n-grams. It is evident that the existence of these unknown n-grams increases the probability of the traffic being malicious. As an example, the first five bytes of the samples have around a 0.5 probability of being malicious, and the probability then goes up closer to one when an unknown n-gram is detected.

Similar behaviour also exists in the benign sample. The probability is quite low because there are many known n-grams. Despite the existence of unknown n-grams in the benign sample, the end result shows that the traffic is benign. Furthermore, most of the time, the probability of the traffic being malicious is also below 0.5.

6. Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet to tackle exploit attacks. There are evasion techniques which could be employed by adversaries to evade the detection. These techniques open possibilities for future work. Therefore, this section discusses such evasion techniques and why they have not been covered by our current method.

Since our proposed method works by analysing application layer messages, it is safe to disregard evasion techniques at the transport or network layer level, e.g., IP fragmentation, TCP delayed sending, and TCP segment fragmentation. They should be handled by the underlying tool that reconstructs the TCP session. Furthermore, those evasion techniques can also be mitigated by the Snort preprocessor.

Two possible evasion techniques are compression and/or encryption. Both compression and encryption change the bytes' values from the original and make the malicious code harder to detect by Blatta or any previous work [7, 12, 23] on payload-based detection. Metasploit has a collection of evasion techniques which include compression. The compression evasion technique only works on HTTP and utilises gzip, and it only compresses HTTP responses, not HTTP requests. As all HTTP-based exploits in UNSW-NB15 and BlattaSploit have their exploit code in the request, no data are available to analyse the performance if the adversary uses compression. However, gzip-compressed data could still be detected because they always start with the magic number 1f 8b, and the decompression can be done in a streaming manner, which Blatta can accommodate. There is also no need to decompress the whole data since Blatta works well with partial input.
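As an illustration of this point (our own sketch, independent of Blatta's implementation), gzip data can be recognised by its two magic bytes and inflated incrementally with Python's zlib, so only the prefix that the detector needs is ever decompressed:

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"

def inflate_prefix(payload: bytes, want_bytes: int = 400) -> bytes:
    """If the payload is gzip-compressed, stream-decompress only its beginning."""
    if not payload.startswith(GZIP_MAGIC):
        return payload[:want_bytes]
    # wbits=16+MAX_WBITS makes zlib expect a gzip header and trailer
    inflator = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    out = b""
    for i in range(0, len(payload), 256):   # feed the compressed stream in chunks
        out += inflator.decompress(payload[i:i + 256], want_bytes - len(out))
        if len(out) >= want_bytes:
            break
    return out[:want_bytes]
```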

Encryption is possibly the biggest obstacle for payload-based NIDS; none of the previous works in our literature review (see Table 1) have addressed this challenge. There are other studies which deal with payload analysis in encrypted traffic [46, 47]. However, these studies focus on classifying which application generates the network traffic instead of detecting exploit attacks; thus, they are not directly relevant to our research.

On its own, Blatta is not able to detect exploits hiding in encrypted traffic. However, Blatta's model can be exported and incorporated with application layer firewalls, such as ShadowDaemon [48]. ShadowDaemon is commonly installed on a web server and intercepts HTTP requests before they are processed by the web server software. It detects attacks based on its signature database. Since it is extensible and reads the same data as Blatta (i.e., application layer messages), it is possible to use Blatta's RNN-based model to extend the capability of ShadowDaemon beyond rule-based detection. More importantly, this approach would enable Blatta to deal with encrypted traffic, making it applicable in real-life situations.

Attackers could place the exploit code closer to the end of the application layer message, hoping that the attack would not be detected as Blatta reads merely the first few bytes. However, exploits with this kind of evasion technique would still be detected, since the technique needs padding to push the exploit code to the end of the message. The padding itself will be detected as a sign of a malicious attempt, as it is most likely to be a byte sequence which rarely exists in benign traffic.

Table 7: Comparison to previous works using the UNSW-NB15 dataset as the testing set.

Method        Detection rate (%)                              FPR (%)
              Exploits in UNSW-NB15   Worms in UNSW-NB15
Blatta        99.04                   100                     1.93
PAYL          87.12                   26.49                   0.05
OCPAD         10.53                   4.11                    0
Decanter      67.93                   90.14                   0.03
Autoencoder   47.51                   81.12                   0.99


7. Conclusion and Future Work

This paper presents Blatta, an early prediction system for exploit attacks which can detect exploit traffic by reading only 400 bytes from the application layer message. First, Blatta builds a dictionary containing the most common n-grams in the training set. Then, it is trained over benign and malicious sequences of n-grams to classify exploit traffic. Lastly, it continuously reads a small chunk of an application layer message and predicts whether the message will be a malicious one.

Decreasing the number of bytes taken from application layer messages has only a minor effect on Blatta's detection rate. Therefore, unlike previous related works, it does not need to read the whole application layer message to detect exploit traffic, creating a step change in the ability of system administrators to detect attacks early and to block them before the exploit damages the system. Extensive evaluation on the new exploit traffic dataset has clearly demonstrated the effectiveness of Blatta.

For future study, we would like to train a model that can recognise malicious behaviour based on messages exchanged between clients and a server, since in this paper we only consider messages from clients to a server and not the other way around. Detecting attacks on encrypted traffic while retaining the early prediction capability could be another future research direction. It also remains an open question whether the approaches used in mobile traffic classification [46, 47] would be applicable to the field of exploit detection.

Data Availability

The compressed file containing the dataset is available at https://bit.ly/blattasploit.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 5: Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step. It shows how the proposed system observes and analyses the traffic. Yellow blocks show unknown n-grams. Red blocks show the probability of the traffic being malicious when reading an n-gram at that point.


Acknowledgments

This work was supported by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan/LPDP) under grant number PRJ-968/LPDP.3/2016.

References

[1] McAfee, "McAfee labs threat report - august 2019," Report, McAfee, Santa Clara, CA, USA, 2019.

[2] E. DB, "Offensive security's exploit database archive," EDB, Bedford, MA, USA, 2010.

[3] T.-H. Cheng, Y.-D. Lin, Y.-C. Lai, and P.-C. Lin, "Evasion techniques: sneaking through your intrusion detection/prevention systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1011-1020, 2012.

[4] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of LISA'99: 13th Systems Administration Conference, vol. 99, pp. 229-238, Seattle, Washington, USA, November 1999.

[5] R. Sommer and V. Paxson, "Outside the closed world: on using machine learning for network intrusion detection," in Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 305-316, Berkeley/Oakland, CA, USA, May 2010.

[6] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353-375, 2011.

[7] K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," Lecture Notes in Computer Science: Recent Advances in Intrusion Detection, in Proceedings of the 7th International Symposium, RAID 2004, pp. 203-222, Gothenburg, Sweden, September 2004.

[8] A. Jamdagni, Z. Tan, X. He, P. Nanda, and R. P. Liu, "RePIDS: a multi tier real-time payload-based intrusion detection system," Computer Networks, vol. 57, no. 3, pp. 811-824, 2013.

[9] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, "McPAD: a multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864-881, 2009.

[10] D. Ariu, R. Tronci, and G. Giacinto, "HMMPayl: an intrusion detection system based on hidden Markov models," Computers & Security, vol. 30, no. 4, pp. 221-241, 2011.

[11] A. Oza, K. Ross, R. M. Low, and M. Stamp, "HTTP attack detection using n-gram analysis," Computers & Security, vol. 45, pp. 242-254, 2014.

[12] M. Swarnkar and N. Hubballi, "OCPAD: one class naive Bayes classifier for payload based anomaly detection," Expert Systems with Applications, vol. 64, pp. 330-339, 2016.

[13] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in Proceedings of the 25th USENIX Security Symposium, pp. 807-822, Austin, TX, USA, August 2016.

[14] H. Zhang, D. Yao, N. Ramakrishnan, and Z. Zhang, "Causality reasoning about network events for detecting stealthy malware activities," Computers & Security, vol. 58, pp. 180-198, 2016.

[15] R. Bortolameotti, T. van Ede, M. Caselli et al., "DECANTeR: DEteCtion of anomalous outbound HTTP traffic by passive application fingerprinting," in Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 373-386, Orlando, FL, USA, December 2017.

[16] D. Golait and N. Hubballi, "Detecting anomalous behavior in VoIP systems: a discrete event system modeling," IEEE Transactions on Information Forensics and Security, vol. 12, no. 3, pp. 730-745, 2016.

[17] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, vol. 16, no. 5, pp. 475-490, 2017.

[18] E. Min, J. Long, Q. Liu, J. Cui, and W. Chen, "TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest," Security and Communication Networks, vol. 2018, Article ID 4943509, 9 pages, 2018.

[19] X. Jin, B. Cui, D. Li, Z. Cheng, and C. Yin, "An improved payload-based anomaly detector for web applications," Journal of Network and Computer Applications, vol. 106, pp. 111-116, 2018.

[20] Y. Hao, Y. Sheng, and J. Wang, "Variant gated recurrent units with encoders to preprocess packets for payload-aware intrusion detection," IEEE Access, vol. 7, pp. 49985-49998, 2019.

[21] P. Schneider and K. Bottinger, "High-performance unsupervised anomaly detection for cyber-physical system networks," in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy (CPS-SPC'18), pp. 1-12, Barcelona, Spain, January 2018.

[22] T. Hamed, R. Dara, and S. C. Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol. 73, pp. 137-155, 2018.

[23] B. A. Pratomo, P. Burnap, and G. Theodorakopoulos, "Unsupervised approach for detecting low rate attacks on network traffic with autoencoder," in 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1-8, Glasgow, UK, June 2018.

[24] Z.-Q. Qin, X.-K. Ma, and Y.-J. Wang, "Attentional payload anomaly detector for web applications," in Proceedings of the International Conference on Neural Information Processing, pp. 588-599, Siem Reap, Cambodia, December 2018.

[25] H. Liu, B. Lang, M. Liu, and H. Yan, "CNN and RNN based payload classification methods for attack detection," Knowledge-Based Systems, vol. 163, pp. 332-341, 2019.

[26] Z. Zhao and G.-J. Ahn, "Using instruction sequence abstraction for shellcode detection and attribution," in Proceedings of the Conference on Communications and Network Security (CNS), pp. 323-331, National Harbor, MD, USA, October 2013.

[27] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, "Detecting unknown malicious code by applying classification techniques on opcode patterns," Security Informatics, vol. 1, no. 1, p. 1, 2012.

[28] X. Wang, C.-C. Pan, P. Liu, and S. Zhu, "SigFree: a signature-free buffer overflow attack blocker," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 65-79, 2010.

[29] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," Lecture Notes in Computer Science, in Proceedings of the International Workshop on Recent Advances in Intrusion Detection, pp. 226-248, Hamburg, Germany, September 2006.

[30] Rapid7, "Metasploit penetration testing framework," 2003, https://www.metasploit.com.

[31] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the Military Communications and Information Systems Conference (MilCIS), pp. 1-6, Canberra, ACT, Australia, November 2015.

[32] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579-595, 2000.

[33] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using snort," vol. 1, no. 2007, Tech. Report CSE-2007-1, p. 22, UC Davis Department of Computer Science, Davis, CA, USA, 2007.

[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, p. 2, Berkeley, CA, USA, October 2005.

[35] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357-374, 2012.

[36] I. Bro, 2008, http://www.bro-ids.org.

[37] Argus, 2008, https://qosient.com/argus.

[38] Rapid7, "Metasploitable," 2012, https://information.rapid7.com/download-metasploitable-2017.html.

[39] S. L. Garfinkel and M. Shick, "Passive TCP reconstruction and forensic analysis with tcpflow," Technical Reports, Naval Postgraduate School, Monterey, CA, USA, 2013.

[40] K. Rieck and P. Laskov, "Language models for detection of unknown attacks in network traffic," Journal in Computer Virology, vol. 2, no. 4, pp. 243-256, 2007.

[41] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, Doha, Qatar, October 2014.

[42] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.

[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, https://arxiv.org/abs/1412.3555.

[45] PyTorch, "Negative log likelihood," 2016.

[46] M. Lotfollahi, M. J. Siavoshani, R. S. H. Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999-2012, 2020.

[47] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescape, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.

[48] H. Buchwald, "Shadow daemon: a collection of tools to detect, record and block attacks on web applications," Shadow Daemon Introduction, 2015, https://shadowd.zecure.org/overview/introduction/.



7 Conclusion and Future Work

is paper presents Blatta an early prediction system forexploit attacks which can detect exploit traffic by reading only400 bytes from the application layer message First Blattabuilds a dictionary containing most common n-grams in thetraining set en it is trained over benign and malicioussequences of n-grams to classify exploit traffic Lastly it con-tinuously reads a small chunk of an application layer messageand predicts whether the message will be a malicious one

Decreasing the number of bytes taken from applicationlayer messages only has a minor effect on Blattarsquos detectionrate erefore it does not need to read the whole appli-cation layer message like previously related works to detectexploit traffic creating a steep change in the ability of systemadministrators to detect attacks early and to block thembefore the exploit damages the system Extensive evaluationof the new exploit traffic dataset has clearly demonstrated theeffectiveness of Blatta

For future study we would like to train a model thatcan recognise malicious behaviour based on messagesexchanged between clients and a server since in this paperwe only consider messages from clients to a server but notthe other way around Detecting attacks on encryptedtraffic while retaining the early prediction capability couldbe a future research direction It also remains a questionwhether the approach in mobile traffic classification[46 47] would be applicable to the field of exploitdetection

Data Availability

e compressed file containing the dataset is available onhttpsbitlyblattasploit

Conflicts of Interest

e authors declare that they have no conflicts of interest

Figure 5 Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step It showshow the proposed system observes and analyses the traffic Yellow blocks show unknown n-grams Red blocks show the probability of thetraffic being malicious when reading an n-gram at that point

Security and Communication Networks 13

Acknowledgments

is work was supported by the Indonesia Endowment Fundfor Education (Lembaga Pengelola Dana PendidikanLPDP)under the grant number PRJ-968LPDP32016 of the LPDP

References

[1] McAfee ldquoMcAfee labs threat reportmdashaugust 2019rdquo ReportMcAfee Santa Clara CA USA 2019

[2] E DB ldquoOffensive securityrsquos exploit database archiverdquo EDBBedford MA USA 2010

[3] T-H Cheng Y-D Lin Y-C Lai and P-C Lin ldquoEvasiontechniques sneaking through your intrusion detectionpre-vention systemsrdquo IEEE Communications Surveys amp Tutorialsvol 14 no 4 pp 1011ndash1020 2012

[4] M Roesch ldquoSnort lightweight intrusion detection for net-worksrdquo in Proceedings of LISArsquo99 13th Systems Administra-tion Conference vol 99 pp 229ndash238 Seattle WashingtonUSA November 1999

[5] R Sommer and V Paxson ldquoOutside the closed world onusing machine learning for network intrusion detectionrdquo inProceedings of the 2010 IEEE Symposium on Security AndPrivacy pp 305ndash316 BerkeleyOakland CA USA May 2010

[6] J J Davis and A J Clark ldquoData preprocessing for anomalybased network intrusion detection a reviewrdquo Computers ampSecurity vol 30 no 6-7 pp 353ndash375 2011

[7] K Wang and S J Stolfo ldquoAnomalous payload-based networkintrusion detectionrdquo Lecture Notes in Computer ScienceRecent Advances in Intrusion Detection in Proceedings of the7th International Symposium Raid 2004 pp 203ndash222Gothenburg Sweden September 2004

[8] A Jamdagni Z Tan X He P Nanda and R P Liu ldquoRePIDSa multi tier real-time payload-based intrusion detectionsystemrdquo Computer Networks vol 57 no 3 pp 811ndash824 2013

[9] R Perdisci D Ariu P Fogla G Giacinto and W LeeldquoMcPAD a multiple classifier system for accurate payload-based anomaly detectionrdquo Computer Networks vol 53 no 6pp 864ndash881 2009

[10] D Ariu R Tronci and G Giacinto ldquoHMMPayl an intrusiondetection system based on hidden markov modelsrdquo Com-puters amp Security vol 30 no 4 pp 221ndash241 2011

[11] A Oza K Ross R M Low and M Stamp ldquoHTTP attackdetection using n-gram analysisrdquo Computers amp Securityvol 45 pp 242ndash254 2014

[12] M Swarnkar and N Hubballi ldquoOCPAD one class naive bayesclassifier for payload based anomaly detectionrdquo Expert Sys-tems with Applications vol 64 pp 330ndash339 2016

[13] K Bartos M Sofka and V Franc ldquoOptimized invariantrepresentation of network traffic for detecting unseen mal-ware variantsrdquo in Proceedings of the 25th USENIX SecuritySymposium pp 807ndash822 Austin TX USA August 2016

[14] H Zhang D Yao N Ramakrishnan and Z Zhang ldquoCausalityreasoning about network events for detecting stealthy mal-ware activitiesrdquo Computers amp Security vol 58 pp 180ndash1982016

[15] R Bortolameotti T van Ede M Caselli et al ldquoDECANTeRDEteCtion of anomalous outbound HTTP traffic by passiveapplication fingerprintingrdquo in Proceedings of the 33rd AnnualComputer Security Applications Conference pp 373ndash386Orlando FL USA December 2017

[16] D Golait and N Hubballi ldquoDetecting anomalous behavior invoip systems a discrete event system modelingrdquo IEEE

Transactions on Information Forensics and Security vol 12no 3 pp 730ndash745 2016

[17] P Duessel C Gehl U Flegel S Dietrich and M MeierldquoDetecting zero-day attacks using context-aware anomalydetection at the application-layerrdquo International Journal ofInformation Security vol 16 no 5 pp 475ndash490 2017

[18] E Min J Long Q Liu J Cui and W Chen ldquoTR-IDSanomaly-based intrusion detection through text-convolu-tional neural network and random forestrdquo Security andCommunication Networks vol 2018 Article ID 49435099 pages 2018

[19] X Jin B Cui D Li Z Cheng and C Yin ldquoAn improvedpayload-based anomaly detector for web applicationsrdquoJournal of Network and Computer Applications vol 106pp 111ndash116 2018

[20] Y Hao Y Sheng and J Wang ldquoVariant gated recurrent unitswith encoders to preprocess packets for payload-aware in-trusion detectionrdquo IEEE Access vol 7 pp 49985ndash49998 2019

[21] P Schneider and K Bottinger ldquoHigh-performance unsu-pervised anomaly detection for cyber-physical system net-worksrdquo in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCymdashCPS-SPCrsquo18 pp 1ndash12 Barcelona Spain January 2018

[22] T Hamed R Dara and S C Kremer ldquoNetwork intrusiondetection system based on recursive feature addition and bigramtechniquerdquo Computers amp Security vol 73 pp 137ndash155 2018

[23] B A Pratomo P Burnap and G eodorakopoulos ldquoUn-supervised approach for detecting low rate attacks on networktraffic with autoencoderrdquo in 2018 International Conference onCyber Security and Protection of Digital Services (Cyber Se-curity) pp 1ndash8 Glasgow UK June 2018

[24] Z-Q Qin X-K Ma and Y-J Wang ldquoAttentional payloadanomaly detector for web applicationsrdquo in Proceedings of theInternational Conference on Neural Information Processingpp 588ndash599 Siem Reap Cambodia December 2018

[25] H Liu B Lang M Liu and H Yan ldquoCNN and RNN basedpayload classification methods for attack detectionrdquo Knowl-edge-Based Systems vol 163 pp 332ndash341 2019

[26] Z Zhao and G-J Ahn ldquoUsing instruction sequence ab-straction for shellcode detection and attributionrdquo in Pro-ceedings of the Conference on Communications and NetworkSecurity (CNS) pp 323ndash331 National Harbor MD USAOctober 2013

[27] A Shabtai R Moskovitch C Feher S Dolev and Y ElovicildquoDetecting unknownmalicious code by applying classificationtechniques on opcode patternsrdquo Security Informatics vol 1no 1 p 1 2012

[28] XWang C-C Pan P Liu and S Zhu ldquoSigFree a signature-freebuffer overflow attack blockerrdquo IEEE Transactions on Depend-able and Secure Computing vol 7 no 1 pp 65ndash79 2010

[29] K Wang J J Parekh and S J Stolfo ldquoAnagram a contentanomaly detector resistant to mimicry attackrdquo Lecture Notesin Computer Science in Proceedings of the InternationalWorkshop on Recent Advances in Intrusion Detectionpp 226ndash248 Hamburg Germany September 2006

[30] R 7 ldquoMetasploit penetration testing frameworkrdquo 2003httpswwwmetasploitcom

[31] N Moustafa and J Slay ldquoUNSW-NB15 a comprehensive dataset for network intrusion detection systems (UNSW-NB15network data set)rdquo in Proceedings of the Military Commu-nications and Information Systems Conference (MiLCIS)pp 1ndash6 Canberra ACT Australia November 2015

14 Security and Communication Networks

[32] R Lippmann J W Haines D J Fried J Korba and K Dasldquoe 1999 DARPA off-line intrusion detection evaluationrdquoComputer Networks vol 34 no 4 pp 579ndash595 2000

[33] S T Brugger and J Chow ldquoAn assessment of the DARPA IDSevaluation dataset using snortrdquo vol 1 no 2007 Tech ReportCSE-2007-1 p 22 UCDAVIS Department of ComputerScience Davis CA USA 2007

[34] R Pang M Allman M Bennett J Lee V Paxson andB Tierney ldquoA first look at modern enterprise trafficrdquo inProceedings of the 5th ACM SIGCOMM Conference on In-ternet Measurement p 2 Berkeley CA USA October2005

[35] A Shiravi H Shiravi M Tavallaee and A A GhorbanildquoToward developing a systematic approach to generatebenchmark datasets for intrusion detectionrdquo Computers ampSecurity vol 31 no 3 pp 357ndash374 2012

[36] I Bro 2008 httpwwwbro-idsorg[37] Argus 2008 httpsqosientcomargus[38] R 7 ldquoMetasploitablerdquo 2012 httpsinformationrapid7com

download-metasploitable-2017html[39] S L Garfinkel andM Shick ldquoPassive TCP reconstruction and

forensic analysis with tcpflowrdquo Technical Reports NavalPostgraduate School Monterey CA USA 2013

[40] K Rieck and P Laskov ldquoLanguage models for detection ofunknown attacks in network trafficrdquo Journal in ComputerVirology vol 2 no 4 pp 243ndash256 2007

[41] J Pennington R Socher and C D Manning ldquoGloVe globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) pp 1532ndash1543 Doha Qatar October 2014

[42] I Goodfellow Y Bengio and A Courville Deep LearningMIT press Cambridge MA USA 2016

[43] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[44] J Chung C Gulcehre K Cho and Y Bengio ldquoEmpiricalevaluation of gated recurrent neural networks on sequencemodelingrdquo 2014 httpsarxivorgabs14123555

[45] PyTorch ldquoNegative log likelihoodrdquo 2016[46] M Lotfollahi M J Siavoshani R S H Zade andM Saberian

ldquoDeep packet a novel approach for encrypted traffic classi-fication using deep learningrdquo Soft Computing vol 24 no 3pp 1999ndash2012 2020

[47] G Aceto D Ciuonzo A Montieri and A Pescape ldquoMI-METIC mobile encrypted traffic classification using multi-modal deep learningrdquo Computer Networks vol 165 ArticleID 106944 2019

[48] H Buchwald ldquoShadow daemon a collection of tools to detectrecord and block attacks on web applicationsrdquo ShadowDaemon Introduction 2015 httpsshadowdzecureorgoverviewintroduction

Security and Communication Networks 15

Page 7: BLATTA:EarlyExploitDetectiononNetworkTrafficwith ...downloads.hindawi.com/journals/scn/2020/8826038.pdfAs the length varies, longer messages will take more time to analyse, during

An RNN takes a sequence of inputs and processes them sequentially over several time steps, enabling the model to learn the temporal behaviour of those inputs. In our case, the input to each time step is an n-gram, unlike earlier works which also utilised an RNN model but took a single byte value as the input to each time step [24, 25]. Moreover, those works took the output of the last time step to make the decision, whereas our approach produces classification outputs at intermediate time steps, as soon as the RNN model is sufficiently confident about the decision. We argue that this approach enables the proposed system to predict whether a connection is malicious without reading the full length of the application layer message, therefore providing an early warning method.

In general, as shown in Figure 2, the training and detection processes of Blatta are as follows.

4.1. Training Stage. n-grams are extracted from application layer messages. The l most common n-grams are stored in a dictionary. This dictionary is used to encode an n-gram as an integer. The encoded n-grams are then fed into an RNN model, training the model to classify whether the traffic is legitimate or malicious.

4.2. Detection Stage. For each new incoming TCP connection directed to the server, we reconstruct the application layer message, obtain the full-length or partial bytes of it, and determine whether the sequence belongs to malicious traffic.

4.3. Data Preprocessing. In well-structured documents, such as text-based application layer protocols, byte sequences can be a distinguishing feature that makes each message in its respective class differ from the others. Blatta takes the byte sequence of application layer messages from network traffic. The application layer messages need to be reconstructed, as a message may be split into multiple TCP segments and transmitted in an arbitrary order. We utilise tcpflow [39] to read PCAP files, reconstruct TCP segments, and obtain the application layer messages.
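As an illustration of this preprocessing step, the sketch below drives tcpflow from Python and collects the reconstructed streams; the output layout follows tcpflow's defaults, and the helper name and capture file are our own, not part of Blatta.

    import subprocess
    from pathlib import Path

    def reconstruct_flows(pcap_file, out_dir="flows"):
        """Run tcpflow to reassemble TCP streams from a PCAP file.

        Each reconstructed stream is written to a separate file in out_dir.
        """
        Path(out_dir).mkdir(exist_ok=True)
        subprocess.run(["tcpflow", "-r", pcap_file, "-o", out_dir], check=True)
        # Return the raw byte content of every reconstructed stream,
        # skipping tcpflow's DFXML report file.
        return [p.read_bytes()
                for p in Path(out_dir).iterdir()
                if p.is_file() and p.name != "report.xml"]

    # Example usage (hypothetical capture file):
    # messages = reconstruct_flows("capture.pcap")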

We then represent the byte sequence as a collection of n-grams taken with various stride values. An n-gram is a consecutive sequence of n items from a given sample; in this case, an n-gram is a consecutive series of bytes obtained from the application layer message. The stride is how many steps the sliding window moves when collecting n-grams. Figure 3 shows examples of n-grams obtained with different values of n and stride.
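The n-gram extraction with a configurable stride can be sketched as follows; the function and argument names are our own:

    def extract_ngrams(message: bytes, n: int = 5, stride: int = 1):
        """Slide a window of size n over the message, moving `stride` bytes each step."""
        return [message[i:i + n] for i in range(0, len(message) - n + 1, stride)]

    # extract_ngrams(b"HELLO_WORLD", n=3, stride=1) -> [b'HEL', b'ELL', b'LLO', ...]
    # extract_ngrams(b"HELLO_WORLD", n=3, stride=3) -> [b'HEL', b'LO_', b'WOR']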

We define the input to the classifier to be a set of integer-encoded n-grams. Let X = {x1, x2, x3, ..., xk} be the integer-encoded n-grams collected from an application layer message, used as the input to the RNN model. We denote k as the number of n-grams taken from each application layer message. Each n-gram is categorical data: a value of 1 is not necessarily smaller than a value of 50; they are simply different. Encoding n-grams with one-hot encoding is not a viable solution, as the resulting vectors would be sparse and hard to model. Therefore, Blatta transforms the sequence of n-grams with an embedding technique.

An embedding is essentially a lookup table that maps an item to a dense vector of fixed size, embedded_dim. Using pretrained embedding vectors, e.g., GloVe [41], is common in natural language processing problems, but those pretrained embedding vectors were generated from a corpus of words, while our approach works with byte-level n-grams. Therefore, it is not possible to use the pretrained embedding vectors. Instead, we initialise the embedding vectors with random values, which are updated by backpropagation during training so that n-grams which usually appear together have vectors that are close to each other.

It is worth noting that the number of collected n-grams grows exponentially as n increases. If we considered all possible n-gram values, the model would overfit. Therefore, we limit the number of embedding vectors by building a dictionary of the most common n-grams in the training set. We define the dictionary size as l; it contains l unique n-grams and a placeholder for any other n-gram that does not exist in the dictionary. Thus, the embedding layer has l + 1 entries. However, we would like the size of each embedded vector to be less than l + 1. Let ε be the size of an embedded vector (embedded_dim). If x_t represents an n-gram, the embedding layer transforms X into X̂. We denote X̂ = {x̂1, ..., x̂k}, where each x̂ is a vector of size ε. The embedded vectors X̂ are then passed to the recurrent layer.
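A sketch of the dictionary construction and integer encoding described above, reusing extract_ngrams from the earlier sketch; the choice of index 0 for the unknown-n-gram placeholder is ours:

    from collections import Counter

    def build_dictionary(training_messages, n=5, stride=1, dict_size=2000):
        """Map the dict_size most common n-grams to integers 1..dict_size.
        Index 0 is reserved as the placeholder for unknown n-grams."""
        counts = Counter()
        for msg in training_messages:
            counts.update(extract_ngrams(msg, n, stride))
        return {gram: idx + 1
                for idx, (gram, _) in enumerate(counts.most_common(dict_size))}

    def encode(message, dictionary, n=5, stride=1):
        """Integer-encode a message; unseen n-grams map to the placeholder index 0."""
        return [dictionary.get(gram, 0) for gram in extract_ngrams(message, n, stride)]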

4.4. Training RNN-Based Classifier. Since the input to the classifier is sequential data, we opted for a method that takes into account the sequential relationship of elements in the input vectors. Such methods capture the behaviour of a sequence better than processing the elements individually [42]. A recurrent neural network is a neural network architecture in which each layer takes time-series data, processes it over several time steps, and uses the output of the previous time step in the current step's calculation. We refer to these layers as recurrent layers. Each recurrent layer consists of recurrent units.

The vanilla RNN has a vanishing gradient problem, a situation in which the recurrent model cannot be trained further because the values used to update the weights become too small, so there is no point in continuing the training. Therefore, long short-term memory (LSTM) [43] and gated recurrent unit (GRU) [44] layers are employed to avoid this situation. Both LSTM and GRU have cells/units that are connected recurrently to each other, replacing the usual recurrent units that existed in earlier RNNs. What makes their cells different is the existence of gates. These gates decide whether to pass certain information coming from the previous cell (i.e., the input gate and forget gate in an LSTM unit and the update gate in a GRU) or going to the next unit (i.e., the output gate in an LSTM). Since an LSTM has more gates than a GRU, it requires more calculations and is thus computationally more expensive. Yet it is not conclusive whether one is better than the other [44]; thus, we use both types and compare the results. For brevity, we will refer to both types as recurrent layers and their cells as recurrent units.

The recurrent layer takes a vector x̂_t for each time step t. In each time step, the recurrent unit outputs a hidden state h_t with a dimension of |h_t|. The hidden state is then passed to the next recurrent unit. The last hidden state, h_k, becomes the output of the recurrent layer and is passed on to the next layer.

Once we obtain h_k, the output of the recurrent layer, the next step is to map the vector to a benign or malicious class. Mapping h_k to those classes requires a linear transformation followed by a softmax activation unit.

A linear transformation transforms h_k into a new vector L using (1), where W is the trained weight and b is the trained bias. We then apply LogSoftmax to L to obtain the log probabilities of the input belonging to each class and take the index of the element with the largest log probability as the prediction Y, as described in (2). All these forward steps are depicted in Figure 4.

L = W · h_k + b,  L = {l_i | 0 ≤ i < 2}.  (1)

Y = argmax_{0 ≤ i < 2} log( exp(l_i) / Σ_{j=0}^{1} exp(l_j) ).  (2)
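The forward pass defined by the embedding layer, the recurrent layer, and (1)-(2) could be expressed in PyTorch roughly as below; the class name and layer sizes are illustrative defaults rather than Blatta's exact implementation:

    import torch
    import torch.nn as nn

    class EarlyRNNClassifier(nn.Module):
        def __init__(self, dict_size=2000, embed_dim=64, hidden_dim=128,
                     num_layers=1, cell="lstm"):
            super().__init__()
            # +1 entry for the unknown-n-gram placeholder (index 0).
            self.embedding = nn.Embedding(dict_size + 1, embed_dim)
            rnn_cls = nn.LSTM if cell == "lstm" else nn.GRU
            self.rnn = rnn_cls(embed_dim, hidden_dim,
                               num_layers=num_layers, batch_first=True)
            self.linear = nn.Linear(hidden_dim, 2)        # benign vs malicious
            self.log_softmax = nn.LogSoftmax(dim=-1)

        def forward(self, encoded_ngrams):
            # encoded_ngrams: LongTensor of shape (batch, k)
            embedded = self.embedding(encoded_ngrams)      # (batch, k, embed_dim)
            outputs, _ = self.rnn(embedded)                # hidden state at every time step
            # Log-probabilities at every time step; the last one is used for training,
            # intermediate ones allow the early decision described in Section 4.5.
            return self.log_softmax(self.linear(outputs))  # (batch, k, 2)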

In the training stage, after feeding a batch of training data to the model and obtaining the output, the next step is to evaluate the accuracy of our model. To measure the model's performance during training, we calculate a loss value, which represents how far the model's output is from the ground truth. Since this is a binary classification problem, we use the negative log likelihood [45] as the loss function. The losses are then backpropagated to update the weights, biases, and embedding vectors.
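A condensed training-loop sketch consistent with this description, reusing the model class above; the optimiser settings mirror those reported later in Section 5.2, and batches is assumed to be a list of equal-length encoded sequences with their labels:

    import torch.optim as optim

    def train(model, batches, epochs=5, lr=0.1):
        """batches: list of (encoded_ngrams, labels) LongTensors of shape (B, k) and (B,)."""
        criterion = nn.NLLLoss()
        optimiser = optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for encoded_ngrams, labels in batches:
                optimiser.zero_grad()
                log_probs = model(encoded_ngrams)[:, -1, :]  # last time step, shape (B, 2)
                loss = criterion(log_probs, labels)
                loss.backward()                              # backpropagate the NLL loss
                optimiser.step()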

4.5. Detecting Exploits. The process of detecting exploits is essentially similar to the training process. Application layer messages are extracted, and n-grams are acquired from these messages and encoded using the dictionary that was built during training. The encoded n-grams are then fed into the RNN model, which outputs the probability of each message being malicious. When the probability of an application layer message being malicious is higher than 0.5, the message is deemed malicious and an alert is raised.

The main difference between this process and the training stage is the point at which Blatta stops processing inputs and makes its decision. Blatta takes the intermediate output of the RNN model, hence requiring fewer inputs and removing the need to wait for the full-length message. We will show in our experiments that the decision taken from the intermediate output, after reading fewer bytes, is close to that obtained by reading the full-length message, giving the proposed approach the ability of early prediction.
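An illustrative early-prediction routine built on the earlier sketches; the fixed byte budget follows the evaluation in Section 5, while the function name and the handling of very short messages are ours:

    def predict_early(model, message, dictionary, n=5, stride=5, max_bytes=400):
        """Classify a connection after reading at most max_bytes of its application layer message."""
        encoded = encode(message[:max_bytes], dictionary, n, stride)
        if not encoded:                                  # message shorter than one n-gram
            return False, 0.0
        with torch.no_grad():
            log_probs = model(torch.tensor([encoded]))   # (1, k, 2)
            p_malicious = log_probs[0, -1, 1].exp().item()
        return p_malicious > 0.5, p_malicious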

5. Experiments and Results

In this section, we evaluate Blatta and present evidence of its effectiveness in predicting exploits in network traffic. Blatta is implemented with Python 3.5.2 and the PyTorch 0.2.0 library. All experiments were run on a PC with a Core i7 3.40 GHz CPU, 16 GB of RAM, an NVIDIA GeForce GT 730, NVIDIA CUDA 9.0, and CUDNN 7.

The best practice for evaluating a machine learning approach is to have separate training and testing sets. As the name implies, the training set is used to train the model.

Figure 2: Architecture overview of the proposed method. Application layer messages (i.e., HTTP requests, SMTP commands) are extracted from captured network packets using tcpflow [39], and n-grams are obtained from those messages. They are then used to build a dictionary of the most common n-grams and to train the RNN-based model (i.e., LSTM or GRU). The trained model outputs a prediction of whether the traffic is legitimate or malicious.

Figure 3: An example of n-grams of bytes taken from the string HELLO_WORLD with n = 3 and various stride values.

The testing set is used to evaluate the model's performance. We split the BlattaSploit dataset in a 60:40 ratio into training and testing malicious samples. The division was carefully made to include diverse types of exploit payloads. As samples of benign traffic to train the model, we obtained the same number of HTTP and FTP connections as the malicious samples from UNSW-JAN. Having a balanced set of both classes is important in supervised learning.

We measure our model's performance using samples of malicious and benign traffic. Malicious samples are obtained from 40% of the BlattaSploit dataset and the exploit and worm samples in the UNSW-FEB set (10,855 samples). As for the benign samples, we took the same number (10,855) of benign HTTP and FTP connections in UNSW-FEB. We used the UNSW dataset to compare our proposed approach's performance with previous works.

In summary, the details of the training and testing sets used in the experiments are shown in Table 3.

We evaluated the classifier model by counting the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These are then used to calculate the detection rate (DR) and the false positive rate (FPR), the metrics we use to measure our proposed system's performance.

DR measures the ability of the proposed system to correctly detect malicious traffic. The value of DR should be as close as possible to 100%, showing how well the system detects exploit traffic. The formula to calculate DR is shown in (3), where TP denotes the number of detected malicious connections and FN the number of undetected malicious connections in the testing set:

DR = TP / (TP + FN).  (3)

FPR is the percentage of benign samples that are classified as malicious. We would like this metric to be as low as possible: a high FPR means many false alarms are raised, rendering the system less trustworthy and less useful. We calculate this metric using (4), where FP denotes the number of false positives and N the number of benign samples:

FPR = FP / N.  (4)
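A small helper that computes both metrics exactly as defined in (3) and (4), assuming y_true and y_pred are lists of 0 (benign) and 1 (malicious) labels:

    def detection_metrics(y_true, y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        n_benign = sum(1 for t in y_true if t == 0)
        dr = 100.0 * tp / (tp + fn)     # detection rate, equation (3)
        fpr = 100.0 * fp / n_benign     # false positive rate, equation (4)
        return dr, fpr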

5.1. Data Analysis. Before discussing the results, it is preferable to analyse the data first to make sure the results are valid and the conclusions drawn are on point. Blatta aims to detect exploit traffic by reading the first few bytes of the application layer message. Therefore, it is important to know how many bytes the application layer messages in our dataset contain, so that we can be sure Blatta reads fewer bytes than the full length of the application layer message.

Table 4 shows the average length of application layer messages in our testing set. The benign samples have an average message length of 593.25 bytes, lower than any other set. Therefore, making a decision after reading fewer bytes than at least that number implies that our proposed method can predict malicious traffic earlier, thus providing an improvement over previous works.

5.2. Exploring Parameters. Blatta includes parameters that must be selected in advance. Intuitively, these parameters affect the model's performance, so we analysed their effect and selected the best model to compare later with previous works. The parameters we analysed are the recurrent layer type, n, stride, the dictionary size, the embedding vector dimension, and the number of recurrent layers. When analysing each of these parameters, we used different values for that parameter and set the others to their default values (i.e., n = 5, stride = 1, dictionary size = 2000, embedding_dim = 32, recurrent layer = LSTM, and number of hidden layers = 1). These default values were selected based on a preliminary experiment which had given the best result. Apart from the modifiable parameters, we set the optimiser to stochastic gradient descent with a learning rate of 0.1, as using other optimisers did not necessarily increase or decrease the performance in our preliminary experiment. The number of epochs is fixed at five, as the loss value did not decrease further and adding more training iterations did not give significant improvement.

As can be seen in Table 5, we experimented with variable lengths of input to see how much difference it makes when the number of bytes read is reduced.

Figure 4: A detailed view of the classifier. Integer-encoded n-grams extracted from the input application layer message pass through the embedding layer and the recurrent layer (LSTM/GRU units); a softmax unit then classifies the connection as legitimate or malicious. In the detection phase, the decision can be acquired early, without waiting for the last n-grams to be processed.

It should be noted that some messages are shorter than the limit; in such cases, all bytes of the message are taken.

In general, reading the full length of application layer messages mostly gives more than a 99% detection rate with a 2.51% false positive rate. This performance remains stable, with minor variation, when the length of the input messages is reduced down to 500 bytes. When the length is limited to 400 bytes, the false positive rate spikes up for some configurations. Our hypothesis is that this is because the benign samples have a relatively short length. Therefore, we pay more attention to the results of reading 500 bytes or fewer and analyse each parameter individually.

n is the number of bytes (the n-gram size) taken in each time step. As shown in Table 5, we experimented with 1-, 3-, 5-, 7-, and 9-grams. For brevity, we omitted n = 2, 4, 6, 8, as the difference in the results is not significant. As predicted earlier, 1-grams were the least effective, with detection rates around 50%. As for the higher-order n-grams, the detection rates are not much different, but the false positive rates are. 5-grams and 7-grams provide better false positive rates (2.51%) even when Blatta reads only the first 400 bytes, and the 7-gram gives a lower false positive rate (8.92%) when reading the first 300 bytes, yet it is still too high for real-life situations. Having a higher n means more information is considered in a time step; this may lead not only to better performance but also to overfitting.

As the default value of n is five, we experimented with strides of one to five. Thus, it can be observed how the model reacts depending on how much the n-grams overlap. It is apparent that nonoverlapping n-grams provide lower false positives, with around a 1-3% decrease in detection rates. Stride values of two and three perform the worst and miss quite a lot of malicious traffic.

The dictionary size plays an important role in this experiment. Having too many n-grams leads to overfitting, as the model would have to memorise too many n-grams that may barely occur. We found that a dictionary size of 2000 gives the highest detection rate and the lowest false positive rate. Reducing the size to 1000 made the detection rate drop by about 50%, even when the model read the full-length messages.

Without the embedding layer, Blatta would have used a one-hot vector as big as the dictionary size as the input for each time step. Therefore, the embedding dimension has a similar effect to the dictionary size: having it too big leads to overfitting, and having it too small could mean too much information is lost due to the compression. Our experiments with dimensions of 16, 32, and 64 give similar detection rates, differing by less than 2%. An embedding dimension of 64 gives the lowest false positive rate when reading 300 bytes.

Changing the recurrent layer type does not seem to make much difference; LSTM has a minor improvement over GRU. We argue that preferring one over the other would not yield a big improvement other than in training speed. Adding more hidden layers does not improve the performance either; on the other hand, it has a negative impact on the detection speed, as shown in Table 6.

After analysing this set of experiments, we ran another experiment with a configuration based on the best-performing parameters explained above. The configuration is n = 5, stride = 5, dictionary size = 2000, embedding dimension = 64, and a single LSTM layer. The model then achieves a detection rate of 97.57% with 1.93% false positives by reading only the first 400 bytes. This result shows that Blatta maintains high accuracy while reading only 35.21% of the length of the application layer messages in the dataset. This optimal set of parameters is then used in further analysis.

5.3. Detection Speed. In the IDS field, detection speed is another metric worth looking into, apart from accuracy-related metrics. Time is of the essence in detection: the earlier we detect malicious traffic, the faster we can react. However, detection speed is affected by many factors, such as the hardware or the other applications and services running at the same time as the experiment. Therefore, in this section, we analyse the difference in execution time between reading the full payload and a partial payload.

We first calculated the execution time for each record in the testing set, then divided the number of bytes processed by the execution time to obtain the detection speed in kilobytes per second (kBps). Finally, the detection speeds of all records were averaged. The result is shown in Table 6.
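This measurement could be scripted as follows, assuming records is a list of raw application layer messages and reusing predict_early from the Section 4.5 sketch; the 95% confidence interval uses a normal approximation:

    import time
    import statistics

    def detection_speed_kbps(model, records, dictionary, max_bytes=None):
        """Average per-record detection speed in kilobytes per second."""
        speeds = []
        for message in records:
            data = message if max_bytes is None else message[:max_bytes]
            start = time.perf_counter()
            predict_early(model, data, dictionary, max_bytes=len(data))
            elapsed = time.perf_counter() - start
            speeds.append(len(data) / 1024.0 / elapsed)
        mean = statistics.mean(speeds)
        # 95% CI under a normal approximation: mean +/- 1.96 * standard error.
        ci = 1.96 * statistics.stdev(speeds) / (len(speeds) ** 0.5)
        return mean, ci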

As shown in Table 6, reducing the processed bytes to 700, about half the size of an IP packet, increased the detection speed by approximately two times (from an average of 83.66 kBps to 164.86 kBps). Evidently, the trend keeps rising as the number of bytes is reduced. If we take the 400-byte limit from the previous section, Blatta provides about a threefold increase in detection speed, or 221.7 kBps on average. We are aware that this number seems small compared to the transmission speed of a network link, which can reach 1 Gbps (128 MBps). However, we argue that other factors limit our proposed method from performing faster, such as the hardware used in the experiment and the programming language used to implement the approach. Given a better environment, the detection speed will increase as well.

Table 3: Numbers of benign and malicious samples used in the experiments.

Set            Obtained from   No. of samples   Class
Training set   BlattaSploit    3406             Malicious
Training set   UNSW-JAN        3406             Benign
Testing set    BlattaSploit    2276             Malicious
Testing set    UNSW-FEB        10855            Benign
Testing set    UNSW-FEB        10855            Malicious

Table 4: Average message length of application layer messages in the testing set.

Set                    Average message length (bytes)
BlattaSploit           2318.93
UNSW benign samples    593.25
UNSW exploit samples   1437.36
UNSW worm samples      848

5.4. Comparison with Previous Works. We compare Blatta's results with related previous works. PAYL [7], OCPAD [12], Decanter [15], and the autoencoder model [23] were chosen due to their code availability, and all of them can be tested against the UNSW-NB15 dataset. PAYL and OCPAD read one IP packet at a time, while Decanter and [23] reconstruct TCP segments and process whole application layer messages. None of them provides early detection, but to show that Blatta also offers improvements in detecting malicious traffic, we compare the detection and false positive rates of those works with Blatta's.

We evaluated all methods with the exploit and worm data in UNSW-NB15, as those match our threat model and the dataset is already publicly available; thus, the results are comparable. The results are shown in Table 7.

In general, Blatta has the highest detection rate, albeit at the cost of increased false positives. Although the false positives might make this approach unacceptable in some real-life deployments, Blatta is still a significant step towards early prediction, a problem that has not been explored by similar previous works. This early prediction approach enables system administrators to react faster, thus reducing the harm to protected systems.

In the previous experiments, as shown in Section 5.2, Blatta needs to read 400 bytes (on average 35.21% of the application layer message size) to achieve a 97.57% detection rate.

Table 5: Experiment results of using various parameter combinations and various lengths of input to the model. Each row lists the detection rate (DR, %) and false positive rate (FPR, %) when reading the whole message (All) or only the first 700, 600, 500, 400, 300, or 200 bytes. Bold values show the parameter value for each set of experiments which gives the highest detection rate and lowest false positive rate.

Columns (left to right): All, 700, 600, 500, 400, 300, 200 bytes.

n = 1:  DR 47.22, 48.69, 49.86, 50.65, 51.77, 54.99, 65.32;  FPR 1.18, 1.19, 1.21, 1.21, 71.31, 78.7, 89.43
n = 3:  DR 99.87, 99.51, 99.77, 99.1, 99.59, 98.93, 91.07;  FPR 2.51, 2.51, 2.51, 2.51, 72.61, 10.29, 20.51
n = 5:  DR 99.87, 99.55, 99.78, 99.57, 99.29, 98.91, 88.75;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 72.6, 11.08
n = 7:  DR 99.86, 99.47, 99.59, 99.37, 99.19, 98.53, 97.08;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 8.92, 80.92
n = 9:  DR 99.81, 99.59, 99.62, 99.57, 99.23, 98.16, 88.93;  FPR 2.51, 2.51, 2.51, 2.51, 7.26, 74.16, 90.6

Stride = 1:  DR 99.87, 99.55, 99.78, 99.57, 99.29, 98.91, 88.75;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 72.6, 11.08
Stride = 2:  DR 73.39, 74.11, 74.01, 74.45, 74.69, 74.62, 77.82;  FPR 1.81, 1.81, 1.81, 1.81, 71.92, 72.46, 19.86
Stride = 3:  DR 82.51, 82.54, 83.07, 83.12, 83.25, 83.5, 85.75;  FPR 1.5, 1.49, 1.5, 1.51, 71.62, 75.47, 89.63
Stride = 4:  DR 99.6, 99.19, 99.26, 99.28, 98.61, 98.55, 98.37;  FPR 1.93, 1.93, 1.93, 1.93, 1.93, 74.09, 10.5
Stride = 5:  DR 99.73, 98.95, 98.88, 98.65, 98, 95.77, 88.29;  FPR 1.93, 1.92, 1.93, 1.93, 1.93, 54.16, 90.02

Dictionary size = 1000:  DR 47.78, 49.5, 50.36, 50.79, 51.8, 54.83, 54.68;  FPR 1.21, 1.21, 1.22, 1.22, 71.33, 79.47, 89.42
Dictionary size = 2000:  DR 99.87, 99.55, 99.78, 99.57, 99.29, 98.91, 88.75;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 72.6, 11.08
Dictionary size = 5000:  DR 99.87, 99.37, 99.75, 99.79, 99.62, 99.69, 99.66;  FPR 2.51, 2.51, 2.51, 2.51, 72.61, 10.03, 90.61
Dictionary size = 10000:  DR 99.86, 99.44, 99.74, 99.55, 99.44, 98.55, 98.33;  FPR 2.51, 2.51, 2.51, 2.51, 72.61, 79.06, 90.15
Dictionary size = 20000:  DR 99.84, 99.81, 99.69, 99.24, 99.21, 99.43, 98.91;  FPR 2.51, 2.51, 2.51, 2.51, 72.61, 80.46, 89.64

Embedding dimension = 16:  DR 99.89, 99.65, 99.7, 99.67, 99.22, 99.09, 98.81;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 76.77, 80.94
Embedding dimension = 32:  DR 99.87, 99.55, 99.78, 99.57, 99.29, 98.91, 88.75;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 72.6, 11.08
Embedding dimension = 64:  DR 99.87, 99.2, 99.41, 99.09, 98.61, 96.76, 85.52;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 4.51, 89.85
Embedding dimension = 128:  DR 99.84, 99.33, 99.6, 99.35, 98.99, 97.69, 86.78;  FPR 2.51, 2.51, 2.51, 2.51, 7.26, 42.7, 10.88
Embedding dimension = 256:  DR 99.88, 99.76, 99.8, 99.22, 99.38, 98.64, 90.34;  FPR 2.51, 2.51, 2.51, 2.51, 7.26, 80.79, 90.6

Recurrent layer = LSTM:  DR 99.87, 99.55, 99.78, 99.57, 99.29, 98.91, 88.75;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 72.6, 11.08
Recurrent layer = GRU:  DR 99.88, 99.35, 99.48, 99.35, 99.06, 97.94, 86.22;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 78.95, 84.8

No. of layers = 1:  DR 99.87, 99.55, 99.78, 99.57, 99.29, 98.91, 88.75;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 72.6, 11.08
No. of layers = 2:  DR 99.86, 99.46, 99.46, 99.38, 99.2, 99.72, 88.65;  FPR 2.51, 2.51, 2.51, 2.51, 72.59, 78.78, 20.29
No. of layers = 3:  DR 99.84, 99.38, 99.68, 99.1, 99.18, 98.16, 87.35;  FPR 2.51, 2.51, 2.51, 2.51, 2.51, 74.94, 10.83

Table 6: The effect of reducing the number of bytes read on the detection speed.

No. of bytes   1 LSTM layer        2 LSTM layers       3 LSTM layers
All            83.66 ± 0.238327    55.14 ± 0.004801    36.98 ± 0.011428
700            164.86 ± 0.022857   107.04 ± 0.022001   73.5 ± 0.044694
600            181.6 ± 0.020556    119.7 ± 0.024792    82.1 ± 0.049584
500            204.32 ± 0.02352    136.5 ± 0.036668    93.76 ± 0.061855
400            221.7 ± 0.032205    149.4 ± 0.037701    103.02 ± 0.065417
300            240.76 ± 0.022857   163.68 ± 0.036352   113.18 ± 0.083477
200            262.72 ± 0.030616   181.38 ± 0.020927   126.88 ± 0.063024

The values are average (mean) detection speeds in kBps with 95% confidence intervals calculated from multiple experiments. The detection speed increased significantly (about three times faster than reading the whole message), allowing early prediction of malicious traffic.

It reads fewer bytes than PAYL, OCPAD, Decanter, and the autoencoder model [23] while keeping the accuracy high.

5.5. Visualisation. To investigate how Blatta performs the detection, we took samples of both benign and malicious traffic and observed the input and output. We were particularly interested in the relation between n-grams that are not stored in the dictionary (unknown n-grams) and the decision. Those n-grams either did not exist in the training set or were not common enough to be included in the dictionary.

In Figure 5, we visualise three samples of traffic taken from different sets, the BlattaSploit and UNSW-NB15 datasets. The first part of each sample shows n-grams that did not exist in the dictionary; the yellow highlighted parts show those n-grams. The second part shows the output of the recurrent layer for each time step: the darker the red highlight, the closer the probability of the traffic being malicious is to one at that time step.

As shown in Figure 5 (detection samples), malicious samples tend to have more unknown n-grams. It is evident that the existence of these unknown n-grams increases the probability of the traffic being malicious. As an example, the first few bytes of the samples have around a 0.5 probability of being malicious, and the probability then rises closer to one when an unknown n-gram is detected.

Similar behaviour also exists in the benign sample. The probability is quite low because there are many known n-grams. Despite the existence of unknown n-grams in the benign sample, the end result shows that the traffic is benign; furthermore, most of the time the probability of the traffic being malicious stays below 0.5.
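A rough sketch of how such a per-time-step view can be produced with the model and helpers from the earlier sketches; here unknown n-grams are simply marked in text instead of being colour-highlighted:

    def inspect_message(model, message, dictionary, n=5, stride=5):
        """Print, for each time step, the n-gram, whether it is unknown, and P(malicious)."""
        grams = extract_ngrams(message, n, stride)
        encoded = torch.tensor([[dictionary.get(g, 0) for g in grams]])
        with torch.no_grad():
            probs = model(encoded)[0, :, 1].exp()   # P(malicious) at every time step
        for gram, idx, p in zip(grams, encoded[0], probs):
            marker = "UNKNOWN" if idx.item() == 0 else "       "
            print(f"{gram!r:20} {marker}  P(malicious)={p.item():.2f}")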

6. Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet for tackling exploit attacks. There are evasion techniques which could be employed by adversaries to evade detection, and these techniques open possibilities for future work. Therefore, this section discusses such evasion techniques and why they are not covered by our current method.

Since our proposed method works by analysing application layer messages, it is safe to disregard evasion techniques at the transport or network layer level, e.g., IP fragmentation, TCP delayed sending, and TCP segment fragmentation. They should be handled by the underlying tool that reconstructs the TCP session. Furthermore, those evasion techniques can also be mitigated by Snort preprocessors.

Two possible evasion techniques are compression and/or encryption. Both change the byte values from the original and make the malicious code harder to detect by Blatta or any previous work [7, 12, 23] on payload-based detection. Metasploit has a collection of evasion techniques, which include compression. The compression evasion technique only works on HTTP and utilises gzip, and it only compresses HTTP responses, not HTTP requests, while all HTTP-based exploits in UNSW-NB15 and BlattaSploit have their exploit code in the request; thus, no data are available to analyse the performance if the adversary uses compression. However, gzip-compressed data could still be detected because it always starts with the magic number 1f 8b, and the decompression can be done in a streaming manner, which Blatta supports. There is also no need to decompress the whole data, since Blatta works well with partial input.
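Both observations can be illustrated with Python's standard zlib module: the gzip magic number is checked first, and only a bounded amount of output is inflated in a streaming fashion. This is a hedged sketch, not part of Blatta itself:

    import zlib

    GZIP_MAGIC = b"\x1f\x8b"

    def partial_inflate(body: bytes, limit: int = 400) -> bytes:
        """If the body looks gzip-compressed, stream-decompress roughly the first `limit` bytes."""
        if not body.startswith(GZIP_MAGIC):
            return body[:limit]
        inflator = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # 16 + MAX_WBITS -> gzip framing
        return inflator.decompress(body, limit)                   # stop after `limit` output bytes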

Encryption is possibly the biggest obstacle for payload-based NIDS; none of the previous works in our literature review (see Table 1) have addressed this challenge. There are other studies which deal with payload analysis in encrypted traffic [46, 47]. However, these studies focus on classifying which application generates the network traffic rather than detecting exploit attacks; thus, they are not directly relevant to our research.

On its own, Blatta is not able to detect exploits hiding in encrypted traffic. However, Blatta's model can be exported and incorporated into application layer firewalls such as Shadow Daemon [48]. Shadow Daemon is commonly installed on a web server and intercepts HTTP requests before they are processed by the web server software; it detects attacks based on its signature database. Since it is extensible and reads the same data as Blatta (i.e., application layer messages), it is possible to use Blatta's RNN-based model to extend Shadow Daemon's capability beyond rule-based detection. More importantly, this approach would enable Blatta to deal with encrypted traffic, making it applicable in real-life situations.
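As a sketch of what such an integration point could look like, the trained model can be serialised and wrapped in a scoring function that a request-intercepting filter calls on each decrypted request; the wrapper below reuses the earlier sketches and is illustrative, not an existing Shadow Daemon interface:

    # Export once, after training.
    torch.save(model.state_dict(), "blatta_model.pt")

    def make_request_scorer(dictionary, model_path="blatta_model.pt"):
        """Load the exported model and return a callable usable from a request-filtering hook."""
        scorer = EarlyRNNClassifier(dict_size=2000, embed_dim=64)
        scorer.load_state_dict(torch.load(model_path))
        scorer.eval()
        def score(raw_request: bytes) -> bool:
            is_malicious, _ = predict_early(scorer, raw_request, dictionary)
            return is_malicious          # True -> block / alert on this request
        return score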

Attackers could also place the exploit code closer to the end of the application layer message, hoping that the attack would not be detected because Blatta reads only the first few bytes. However, exploits using this kind of evasion would still be detected, since the technique needs padding to push the exploit code to the end of the message. The padding itself will be detected as a sign of a malicious attempt, as it is most likely to be a byte sequence which rarely exists in benign traffic.

Table 7: Comparison to previous works using the UNSW-NB15 dataset as the testing set.

Method        Detection rate (%), exploits in UNSW-NB15   Detection rate (%), worms in UNSW-NB15   FPR (%)
Blatta        99.04                                       100                                      1.93
PAYL          87.12                                       26.49                                    0.05
OCPAD         10.53                                       4.11                                     0
Decanter      67.93                                       90.14                                    0.03
Autoencoder   47.51                                       81.12                                    0.99

7. Conclusion and Future Work

This paper presents Blatta, an early prediction system for exploit attacks which can detect exploit traffic by reading only 400 bytes of an application layer message. First, Blatta builds a dictionary containing the most common n-grams in the training set. Then, it is trained over benign and malicious sequences of n-grams to classify exploit traffic. Lastly, it continuously reads small chunks of an application layer message and predicts whether the message is malicious.

Decreasing the number of bytes taken from application layer messages has only a minor effect on Blatta's detection rate. Therefore, it does not need to read the whole application layer message, as previous related works do, to detect exploit traffic, creating a step change in the ability of system administrators to detect attacks early and block them before the exploit damages the system. Extensive evaluation on the new exploit traffic dataset has clearly demonstrated the effectiveness of Blatta.

For future work, we would like to train a model that can recognise malicious behaviour based on messages exchanged between clients and a server, since in this paper we only consider messages from clients to a server and not the other way around. Detecting attacks in encrypted traffic while retaining the early prediction capability could also be a future research direction. It also remains a question whether the approaches used in mobile traffic classification [46, 47] would be applicable to the field of exploit detection.

Data Availability

The compressed file containing the dataset is available at https://bit.ly/blattasploit.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 5: Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step, showing how the proposed system observes and analyses the traffic. Yellow blocks show unknown n-grams. Red blocks show the probability of the traffic being malicious when reading an n-gram at that point.

Acknowledgments

This work was supported by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan/LPDP) under grant number PRJ-968/LPDP.3/2016.

References

[1] McAfee, "McAfee Labs threats report: August 2019," Report, McAfee, Santa Clara, CA, USA, 2019.
[2] E. DB, "Offensive Security's exploit database archive," EDB, Bedford, MA, USA, 2010.
[3] T.-H. Cheng, Y.-D. Lin, Y.-C. Lai, and P.-C. Lin, "Evasion techniques: sneaking through your intrusion detection/prevention systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1011–1020, 2012.
[4] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of LISA '99: 13th Systems Administration Conference, vol. 99, pp. 229–238, Seattle, WA, USA, November 1999.
[5] R. Sommer and V. Paxson, "Outside the closed world: on using machine learning for network intrusion detection," in Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 305–316, Berkeley/Oakland, CA, USA, May 2010.
[6] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353–375, 2011.
[7] K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," Lecture Notes in Computer Science: Recent Advances in Intrusion Detection, in Proceedings of the 7th International Symposium, RAID 2004, pp. 203–222, Gothenburg, Sweden, September 2004.
[8] A. Jamdagni, Z. Tan, X. He, P. Nanda, and R. P. Liu, "RePIDS: a multi tier real-time payload-based intrusion detection system," Computer Networks, vol. 57, no. 3, pp. 811–824, 2013.
[9] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, "McPAD: a multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864–881, 2009.
[10] D. Ariu, R. Tronci, and G. Giacinto, "HMMPayl: an intrusion detection system based on hidden Markov models," Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.
[11] A. Oza, K. Ross, R. M. Low, and M. Stamp, "HTTP attack detection using n-gram analysis," Computers & Security, vol. 45, pp. 242–254, 2014.
[12] M. Swarnkar and N. Hubballi, "OCPAD: one class Naive Bayes classifier for payload based anomaly detection," Expert Systems with Applications, vol. 64, pp. 330–339, 2016.
[13] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in Proceedings of the 25th USENIX Security Symposium, pp. 807–822, Austin, TX, USA, August 2016.
[14] H. Zhang, D. Yao, N. Ramakrishnan, and Z. Zhang, "Causality reasoning about network events for detecting stealthy malware activities," Computers & Security, vol. 58, pp. 180–198, 2016.
[15] R. Bortolameotti, T. van Ede, M. Caselli et al., "DECANTeR: DEteCtion of anomalous outbound HTTP traffic by passive application fingerprinting," in Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 373–386, Orlando, FL, USA, December 2017.
[16] D. Golait and N. Hubballi, "Detecting anomalous behavior in VoIP systems: a discrete event system modeling," IEEE Transactions on Information Forensics and Security, vol. 12, no. 3, pp. 730–745, 2016.
[17] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, vol. 16, no. 5, pp. 475–490, 2017.
[18] E. Min, J. Long, Q. Liu, J. Cui, and W. Chen, "TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest," Security and Communication Networks, vol. 2018, Article ID 4943509, 9 pages, 2018.
[19] X. Jin, B. Cui, D. Li, Z. Cheng, and C. Yin, "An improved payload-based anomaly detector for web applications," Journal of Network and Computer Applications, vol. 106, pp. 111–116, 2018.
[20] Y. Hao, Y. Sheng, and J. Wang, "Variant gated recurrent units with encoders to preprocess packets for payload-aware intrusion detection," IEEE Access, vol. 7, pp. 49985–49998, 2019.
[21] P. Schneider and K. Bottinger, "High-performance unsupervised anomaly detection for cyber-physical system networks," in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and Privacy (CPS-SPC '18), pp. 1–12, Barcelona, Spain, January 2018.
[22] T. Hamed, R. Dara, and S. C. Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol. 73, pp. 137–155, 2018.
[23] B. A. Pratomo, P. Burnap, and G. Theodorakopoulos, "Unsupervised approach for detecting low rate attacks on network traffic with autoencoder," in Proceedings of the 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8, Glasgow, UK, June 2018.
[24] Z.-Q. Qin, X.-K. Ma, and Y.-J. Wang, "Attentional payload anomaly detector for web applications," in Proceedings of the International Conference on Neural Information Processing, pp. 588–599, Siem Reap, Cambodia, December 2018.
[25] H. Liu, B. Lang, M. Liu, and H. Yan, "CNN and RNN based payload classification methods for attack detection," Knowledge-Based Systems, vol. 163, pp. 332–341, 2019.
[26] Z. Zhao and G.-J. Ahn, "Using instruction sequence abstraction for shellcode detection and attribution," in Proceedings of the Conference on Communications and Network Security (CNS), pp. 323–331, National Harbor, MD, USA, October 2013.
[27] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, "Detecting unknown malicious code by applying classification techniques on opcode patterns," Security Informatics, vol. 1, no. 1, p. 1, 2012.
[28] X. Wang, C.-C. Pan, P. Liu, and S. Zhu, "SigFree: a signature-free buffer overflow attack blocker," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 65–79, 2010.
[29] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," Lecture Notes in Computer Science, in Proceedings of the International Workshop on Recent Advances in Intrusion Detection, pp. 226–248, Hamburg, Germany, September 2006.
[30] Rapid7, "Metasploit penetration testing framework," 2003, https://www.metasploit.com.
[31] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the Military Communications and Information Systems Conference (MilCIS), pp. 1–6, Canberra, ACT, Australia, November 2015.
[32] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.
[33] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using Snort," Tech. Report CSE-2007-1, vol. 1, no. 2007, p. 22, UC Davis, Department of Computer Science, Davis, CA, USA, 2007.
[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, p. 2, Berkeley, CA, USA, October 2005.
[35] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[36] Bro, 2008, http://www.bro-ids.org.
[37] Argus, 2008, https://qosient.com/argus.
[38] Rapid7, "Metasploitable," 2012, https://information.rapid7.com/download-metasploitable-2017.html.
[39] S. L. Garfinkel and M. Shick, "Passive TCP reconstruction and forensic analysis with tcpflow," Technical Reports, Naval Postgraduate School, Monterey, CA, USA, 2013.
[40] K. Rieck and P. Laskov, "Language models for detection of unknown attacks in network traffic," Journal in Computer Virology, vol. 2, no. 4, pp. 243–256, 2007.
[41] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, Doha, Qatar, October 2014.
[42] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, https://arxiv.org/abs/1412.3555.
[45] PyTorch, "Negative log likelihood," 2016.
[46] M. Lotfollahi, M. J. Siavoshani, R. S. H. Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[47] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescape, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.
[48] H. Buchwald, "Shadow Daemon: a collection of tools to detect, record, and block attacks on web applications," Shadow Daemon Introduction, 2015, https://shadowd.zecure.org/overview/introduction/.

Page 8: BLATTA:EarlyExploitDetectiononNetworkTrafficwith ...downloads.hindawi.com/journals/scn/2020/8826038.pdfAs the length varies, longer messages will take more time to analyse, during

e recurrent layer takes a vector 1113954xt for each time step tIn each time step the recurrent unit outputs hidden state ht

with a dimension of |ht| e hidden state is then passed tothe next recurrent unit e last hidden state hk becomes theoutput of the recurrent layer which will be passed onto thenext layer

Once we obtain hk the output of the recurrent layer thenext step is to map the vector to benign or malicious classMapping hk to those classes requires us to use lineartransformation and softmax activation unit

A linear transformation transforms hk into a new vectorL using (1) whereW is the trained weight and b is the trainedbias After that we transform L to obtain Y yi | 0le ilt 21113864 1113865the log probabilities of the input file belonging to the classeswith LogSoftmax as described in (2) e output of Log-Softmax is the index of an element that has the largestprobability in Y All these forward steps are depicted inFigure 4

$$L = W \cdot h_k + b, \qquad L = \{\, l_i \mid 0 \le i < 2 \,\} \tag{1}$$

$$Y = \operatorname*{arg\,max}_{0 \le i < 2} \log\!\left( \frac{\exp(l_i)}{\sum_{j=0}^{1} \exp(l_j)} \right) \tag{2}$$

In the training stage, after feeding a batch of training data to the model and obtaining the output, the next step is to evaluate the accuracy of our model. To measure the model's performance during the training stage, we need to calculate a loss value, which represents how far the model's output is from the ground truth. Since this approach is a binary classification problem, we use negative log likelihood [45] as the loss function. Then, the losses are backpropagated to update the weights, biases, and embedding vectors.
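The paper describes the implementation only in prose; purely as an illustration, the following is a minimal PyTorch sketch of the forward pass in equations (1) and (2) and one training step with the negative log likelihood loss, using a current PyTorch API rather than the PyTorch 0.2.0 release used in the experiments. The class name, the hidden dimension of 128, and the batch shapes are hypothetical and not taken from Blatta's actual code.

```python
import torch
import torch.nn as nn

class BlattaLikeClassifier(nn.Module):
    """Embedding -> LSTM -> linear -> LogSoftmax, as described in the text."""

    def __init__(self, dict_size=2000, embedding_dim=32, hidden_dim=128, num_classes=2):
        super().__init__()
        # Index 0 is reserved here for n-grams outside the dictionary (an assumption).
        self.embedding = nn.Embedding(dict_size + 1, embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=1, batch_first=True)
        self.linear = nn.Linear(hidden_dim, num_classes)   # equation (1): L = W*h_k + b
        self.log_softmax = nn.LogSoftmax(dim=1)            # equation (2): log probabilities

    def forward(self, encoded_ngrams):
        # encoded_ngrams: (batch, time steps) of integer-encoded n-grams
        embedded = self.embedding(encoded_ngrams)
        _, (h_n, _) = self.rnn(embedded)
        h_k = h_n[-1]                                      # last hidden state h_k
        return self.log_softmax(self.linear(h_k))

# One training step with negative log likelihood loss [45] and SGD (learning rate 0.1,
# as stated in Section 5.2). The batch below is random and purely illustrative.
model = BlattaLikeClassifier()
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.NLLLoss()

batch = torch.randint(0, 2001, (16, 100))    # 16 messages, 100 n-grams each (hypothetical)
labels = torch.randint(0, 2, (16,))          # 0 = benign, 1 = malicious (assumed labelling)
optimiser.zero_grad()
loss = criterion(model(batch), labels)
loss.backward()                              # backpropagate to weights, biases, and embeddings
optimiser.step()
```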

4.5. Detecting Exploits. The process of detecting exploits is essentially similar to the training process. Application layer messages are extracted, and n-grams are acquired from these messages and encoded using the dictionary that was built during the training process. The encoded n-grams are then fed into the RNN model, which outputs the probability of each message being malicious. When the probability of an application layer message being malicious is higher than 0.5, the message is deemed malicious and an alert is raised.

The main difference between this process and the training stage is the point at which Blatta stops processing inputs and makes the decision. Blatta takes the intermediate output of the RNN model, hence requiring fewer inputs and removing the need to wait for the full-length message. We will show in our experiments that the decision taken by using the intermediate output and reading fewer bytes is close to that of reading the full-length message, giving the proposed approach the ability of early prediction.
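As an illustration of this early decision, the following hedged sketch truncates the encoded n-gram sequence to a byte budget and classifies from the hidden state reached at that point. It assumes the hypothetical BlattaLikeClassifier from the previous sketch and that class index 1 means malicious.

```python
import torch

@torch.no_grad()
def predict_early(model, encoded_ngrams, byte_budget=400, stride=5):
    """Classify one message from the hidden state reached after roughly `byte_budget` bytes.

    encoded_ngrams: 1-D tensor of integer-encoded n-grams for a single message.
    With a stride of s, each additional time step consumes roughly s new bytes,
    so the number of steps is approximated by byte_budget // stride.
    """
    steps = max(1, byte_budget // stride)
    truncated = encoded_ngrams[:steps].unsqueeze(0)   # keep only the early part of the message
    log_probs = model(truncated)                      # classify from the intermediate hidden state
    p_malicious = log_probs.exp()[0, 1].item()        # class 1 assumed to be "malicious"
    return p_malicious > 0.5, p_malicious             # alert if the probability exceeds 0.5
```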

5. Experiments and Results

In this section, we evaluate Blatta and present evidence of its effectiveness in predicting exploits in network traffic. Blatta is implemented with Python 3.5.2 and the PyTorch 0.2.0 library. All experiments were run on a PC with a Core i7 3.40 GHz CPU, 16 GB of RAM, an NVIDIA GeForce GT 730, NVIDIA CUDA 9.0, and cuDNN 7.

The best practice for evaluating a machine learning approach is to have separate training and testing sets. As the name implies, the training set is used to train the model, and the testing set is used to evaluate the model's performance.

Figure 2: Architecture overview of the proposed method. Application layer messages (i.e., HTTP requests, SMTP commands) are extracted from captured traffic using tcpflow [39], and n-grams are obtained from those messages. They are then used to build a dictionary of the most common n-grams and to train the RNN-based model (i.e., LSTM and GRU). The trained model outputs a prediction of whether the traffic is legitimate or malicious.

Figure 3: An example of n-grams of bytes taken with various stride values (the string "HELLO_WORLD" split into 3-grams with stride values of 1, 2, and 3).


We split the BlattaSploit dataset in a 60:40 ratio into training and testing sets of malicious samples. The division was carefully made to include diverse types of exploit payloads. As samples of benign traffic to train the model, we obtained the same number of HTTP and FTP connections as the malicious samples from UNSW-JAN. Having a balanced set of both classes is important in supervised learning.

We measure our model's performance using samples of malicious and benign traffic. Malicious samples are obtained from 40% of the BlattaSploit dataset and the exploit and worm samples in the UNSW-FEB set (10,855 samples). As for the benign samples, we took the same number (10,855) of benign HTTP and FTP connections from UNSW-FEB. We used the UNSW dataset to compare our proposed approach's performance with previous works.

In summary, the details of the training and testing sets used in the experiments are shown in Table 3.

We evaluated the classifier model by counting the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). They are then used to calculate the detection rate (DR) and the false positive rate (FPR), which are the metrics we use to measure our proposed system's performance.

DR measures the ability of the proposed system to correctly detect malicious traffic. The value of DR should be as close as possible to 100%, showing how well our system is able to detect exploit traffic. The formula to calculate DR is shown in (3). We denote TP as the number of detected malicious connections and FN as the number of undetected malicious connections in the testing set.

$$\mathrm{DR} = \frac{TP}{TP + FN} \tag{3}$$

FPR is the percentage of benign samples that are classified as malicious. We would like this metric to be as low as possible: a high FPR means many false alarms are raised, rendering the system less trustworthy and useless. We calculate this metric using (4). We denote FP as the number of false positives detected and N as the number of benign samples.

$$\mathrm{FPR} = \frac{FP}{N} \tag{4}$$
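A small sketch of the two metrics in equations (3) and (4), with purely illustrative counts:

```python
def detection_rate(tp, fn):
    # Equation (3): the fraction of malicious connections that were detected.
    return tp / (tp + fn)

def false_positive_rate(fp, n_benign):
    # Equation (4): the fraction of benign samples flagged as malicious.
    return fp / n_benign

# Purely illustrative counts, not values from the paper:
print(detection_rate(tp=950, fn=50))              # 0.95 -> DR of 95%
print(false_positive_rate(fp=20, n_benign=1000))  # 0.02 -> FPR of 2%
```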

5.1. Data Analysis. Before discussing the results, it is preferable to analyse the data first to make sure that the results are valid and the conclusions drawn are on point. Blatta aims to detect exploit traffic by reading only the first few bytes of the application layer message. Therefore, it is important to know how many bytes there are in the application layer messages in our dataset, so that we can be sure that Blatta reads fewer bytes than the full length of the application layer message.

Table 4 shows the average length of application layer messages in our testing set. The benign samples have an average message length of 593.25 bytes, lower than any other set. Therefore, deciding after reading fewer bytes than at least that number implies that our proposed method can predict malicious traffic earlier, thus providing an improvement over previous works.

5.2. Exploring Parameters. Blatta includes parameters that must be selected in advance. Intuitively, these parameters affect the model's performance, so we analysed their effect and selected the best model to be compared later with previous works. The parameters we analysed are the recurrent layer type, n, stride, dictionary size, the embedding vector dimension, and the number of recurrent layers. When analysing each of these parameters, we use different values for that parameter and set the others to their default values (i.e., n = 5, stride = 1, dictionary size = 2000, embedding dimension = 32, recurrent layer = LSTM, and number of hidden layers = 1). These default values were selected based on the preliminary experiments which had given the best results. Apart from the modifiable parameters, we set the optimiser to stochastic gradient descent with a learning rate of 0.1, as using other optimisers did not necessarily increase or decrease the performance in our preliminary experiments. The number of epochs is fixed to five, as the loss value did not decrease further and adding more training iterations did not give significant improvement.

As can be seen in Table 5, we experimented with variable lengths of input to see how much difference there is when the number of bytes read is reduced.

Figure 4: A detailed view of the classifier. Integer-encoded n-grams extracted from the input application layer message pass through an embedding layer and a recurrent layer with LSTM/GRU units; a softmax unit then classifies the connection as legitimate or malicious. In the detection phase, the decision can be acquired earlier, without waiting for the last n-grams to be processed.


Note that some messages are shorter than the limit; in this case, all bytes of those messages are taken.

In general, reading the full length of application layer messages mostly gives a more than 99% detection rate with a 2.51% false positive rate. This performance holds, with minor variation, when the length of input messages is reduced down to 500 bytes. When the length is limited to 400 bytes, the false positive rate spikes for some configurations. Our hypothesis is that this is because benign samples have relatively short lengths. Therefore, we pay more attention to the results of reading 500 bytes or fewer and analyse each parameter individually.

n is the number of bytes (n-grams) taken in each time step. As shown in Table 5, we experimented with 1-, 3-, 5-, 7-, and 9-grams. For brevity, we omitted n = 2, 4, 6, 8 because the differences in their results are not significant. As predicted earlier, 1-grams were least effective, with detection rates around 50%. As for the higher-order n-grams, the detection rates do not differ much, but the false positive rates do: 5-grams and 7-grams provide better false positive rates (2.51%) even when Blatta reads only the first 400 bytes. 7-grams give a lower false positive rate (8.92%) when reading the first 300 bytes, yet it is still too high for real-life situations. Having a higher n means more information is considered in each time step; this may lead not only to better performance but also to overfitting.

As the default value of n is five, we experimented with strides of one to five; thus, it can be observed how the model reacts depending on how much the n-grams overlap. It is apparent that nonoverlapping n-grams provide lower false positives with around a 1–3% decrease in detection rate. Strides of two and three perform the worst, missing a considerable amount of malicious traffic.
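As an illustration of the n-gram extraction that Figure 3 depicts, the following hedged sketch slides an n-byte window with a configurable stride; the function name byte_ngrams is hypothetical, not from Blatta's code.

```python
def byte_ngrams(data: bytes, n: int = 5, stride: int = 1):
    """Slide a window of n bytes over `data`, advancing `stride` bytes each step."""
    grams = [data[i:i + n] for i in range(0, len(data) - n + 1, stride)]
    if not grams and data:
        grams.append(data)   # messages shorter than n: take whatever bytes exist (assumption)
    return grams

# Reproducing the example of Figure 3 with the string "HELLO_WORLD" and n = 3:
print(byte_ngrams(b"HELLO_WORLD", n=3, stride=1))  # b'HEL', b'ELL', b'LLO', b'LO_', ...
print(byte_ngrams(b"HELLO_WORLD", n=3, stride=2))  # b'HEL', b'LLO', b'O_W', b'WOR', b'RLD'
print(byte_ngrams(b"HELLO_WORLD", n=3, stride=3))  # b'HEL', b'LO_', b'WOR'
```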

The dictionary size plays an important role in this experiment. Having too many n-grams leads to overfitting, as the model would have to memorise too many n-grams that may barely occur. We found that a dictionary size of 2000 gives the highest detection rate and lowest false positive rate. Reducing the size to 1000 caused the detection rate to drop to around 50%, even when the model read the full-length messages.
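Building on the byte_ngrams helper sketched above, the following is a minimal, hypothetical sketch of how the dictionary of the most common n-grams could be built from the training set and used to integer-encode messages, with index 0 reserved for unknown n-grams.

```python
from collections import Counter

def build_dictionary(training_messages, n=5, stride=1, dict_size=2000):
    """Map the `dict_size` most common n-grams in the training set to integer ids (1-based).

    Index 0 is left for unknown n-grams: those unseen in training or not common enough.
    """
    counts = Counter()
    for message in training_messages:
        counts.update(byte_ngrams(message, n, stride))
    return {gram: idx + 1 for idx, (gram, _) in enumerate(counts.most_common(dict_size))}

def encode_message(message, dictionary, n=5, stride=1):
    """Integer-encode one message; unknown n-grams map to 0."""
    return [dictionary.get(gram, 0) for gram in byte_ngrams(message, n, stride)]
```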

Without the embedding layer, Blatta would have used a one-hot vector as big as the dictionary size as the input at each time step. Therefore, the embedding dimension has a similar effect to the dictionary size: having it too big leads to overfitting, and having it too small could mean too much information is lost due to the compression. Our experiments with dimensions of 16, 32, and 64 give similar detection rates, differing by less than 2%. An embedding dimension of 64 gives the lowest false positive rate when reading 300 bytes.

Changing the recurrent layer does not seem to make much difference; LSTM has a minor improvement over GRU. We argue that preferring one over the other would not bring a big improvement other than in training speed. Adding more hidden layers does not improve the performance either; on the contrary, it has a negative impact on the detection speed, as shown in Table 6.

After analysing this set of experiments, we ran another experiment with a configuration based on the best performing parameters explained above. The configuration is n = 5, stride = 5, dictionary size = 2000, embedding dimension = 64, and a single LSTM layer. The model then has a detection rate of 97.57% with 1.93% false positives by reading only the first 400 bytes. This result shows that Blatta maintains high accuracy while reading only 35.21% of the length of the application layer messages in the dataset. This optimal set of parameters is used in further analysis.

5.3. Detection Speed. In the IDS area, detection speed is another metric worth looking into, apart from accuracy-related metrics. Time is of the essence in detection, and the earlier we detect malicious traffic, the faster we can react. However, detection speed is affected by many factors, such as the hardware or the other applications and services running at the same time as the experiment. Therefore, in this section, we analyse the difference in execution time between reading the full and partial payload.

We first calculated the execution time of each record in the testing set and then divided the number of bytes processed by the execution time to obtain the detection speed in kilobytes per second (KB/s). Eventually, the detection speed over all records was averaged. The result is shown in Table 6.
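The measurement described above could be reproduced with a sketch along these lines; detect_fn stands for any classifier callable, and the helper is hypothetical, not the actual measurement script.

```python
import time

def average_detection_speed(messages, detect_fn, byte_limit=400):
    """Mean detection speed in KB/s over a set of application layer messages.

    `detect_fn` is any callable that classifies a (possibly truncated) message;
    only its execution time matters here, not its output.
    """
    speeds = []
    for message in messages:
        payload = message[:byte_limit]
        start = time.perf_counter()
        detect_fn(payload)
        elapsed = time.perf_counter() - start
        speeds.append(len(payload) / 1024 / elapsed)   # bytes processed per second, in KB/s
    return sum(speeds) / len(speeds)
```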

As shown in Table 6, reducing the processed bytes to 700, about half the size of an IP packet, increased the detection speed by approximately two times (from an average of 83.66 KB/s to 164.86 KB/s). Evidently, the trend keeps rising as the number of bytes is reduced. If we take the 400-byte limit from the previous section, Blatta provides about a threefold increase in detection speed, or 221.7 KB/s on average. We are aware that this number seems small compared to the transmission speed of a network link, which can reach 1 Gbps (128 MB/s). However, we argue that there are other factors which limit our proposed method from performing faster, such as the hardware used in the experiment and the programming language used to implement the approach. Given a better environment, the detection speed will increase as well.

Table 3: Numbers of benign and malicious samples used in the experiments.

Set            Obtained from   Number of samples   Class
Training set   BlattaSploit    3,406               Malicious
Training set   UNSW-JAN        3,406               Benign
Testing set    BlattaSploit    2,276               Malicious
Testing set    UNSW-FEB        10,855              Benign
Testing set    UNSW-FEB        10,855              Malicious

Table 4: Average message length of application layer messages in the testing set.

Set                     Average message length (bytes)
BlattaSploit            2,318.93
UNSW benign samples     593.25
UNSW exploit samples    1,437.36
UNSW worm samples       848


5.4. Comparison with Previous Works. We compare Blatta's results with related previous works. PAYL [7], OCPAD [12], Decanter [15], and the autoencoder model [23] were chosen due to their code availability, and all can be tested against the UNSW-NB15 dataset. PAYL and OCPAD read an IP packet at a time, while Decanter and [23] reconstruct TCP segments and process the whole application layer message. None of them provides early detection, but to show that Blatta also offers improvements in detecting malicious traffic, we compare the detection and false positive rates of those works with Blatta's.

We evaluated all methods with the exploit and worm data in UNSW-NB15, as those match our threat model and the dataset is already publicly available; thus, the results are comparable. The results are shown in Table 7.

In general, Blatta has the highest detection rate, albeit at the cost of more false positives. Although the false positives might make this approach unacceptable in real life, Blatta is still a significant step towards early prediction, a problem that has not been explored by similar previous works. This early prediction approach enables system administrators to react faster, thus reducing the harm to protected systems.

In the previous experiments, as shown in Section 5.2, Blatta needs to read 400 bytes (on average, 35.21% of the application layer message size) to achieve a 97.57% detection rate.

Table 5: Experiment results of using various parameter combinations and various lengths of input to the model. Columns All, 700, 600, 500, 400, 300, and 200 give the number of bytes read. Bold values in the original table mark, for each parameter, the value giving the highest detection rate and lowest false positive rate.

Detection rate (%):

Parameter             Value    All     700     600     500     400     300     200
n                     1        47.22   48.69   49.86   50.65   51.77   54.99   65.32
                      3        99.87   99.51   99.77   99.1    99.59   98.93   91.07
                      5        99.87   99.55   99.78   99.57   99.29   98.91   88.75
                      7        99.86   99.47   99.59   99.37   99.19   98.53   97.08
                      9        99.81   99.59   99.62   99.57   99.23   98.16   88.93
Stride                1        99.87   99.55   99.78   99.57   99.29   98.91   88.75
                      2        73.39   74.11   74.01   74.45   74.69   74.62   77.82
                      3        82.51   82.54   83.07   83.12   83.25   83.5    85.75
                      4        99.6    99.19   99.26   99.28   98.61   98.55   98.37
                      5        99.73   98.95   98.88   98.65   98      95.77   88.29
Dictionary size       1000     47.78   49.5    50.36   50.79   51.8    54.83   54.68
                      2000     99.87   99.55   99.78   99.57   99.29   98.91   88.75
                      5000     99.87   99.37   99.75   99.79   99.62   99.69   99.66
                      10000    99.86   99.44   99.74   99.55   99.44   98.55   98.33
                      20000    99.84   99.81   99.69   99.24   99.21   99.43   98.91
Embedding dimension   16       99.89   99.65   99.7    99.67   99.22   99.09   98.81
                      32       99.87   99.55   99.78   99.57   99.29   98.91   88.75
                      64       99.87   99.2    99.41   99.09   98.61   96.76   85.52
                      128      99.84   99.33   99.6    99.35   98.99   97.69   86.78
                      256      99.88   99.76   99.8    99.22   99.38   98.64   90.34
Recurrent layer       LSTM     99.87   99.55   99.78   99.57   99.29   98.91   88.75
                      GRU      99.88   99.35   99.48   99.35   99.06   97.94   86.22
No. of layers         1        99.87   99.55   99.78   99.57   99.29   98.91   88.75
                      2        99.86   99.46   99.46   99.38   99.2    99.72   88.65
                      3        99.84   99.38   99.68   99.1    99.18   98.16   87.35

False positive rate (%):

Parameter             Value    All     700     600     500     400     300     200
n                     1        1.18    1.19    1.21    1.21    71.31   78.7    89.43
                      3        2.51    2.51    2.51    2.51    72.61   10.29   20.51
                      5        2.51    2.51    2.51    2.51    2.51    7.26    11.08
                      7        2.51    2.51    2.51    2.51    2.51    8.92    80.92
                      9        2.51    2.51    2.51    2.51    7.26    74.16   90.6
Stride                1        2.51    2.51    2.51    2.51    2.51    7.26    11.08
                      2        1.81    1.81    1.81    1.81    71.92   72.46   19.86
                      3        1.5     1.49    1.5     1.51    71.62   75.47   89.63
                      4        1.93    1.93    1.93    1.93    1.93    74.09   10.5
                      5        1.93    1.92    1.93    1.93    1.93    54.16   90.02
Dictionary size       1000     1.21    1.21    1.22    1.22    71.33   79.47   89.42
                      2000     2.51    2.51    2.51    2.51    2.51    7.26    11.08
                      5000     2.51    2.51    2.51    2.51    72.61   10.03   90.61
                      10000    2.51    2.51    2.51    2.51    72.61   79.06   90.15
                      20000    2.51    2.51    2.51    2.51    72.61   80.46   89.64
Embedding dimension   16       2.51    2.51    2.51    2.51    2.51    76.77   80.94
                      32       2.51    2.51    2.51    2.51    2.51    7.26    11.08
                      64       2.51    2.51    2.51    2.51    2.51    4.51    89.85
                      128      2.51    2.51    2.51    2.51    7.26    4.27    10.88
                      256      2.51    2.51    2.51    2.51    7.26    80.79   90.6
Recurrent layer       LSTM     2.51    2.51    2.51    2.51    2.51    7.26    11.08
                      GRU      2.51    2.51    2.51    2.51    2.51    78.95   8.48
No. of layers         1        2.51    2.51    2.51    2.51    2.51    7.26    11.08
                      2        2.51    2.51    2.51    2.51    72.59   78.78   20.29
                      3        2.51    2.51    2.51    2.51    2.51    74.94   10.83

Table 6: The effect of reducing the number of bytes on the detection speed.

No. of bytes   1 LSTM layer         2 LSTM layers        3 LSTM layers
All            83.66 ± 0.238327     55.14 ± 0.004801     36.98 ± 0.011428
700            164.86 ± 0.022857    107.04 ± 0.022001    73.5 ± 0.044694
600            181.6 ± 0.020556     119.7 ± 0.024792     82.1 ± 0.049584
500            204.32 ± 0.02352     136.5 ± 0.036668     93.76 ± 0.061855
400            221.7 ± 0.032205     149.4 ± 0.037701     103.02 ± 0.065417
300            240.76 ± 0.022857    163.68 ± 0.036352    113.18 ± 0.083477
200            262.72 ± 0.030616    181.38 ± 0.020927    126.88 ± 0.063024

The values are average (mean) detection speed in KB/s with a 95% confidence interval, calculated from multiple experiments. The detection speed increased significantly (about three times faster than reading the whole message), allowing early prediction of malicious traffic.


It reads fewer bytes than PAYL, OCPAD, Decanter, and [23] while keeping the accuracy high.

5.5. Visualisation. To investigate how Blatta has performed the detection, we took samples of both benign and malicious traffic and observed the input and output. We were particularly interested in the relation between n-grams that are not stored in the dictionary (unknown n-grams) and the decision. Those n-grams either did not exist in the training set or were not common enough to be included in the dictionary.

In Figure 5, we visualise three samples of traffic taken from different sets, the BlattaSploit and UNSW-NB15 datasets. The first part of each sample shows n-grams that did not exist in the dictionary; the yellow highlighted parts show those n-grams. The second part shows the output of the recurrent layer for each time step. The darker the red highlight, the closer the probability of the traffic being malicious is to one at that time step.
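The per-time-step outputs visualised in Figure 5 could be produced with a sketch like the following, which applies the classifier head to every hidden state rather than only the last one. It again assumes the hypothetical BlattaLikeClassifier defined earlier.

```python
import torch

@torch.no_grad()
def per_step_probabilities(model, encoded_ngrams):
    """Malicious probability after each time step, for visualisations like Figure 5."""
    embedded = model.embedding(encoded_ngrams.unsqueeze(0))
    outputs, _ = model.rnn(embedded)                          # hidden state h_t for every step t
    log_probs = model.log_softmax(model.linear(outputs.squeeze(0)))
    return log_probs.exp()[:, 1]                              # probability of the malicious class per step
```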

As shown in Figure 5 (detection samples), malicious samples tend to have more unknown n-grams. It is evident that the existence of these unknown n-grams increases the probability of the traffic being malicious. As an example, the first five bytes of the samples have a probability of around 0.5 of being malicious, and the probability then moves closer to one when an unknown n-gram is detected.

Similar behaviour also exists in the benign sample. The probability is quite low because there are many known n-grams. Despite the existence of unknown n-grams in the benign sample, the end result shows that the traffic is benign; furthermore, most of the time, the probability of the traffic being malicious stays below 0.5.

6. Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet against exploit attacks. There are evasion techniques which could be employed by adversaries to evade detection. These techniques open possibilities for future work. Therefore, this section discusses such evasion techniques and why they are not covered by our current method.

Since our proposed method works by analysing application layer messages, it is safe to disregard evasion techniques at the transport or network layer level, e.g., IP fragmentation, TCP delayed sending, and TCP segment fragmentation. They should be handled by the underlying tool that reconstructs the TCP session. Furthermore, those evasion techniques can also be handled by Snort preprocessors.

Two possible evasion techniques are compression and/or encryption. Both change the bytes' values from the original and make the malicious code harder to detect by Blatta or any previous work [7, 12, 23] on payload-based detection. Metasploit has a collection of evasion techniques, which include compression. The compression evasion technique only works on HTTP and utilises gzip, and it only compresses HTTP responses, not HTTP requests, while all HTTP-based exploits in UNSW-NB15 and BlattaSploit have their exploit code in the request; thus, no data are available to analyse the performance if the adversary uses compression. However, gzip-compressed data could still be detected because they always start with the magic number 1f 8b, and the decompression can be done in a streaming manner, which Blatta supports. There is also no need to decompress the whole data, since Blatta works well with partial input.
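To illustrate the point about the gzip magic number and streaming decompression, here is a hedged sketch using Python's zlib; the byte budget and chunk size are arbitrary choices, not values from the paper.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"

def maybe_decompress_prefix(payload: bytes, max_bytes: int = 400) -> bytes:
    """Return up to `max_bytes` of plaintext, inflating gzip data incrementally if present."""
    if not payload.startswith(GZIP_MAGIC):
        return payload[:max_bytes]
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    plaintext = b""
    for offset in range(0, len(payload), 64):           # feed the stream in small chunks
        plaintext += inflater.decompress(payload[offset:offset + 64])
        if len(plaintext) >= max_bytes:
            break                                       # the rest never needs to be decompressed
    return plaintext[:max_bytes]
```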

Encryption is possibly the biggest obstacle for payload-based NIDS; none of the previous works in our literature review (see Table 1) have addressed this challenge. There are other studies which deal with payload analysis in encrypted traffic [46, 47]. However, these studies focus on classifying which application generates the network traffic instead of detecting exploit attacks; thus, they are not directly relevant to our research.

On its own, Blatta is not able to detect exploits hiding in encrypted traffic. However, Blatta's model can be exported and incorporated into application layer firewalls such as ShadowDaemon [48]. ShadowDaemon is commonly installed on a web server and intercepts HTTP requests before they are processed by the web server software. It detects attacks based on its signature database. Since it is extensible and reads the same data as Blatta (i.e., application layer messages), it is possible to use Blatta's RNN-based model to extend the capability of ShadowDaemon beyond rule-based detection. More importantly, this approach would enable Blatta to deal with encrypted traffic, making it applicable in real-life situations.

Attackers could also place the exploit code closer to the end of the application layer message, hoping that the attack would not be detected because Blatta reads merely the first few bytes. However, exploits employing this kind of evasion would still be detected, since the technique needs padding to push the exploit code to the end of the message, and the padding itself would be detected as a sign of a malicious attempt, as it is most likely to be a byte sequence which rarely exists in benign traffic.

Table 7: Comparison to previous works using the UNSW-NB15 dataset as the testing set.

Method        DR on exploits in UNSW-NB15 (%)   DR on worms in UNSW-NB15 (%)   FPR (%)
Blatta        99.04                             100                            1.93
PAYL          87.12                             26.49                          0.05
OCPAD         10.53                             4.11                           0
Decanter      67.93                             90.14                          0.03
Autoencoder   47.51                             81.12                          0.99


7. Conclusion and Future Work

This paper presents Blatta, an early prediction system for exploit attacks which can detect exploit traffic by reading only 400 bytes from the application layer message. First, Blatta builds a dictionary containing the most common n-grams in the training set. Then, it is trained over benign and malicious sequences of n-grams to classify exploit traffic. Lastly, it continuously reads small chunks of an application layer message and predicts whether the message will be a malicious one.

Decreasing the number of bytes taken from application layer messages has only a minor effect on Blatta's detection rate. Therefore, unlike previous related works, it does not need to read the whole application layer message to detect exploit traffic, creating a steep change in the ability of system administrators to detect attacks early and to block them before the exploit damages the system. Extensive evaluation on the new exploit traffic dataset has clearly demonstrated the effectiveness of Blatta.

For future study, we would like to train a model that can recognise malicious behaviour based on the messages exchanged between clients and a server, since in this paper we only consider messages from clients to a server and not the other way around. Detecting attacks in encrypted traffic while retaining the early prediction capability could be another research direction. It also remains a question whether the approaches in mobile traffic classification [46, 47] would be applicable to the field of exploit detection.

Data Availability

The compressed file containing the dataset is available at https://bit.ly/blattasploit.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 5: Visualisation of unknown n-grams in the application layer messages and the outputs of the recurrent layer for each time step. It shows how the proposed system observes and analyses the traffic. Yellow blocks show unknown n-grams. Red blocks show the probability of the traffic being malicious when reading an n-gram at that point.


Acknowledgments

This work was supported by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan/LPDP) under LPDP grant number PRJ-968/LPDP.3/2016.

References

[1] McAfee, "McAfee Labs threat report, August 2019," Report, McAfee, Santa Clara, CA, USA, 2019.
[2] Exploit DB, "Offensive Security's exploit database archive," EDB, Bedford, MA, USA, 2010.
[3] T.-H. Cheng, Y.-D. Lin, Y.-C. Lai, and P.-C. Lin, "Evasion techniques: sneaking through your intrusion detection/prevention systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1011–1020, 2012.
[4] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of LISA '99: 13th Systems Administration Conference, vol. 99, pp. 229–238, Seattle, Washington, USA, November 1999.
[5] R. Sommer and V. Paxson, "Outside the closed world: on using machine learning for network intrusion detection," in Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 305–316, Berkeley/Oakland, CA, USA, May 2010.
[6] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353–375, 2011.
[7] K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," Lecture Notes in Computer Science: Recent Advances in Intrusion Detection, in Proceedings of the 7th International Symposium, RAID 2004, pp. 203–222, Gothenburg, Sweden, September 2004.
[8] A. Jamdagni, Z. Tan, X. He, P. Nanda, and R. P. Liu, "RePIDS: a multi tier real-time payload-based intrusion detection system," Computer Networks, vol. 57, no. 3, pp. 811–824, 2013.
[9] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, "McPAD: a multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864–881, 2009.
[10] D. Ariu, R. Tronci, and G. Giacinto, "HMMPayl: an intrusion detection system based on hidden Markov models," Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.
[11] A. Oza, K. Ross, R. M. Low, and M. Stamp, "HTTP attack detection using n-gram analysis," Computers & Security, vol. 45, pp. 242–254, 2014.
[12] M. Swarnkar and N. Hubballi, "OCPAD: one class naive Bayes classifier for payload based anomaly detection," Expert Systems with Applications, vol. 64, pp. 330–339, 2016.
[13] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in Proceedings of the 25th USENIX Security Symposium, pp. 807–822, Austin, TX, USA, August 2016.
[14] H. Zhang, D. Yao, N. Ramakrishnan, and Z. Zhang, "Causality reasoning about network events for detecting stealthy malware activities," Computers & Security, vol. 58, pp. 180–198, 2016.
[15] R. Bortolameotti, T. van Ede, M. Caselli et al., "DECANTeR: DEteCtion of anomalous outbound HTTP traffic by passive application fingerprinting," in Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 373–386, Orlando, FL, USA, December 2017.
[16] D. Golait and N. Hubballi, "Detecting anomalous behavior in VoIP systems: a discrete event system modeling," IEEE Transactions on Information Forensics and Security, vol. 12, no. 3, pp. 730–745, 2016.
[17] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, vol. 16, no. 5, pp. 475–490, 2017.
[18] E. Min, J. Long, Q. Liu, J. Cui, and W. Chen, "TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest," Security and Communication Networks, vol. 2018, Article ID 4943509, 9 pages, 2018.
[19] X. Jin, B. Cui, D. Li, Z. Cheng, and C. Yin, "An improved payload-based anomaly detector for web applications," Journal of Network and Computer Applications, vol. 106, pp. 111–116, 2018.
[20] Y. Hao, Y. Sheng, and J. Wang, "Variant gated recurrent units with encoders to preprocess packets for payload-aware intrusion detection," IEEE Access, vol. 7, pp. 49985–49998, 2019.
[21] P. Schneider and K. Bottinger, "High-performance unsupervised anomaly detection for cyber-physical system networks," in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy (CPS-SPC '18), pp. 1–12, Barcelona, Spain, January 2018.
[22] T. Hamed, R. Dara, and S. C. Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol. 73, pp. 137–155, 2018.
[23] B. A. Pratomo, P. Burnap, and G. Theodorakopoulos, "Unsupervised approach for detecting low rate attacks on network traffic with autoencoder," in Proceedings of the 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8, Glasgow, UK, June 2018.
[24] Z.-Q. Qin, X.-K. Ma, and Y.-J. Wang, "Attentional payload anomaly detector for web applications," in Proceedings of the International Conference on Neural Information Processing, pp. 588–599, Siem Reap, Cambodia, December 2018.
[25] H. Liu, B. Lang, M. Liu, and H. Yan, "CNN and RNN based payload classification methods for attack detection," Knowledge-Based Systems, vol. 163, pp. 332–341, 2019.
[26] Z. Zhao and G.-J. Ahn, "Using instruction sequence abstraction for shellcode detection and attribution," in Proceedings of the Conference on Communications and Network Security (CNS), pp. 323–331, National Harbor, MD, USA, October 2013.
[27] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, "Detecting unknown malicious code by applying classification techniques on opcode patterns," Security Informatics, vol. 1, no. 1, p. 1, 2012.
[28] X. Wang, C.-C. Pan, P. Liu, and S. Zhu, "SigFree: a signature-free buffer overflow attack blocker," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 65–79, 2010.
[29] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," Lecture Notes in Computer Science, in Proceedings of the International Workshop on Recent Advances in Intrusion Detection, pp. 226–248, Hamburg, Germany, September 2006.
[30] Rapid7, "Metasploit penetration testing framework," 2003, https://www.metasploit.com.
[31] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the Military Communications and Information Systems Conference (MilCIS), pp. 1–6, Canberra, ACT, Australia, November 2015.


[32] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.
[33] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using Snort," Tech. Report CSE-2007-1, vol. 1, no. 2007, p. 22, UC Davis Department of Computer Science, Davis, CA, USA, 2007.
[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, p. 2, Berkeley, CA, USA, October 2005.
[35] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[36] I. Bro, 2008, http://www.bro-ids.org.
[37] Argus, 2008, https://qosient.com/argus.
[38] Rapid7, "Metasploitable," 2012, https://information.rapid7.com/download-metasploitable-2017.html.
[39] S. L. Garfinkel and M. Shick, "Passive TCP reconstruction and forensic analysis with tcpflow," Technical Reports, Naval Postgraduate School, Monterey, CA, USA, 2013.
[40] K. Rieck and P. Laskov, "Language models for detection of unknown attacks in network traffic," Journal in Computer Virology, vol. 2, no. 4, pp. 243–256, 2007.
[41] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, Doha, Qatar, October 2014.
[42] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, https://arxiv.org/abs/1412.3555.
[45] PyTorch, "Negative log likelihood," 2016.
[46] M. Lotfollahi, M. J. Siavoshani, R. S. H. Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[47] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescape, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.
[48] H. Buchwald, "Shadow Daemon: a collection of tools to detect, record and block attacks on web applications," Shadow Daemon Introduction, 2015, https://shadowd.zecure.org/overview/introduction/.

Security and Communication Networks 15

Page 9: BLATTA:EarlyExploitDetectiononNetworkTrafficwith ...downloads.hindawi.com/journals/scn/2020/8826038.pdfAs the length varies, longer messages will take more time to analyse, during

testing set is for evaluating the modelrsquos performance Wesplit BlattaSploit dataset in a 60 40 ratio for training andtesting set as malicious samples e division was carefullytaken to include diverse type of exploit payloads As samplesof benign traffic to train the model we obtained the samenumber of HTTP and FTP connections as the malicioussamples from UNSW-JAN Having a balanced set of bothclasses is important in a supervised learning

We measure our modelrsquos performance by using samplesof malicious and benign traffic Malicious samples are ob-tained from 40 of BlattaSploit dataset exploit and wormsamples in UNSW-FEB set (10855 samples) As for thebenign samples we took the same number (10855) of benignHTTP and FTP connections in UNSW-FEB We used theUNSW dataset to compare our proposed approach per-formance with previous works

In summary the details of training and testing sets usedin the experiments are shown in Table 3

We evaluated the classifier model by counting thenumber of true positive (TP) true negative (TN) falsepositive (FP) and false negative (FN) ey are then used tocalculate detection rate (DR) and false positive rate (FPR)which are metrics to measure our proposed systemrsquosperformance

DR measures the ability of the proposed system tocorrectly detect malicious traffic e value of DR should beas close as possible to 100 It would show how well our

system is able to detect exploit traffic e formula to cal-culate DR is shown in (3) We denote TP as the number ofdetected malicious connections and FN as the number ofundetected malicious connections in the testing set

DR TP

TP + FN (3)

FPR is the percentage of benign samples that are clas-sified as malicious We would like to have this metric to be aslow as possible High FPR means many false alarms areraised rendering the system to be less trustworthy anduseless We calculate this metric using (4) We denote FP asthe number of false positives detected and N as the numberof benign samples

FPR FPN

(4)

51 Data Analysis Before discussing about the results it ispreferable to analyse the data first to make sure that theresults are valid and the conclusion taken is on point Blattaaims to detect exploit traffic by reading the first few bytes ofthe application layer message erefore it is important toknow how many bytes are there in the application layermessages in our dataset Hence we can be sure that Blattareads a number of bytes fewer than the full length of theapplication layer message

Table 4 shows the average length of application layermessages in our testing set e benign samples have anaverage message length of 59325 lower than any other setserefore deciding after reading fewer bytes than at leastthat number implies our proposed method can predictmalicious traffic earlier thus providing improvement overprevious works

52 Exploring Parameters Blatta includes parameters thatmust be selected in advance Intuitively these parametersaffect the model performance so we analysed the effect onmodelrsquos performance and selected the best model to becompared later to previous works e parameters weanalysed are recurrent layer types n stride dictionary sizethe embedding vector dimension and the recurrent layertype When analysing each of these parameters we usedifferent values for the parameter and set the others to theirdefault values (ie n 5 stride 1 dictionary size 2000embe dd ing dim 32 recurrent layer LSTM and numberof hidden layers 1) ese default values were selectedbased on the preliminary experiment which had given thebest result Apart from the modifiable parameters we set theoptimiser to stochastic gradient descent with learning rate01 as using other optimisers did not necessarily increase ordecrease the performance in our preliminary experimente number of epochs is fixed to five as the loss value did notdecrease further adding more training iteration did not givesignificant improvement

As can be seen in Table 5 we experimented with variablelengths of input to see how much the difference when thenumber of bytes read is reduced It is noted that some

Application layer messages

Integer encodedn-grams as inputs

n-gram n-gram hellip

hellip

hellip

n-gram

Embeddinglayer

Embeddedvector

Embeddedvector

Embeddedvector

Recurrent layerwith LSTMGRU

unitsRecurrent

unitRecurrent

unitRecurrent

unit

Decision can be acquiredearlier without waiting for thelast n-grams to be processed

Somaxunit

Somaxunit

Training and detection phaseDetection phase only

Legitimate Malicious

Figure 4 A detailed view of the classifier n-grams are extractedfrom the input application layer message which are then used totrain an RNNmodel to classify whether the connection is maliciousor benign

Security and Communication Networks 9

messages are shorter than the limit In this case all bytes ofthe messages are indeed taken

In general reading the full length of application layermessages mostly gives more than 99 detection rate with251 false positive rate is performance stays still with aminor variation when the length of input messages is re-duced down to 500 bytes When the length is limited to 400the false positive rate spikes up for some configurations Ourhypothesis is that this is due to benign samples have rela-tively short length erefore we will pay more attention tothe results of reading 500 bytes or fewer and analyse eachparameter individually

n is the number of bytes (n-grams) taken in each timestep As shown in Table 5 we experimented with 1 3 5 7and 9-gram For brevity we omitted n 2 4 6 8 because theresult difference is not significant As predicted earlier 1-gram was least effective and the detection rates were around50 As for the high-order n-grams the detection rates arenot much different but the false positive rates are 5-gramand 7-gram provide better false positive rates (251) evenwhen Blatta reads the first 400 bytes 7-gram gives lower falsepositive rate (892) when reading first 300 bytes yet it is stilltoo high for real-life situation Having a higher n meansmore information is considered in a time step this may leadto not only a better performance but also overfitting

As the default value of n is five we experimented withstride of one to five us it can be observed how the modelwould react depending on how much the n-grams over-lapped It is apparent that nonoverlapping n-grams providelower false positives with around 1ndash3 decrease in detectionrates A value of two and three for the stride performs theworst and they missed quite a few malicious traffic

e dictionary size plays an important role in this ex-periment Having too many n-grams leads to overfitting asthe model would have to memorise too many of them thatmay barely occur We found that a dictionary size of 2000has the highest detection rate and lowest false positive rateReducing the size to 1000 has made the detection rate todrop for about 50 even when the model read the full-length messages

Without embedding layer Blatta would have used a one-hot vector as big as the dictionary size for the input in a timestep erefore the embedding dimension has the sameeffect as dictionary size Having it too big leads to overfittingand too little could mean toomuch information is lost due tothe compression Our experiments with a dimension of 1632 or 64 give similar detection rates and differ less than 2An embedding dimension of 64 can have the least falsepositive when reading 300 bytes

Changing recurrent layer does not seem to have muchdifference LSTM has a minor improvement over GRU Weargue that preferring one after the other would not make abig improvement other than training speed Adding morehidden layers does not improve the performance On theother hand it has a negative impact on the detection speedas shown in Table 6

After analysing this set of experiments we ran anotherexperiment with a configuration based on the best per-forming parameters previously explained e configu-ration is n 5 stride 5 dictionary size 2000embedding dimension 64 and a LSTM layer e modelthen has a detection rate of 9757 with 193 falsepositives by reading only the first 400 bytes is resultshows that Blatta maintains high accuracy while onlyreading 3521 the length of application layer messages inthe dataset is optimal set of parameters is then used infurther analysis

53 Detection Speed In the IDS area detection speed isanother metric worth looked into apart from accuracy-related metrics Time is of the essence in detection and theearlier we detect malicious traffic the faster we could reactHowever detection speed is affected bymany factors such asthe hardware or other applicationsservices running at thesame time as the experiment erefore in this section weanalyse the difference of execution time between reading thefull and partial payload

We first calculated the execution time of each record inthe testing set then divided the number of bytes processedby the execution time to obtain the detection speed in ki-lobytesseconds (KBps) Eventually the detection speed ofall records was averaged e result is shown in Table 6

As shown in Table 6 reducing the processed bytes to 700about half the size of an IP packet increased the detectionspeed by approximately two times (from an average of8366 kbps to 16486 kbps) Evidently the trend keeps risingas the number of bytes reduced If we take the number ofbytes limit from the previous section which is 400 bytesBlatta can provide about three times increment in detectionspeed or 2217 kbps on average We are aware that thisnumber seems small compared to the transmission speed ofa link in the network which can reach 1Gbps128MBpsHowever we argued that there are other factors which limitour proposed method from performing faster such as thehardware used in the experiment and the programminglanguage used to implement the approach Given the ap-proach runs in a better environment the detection speed willincrease as well

Table 3 Numbers of benign and malicious samples used in theexperiments

Set Obtained from Num of samples Class

Training set BlattaSploit 3406 MaliciousUNSW-JAN 3406 Benign

Testing setBlattaSploit 2276 MaliciousUNSW-FEB 10855 BenignUNSW-FEB 10855 Malicious

Table 4 Average message length of application layer messages inthe testing set

Set Average message lengthBlattaSploit 231893UNSW benign samples 59325UNSW exploit samples 143736UNSW worm samples 848

10 Security and Communication Networks

54 Comparison with Previous Works We compare Blattaresults with other related previous works PAYL [7]OCPAD [12] Decanter [15] and the autoencoder model[23] were chosen due to their code availability and both canbe tested against the UNSW-NB15 dataset PAYL andOCPAD read an IP packet at a time while Decanter and [23]reconstruct TCP segments and process the whole applica-tion layer message None of them provides early detectionbut to show that Blatta also offers improvements in detectingmalicious traffic we compare the detection and false positiverates of those works with Blatta

We evaluated all methods with exploit and worm data inUNWS-NB15 as those match our threat model and the

dataset is already publicly availableus the result would becomparable e results are shown in Table 7

In general Blatta has the highest detection ratemdashalbeit italso comes with the cost of increasing false positives Al-though the false positives might make this approach un-acceptable in real life Blatta is still a significant step towardsa direction of early prediction a problem that has not beenexplored by similar previous works is early predictionapproach enables system administrators to react faster thusreducing the harm to protected systems

In the previous experiments as shown in Section 52Blatta needs to read 400 bytes (on average 3521 of ap-plication layer message size) to achieve 9757 detection

Table 5 Experiment results of using various parameters combination and various lengths of input to the model

Parameter No of bytes No of bytesAll 700 600 500 400 300 200 All 700 600 500 400 300 200

n

1 4722 4869 4986 5065 5177 5499 6532 118 119 121 121 7131 787 89433 9987 9951 9977 991 9959 9893 9107 251 251 251 251 7261 1029 20515 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 11087 9986 9947 9959 9937 9919 9853 9708 251 251 251 251 251 892 80929 9981 9959 9962 9957 9923 9816 8893 251 251 251 251 726 7416 906

Stride

1 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 11082 7339 7411 7401 7445 7469 7462 7782 181 181 181 181 7192 7246 19863 8251 8254 8307 8312 8325 835 8575 15 149 15 151 7162 7547 89634 996 9919 9926 9928 9861 9855 9837 193 193 193 193 193 7409 1055 9973 9895 9888 9865 98 9577 8829 193 192 193 193 193 5416 9002

Dictionary size

1000 4778 495 5036 5079 518 5483 5468 121 121 122 122 7133 7947 89422000 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 11085000 9987 9937 9975 9979 9962 9969 9966 251 251 251 251 7261 1003 906110000 9986 9944 9974 9955 9944 9855 9833 251 251 251 251 7261 7906 901520000 9984 9981 9969 9924 9921 9943 9891 251 251 251 251 7261 8046 8964

Embedding dimension

16 9989 9965 997 9967 9922 9909 9881 251 251 251 251 251 7677 809432 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 110864 9987 992 9941 9909 9861 9676 8552 251 251 251 251 251 451 8985128 9984 9933 996 9935 9899 9769 8678 251 251 251 251 726 427 1088256 9988 9976 998 9922 9938 9864 9034 251 251 251 251 726 8079 906

Recurrent layer LSTM 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 1108GRU 9988 9935 9948 9935 9906 9794 8622 251 251 251 251 251 7895 848

No of layers1 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 11082 9986 9946 9946 9938 992 9972 8865 251 251 251 251 7259 7878 20293 9984 9938 9968 991 9918 9816 8735 251 251 251 251 251 7494 1083

Detection rate False positive rateBold values show the parameter value for each set of experiment which gives the highest detection rate and lowest false positive rate

Table 6 e effect of reducing the number of bytes to the detection speed

No of bytesNo of LSTM layers

1 2 3All 8366plusmn 0238327 5514plusmn 0004801 3698plusmn 0011428700 16486plusmn 0022857 10704plusmn 0022001 735plusmn 0044694600 1816plusmn 0020556 1197plusmn 0024792 821plusmn 0049584500 20432plusmn 002352 1365plusmn 0036668 9376plusmn 0061855400 2217plusmn 0032205 1494plusmn 0037701 10302plusmn 0065417300 24076plusmn 0022857 16368plusmn 0036352 11318plusmn 0083477200 26272plusmn 0030616 18138plusmn 0020927 12688plusmn 0063024e values are average (mean) detection speed in kbps with 95 confidence interval calculated from multiple experiments e detection speed increasedsignificantly (about three times faster than reading the whole message) allowing early prediction of malicious traffic

Security and Communication Networks 11

rate It reads fewer bytes than PAYL OCPAD Decanter and[23] while keeping the accuracy high

55Visualisation To investigate how Blatta has performedthe detection we took samples of both benign andmalicious traffic and observed the input and output Wewere particularly interested in the relation of n-grams thatare not stored in the dictionary to the decision (unknownn-grams) ose n-grams either did not exist in thetraining set or were not common enough to be included inthe dictionary

On Figure 5 we visualise three samples of traffic takenfrom different sets BlattaSploit and UNSW-NB15 datasetse first part of each sample shows n-grams that did notexist in the dictionary e yellow highlighted parts showthose n-grams e second part shows the output of therecurrent layer for each time step e darker the redhighlight the closer the probability of the traffic beingmalicious to one in that time step

As shown in Figure 5 (detection samples) malicioussamples tend to have more unknown n-grams It is evidentthat the existence of these unknown n-grams increases theprobability of the traffic being malicious As an examplethe first five bytes of the five samples have around 05probability of being malicious And then the probabilitywent up closer to one when an unknown n-gram isdetected

Similar behaviour also exists in the benign sample eprobability is quite low because there are many knownn-grams Despite the existence of unknown n-grams in thebenign sample the end result shows that the traffic is benignFurthermore most of the time the probability of the trafficbeing malicious is also below 05

6 Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet to tackle exploitattacks ere are evasion techniques which could beemployed by adversaries to evade the detection esetechniques open possibilities for future work erefore thissection talks about such evasion techniques and discuss whythey have not been covered by our current method

Since our proposed method works by analysing appli-cation layer messages it is safe to disregard evasion tech-niques on transport or network layer level eg IPfragmentation TCP delayed sending and TCP segmentfragmentation ey should be handled by the underlyingtool that reconstructs TCP session Furthermore those

evasion techniques can also be avoided by Snortpreprocessor

Two possible evasion techniques are compression andorencryption Both compression and encryption change thebytesrsquo value from the original and make the malicious codeharder to detect by Blatta or any previous work [7 12 23] onpayload-based detection Metasploit has a collection ofevasion techniques which include compression e com-pression evasion technique only works on HTTP and utilisesgzip is technique only compresses HTTP responses notHTTP requests While all HTTP-based exploits in UNSW-NB15 and BlattaSploit have their exploit code in the requestthus no data are available to analyse the performance if theadversary uses compression However gzip compressed datacould still be detected because they always start with themagic number 1f 8b and the decompression can be done in astreaming manner in which Blatta can do soere is also noneed to decompress the whole data since Blatta works wellwith partial input

Encryption is possibly the biggest obstacle in payload-based NIDS none of the previous works in our literature (seeTable 1) have addressed this challenge ere are otherstudies which deal with payload analysis in encrypted traffic[46 47] However these studies focus on classifying whichapplication generates the network traffic instead of detectingexploit attacks thus they are not directly relevant to ourresearch

On its own Blatta is not able to detect exploits hiding inencrypted traffic However Blattarsquos model can be exportedand incorporated with application layer firewalls such asShadowDaemon [48] ShadowDaemon is commonly in-stalled on a web server and intercepts HTTP requests beforebeing processed by a web server software It detects attacksbased on its signature database Since it is extensible andreads the same data as Blatta (ie application layer mes-sages) it is possible to use Blattarsquos RNN-based model toextend the capability of ShadowDaemon beyond rule-baseddetection More importantly this approach would enableBlatta to deal with encrypted traffic making it applicable inreal-life situations

Attackers could place the exploit code closer to the endof the application layer message Hoping that in doing so theattack would not be detected as Blatta reads merely the firstfew bytes However exploits with this kind of evasiontechnique would still be detected since this evasion tech-nique needs a padding to place the exploit code at the end ofthe message e padding itself will be detected as a sign ofmalicious attempts as it is most likely to be a byte sequencewhich rarely exist in the benign traffic

Table 7 Comparison to previous works using the UNSW-NB15 dataset as the testing set

MethodDetection rate ()

FPR ()Exploits in UNSW-NB15 Worms in UNSW-NB15

Blatta 9904 100 193PAYL 8712 2649 005OCPAD 1053 411 0Decanter 6793 9014 003Autoencoder 4751 8112 099

12 Security and Communication Networks

7 Conclusion and Future Work

is paper presents Blatta an early prediction system forexploit attacks which can detect exploit traffic by reading only400 bytes from the application layer message First Blattabuilds a dictionary containing most common n-grams in thetraining set en it is trained over benign and malicioussequences of n-grams to classify exploit traffic Lastly it con-tinuously reads a small chunk of an application layer messageand predicts whether the message will be a malicious one

Decreasing the number of bytes taken from applicationlayer messages only has a minor effect on Blattarsquos detectionrate erefore it does not need to read the whole appli-cation layer message like previously related works to detectexploit traffic creating a steep change in the ability of systemadministrators to detect attacks early and to block thembefore the exploit damages the system Extensive evaluationof the new exploit traffic dataset has clearly demonstrated theeffectiveness of Blatta

For future study we would like to train a model thatcan recognise malicious behaviour based on messagesexchanged between clients and a server since in this paperwe only consider messages from clients to a server but notthe other way around Detecting attacks on encryptedtraffic while retaining the early prediction capability couldbe a future research direction It also remains a questionwhether the approach in mobile traffic classification[46 47] would be applicable to the field of exploitdetection

Data Availability

e compressed file containing the dataset is available onhttpsbitlyblattasploit

Conflicts of Interest

e authors declare that they have no conflicts of interest

Figure 5 Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step It showshow the proposed system observes and analyses the traffic Yellow blocks show unknown n-grams Red blocks show the probability of thetraffic being malicious when reading an n-gram at that point

Security and Communication Networks 13

Acknowledgments

is work was supported by the Indonesia Endowment Fundfor Education (Lembaga Pengelola Dana PendidikanLPDP)under the grant number PRJ-968LPDP32016 of the LPDP

References

[1] McAfee, "McAfee Labs threat report, August 2019," Report, McAfee, Santa Clara, CA, USA, 2019.
[2] Exploit-DB, "Offensive Security's exploit database archive," EDB, Bedford, MA, USA, 2010.
[3] T.-H. Cheng, Y.-D. Lin, Y.-C. Lai, and P.-C. Lin, "Evasion techniques: sneaking through your intrusion detection/prevention systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1011–1020, 2012.
[4] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of LISA '99: 13th Systems Administration Conference, vol. 99, pp. 229–238, Seattle, Washington, USA, November 1999.
[5] R. Sommer and V. Paxson, "Outside the closed world: on using machine learning for network intrusion detection," in Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 305–316, Berkeley/Oakland, CA, USA, May 2010.
[6] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353–375, 2011.
[7] K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," in Recent Advances in Intrusion Detection: Proceedings of the 7th International Symposium, RAID 2004, Lecture Notes in Computer Science, pp. 203–222, Gothenburg, Sweden, September 2004.
[8] A. Jamdagni, Z. Tan, X. He, P. Nanda, and R. P. Liu, "RePIDS: a multi tier real-time payload-based intrusion detection system," Computer Networks, vol. 57, no. 3, pp. 811–824, 2013.
[9] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, "McPAD: a multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864–881, 2009.
[10] D. Ariu, R. Tronci, and G. Giacinto, "HMMPayl: an intrusion detection system based on hidden Markov models," Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.
[11] A. Oza, K. Ross, R. M. Low, and M. Stamp, "HTTP attack detection using n-gram analysis," Computers & Security, vol. 45, pp. 242–254, 2014.
[12] M. Swarnkar and N. Hubballi, "OCPAD: one class naive Bayes classifier for payload based anomaly detection," Expert Systems with Applications, vol. 64, pp. 330–339, 2016.
[13] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in Proceedings of the 25th USENIX Security Symposium, pp. 807–822, Austin, TX, USA, August 2016.
[14] H. Zhang, D. Yao, N. Ramakrishnan, and Z. Zhang, "Causality reasoning about network events for detecting stealthy malware activities," Computers & Security, vol. 58, pp. 180–198, 2016.
[15] R. Bortolameotti, T. van Ede, M. Caselli et al., "DECANTeR: DEteCtion of anomalous outbound HTTP traffic by passive application fingerprinting," in Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 373–386, Orlando, FL, USA, December 2017.
[16] D. Golait and N. Hubballi, "Detecting anomalous behavior in VoIP systems: a discrete event system modeling," IEEE Transactions on Information Forensics and Security, vol. 12, no. 3, pp. 730–745, 2016.
[17] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, vol. 16, no. 5, pp. 475–490, 2017.
[18] E. Min, J. Long, Q. Liu, J. Cui, and W. Chen, "TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest," Security and Communication Networks, vol. 2018, Article ID 4943509, 9 pages, 2018.
[19] X. Jin, B. Cui, D. Li, Z. Cheng, and C. Yin, "An improved payload-based anomaly detector for web applications," Journal of Network and Computer Applications, vol. 106, pp. 111–116, 2018.
[20] Y. Hao, Y. Sheng, and J. Wang, "Variant gated recurrent units with encoders to preprocess packets for payload-aware intrusion detection," IEEE Access, vol. 7, pp. 49985–49998, 2019.
[21] P. Schneider and K. Bottinger, "High-performance unsupervised anomaly detection for cyber-physical system networks," in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy (CPS-SPC '18), pp. 1–12, Barcelona, Spain, January 2018.
[22] T. Hamed, R. Dara, and S. C. Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol. 73, pp. 137–155, 2018.
[23] B. A. Pratomo, P. Burnap, and G. Theodorakopoulos, "Unsupervised approach for detecting low rate attacks on network traffic with autoencoder," in Proceedings of the 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8, Glasgow, UK, June 2018.
[24] Z.-Q. Qin, X.-K. Ma, and Y.-J. Wang, "Attentional payload anomaly detector for web applications," in Proceedings of the International Conference on Neural Information Processing, pp. 588–599, Siem Reap, Cambodia, December 2018.
[25] H. Liu, B. Lang, M. Liu, and H. Yan, "CNN and RNN based payload classification methods for attack detection," Knowledge-Based Systems, vol. 163, pp. 332–341, 2019.
[26] Z. Zhao and G.-J. Ahn, "Using instruction sequence abstraction for shellcode detection and attribution," in Proceedings of the Conference on Communications and Network Security (CNS), pp. 323–331, National Harbor, MD, USA, October 2013.
[27] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, "Detecting unknown malicious code by applying classification techniques on opcode patterns," Security Informatics, vol. 1, no. 1, p. 1, 2012.
[28] X. Wang, C.-C. Pan, P. Liu, and S. Zhu, "SigFree: a signature-free buffer overflow attack blocker," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 65–79, 2010.
[29] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," in Proceedings of the International Workshop on Recent Advances in Intrusion Detection, Lecture Notes in Computer Science, pp. 226–248, Hamburg, Germany, September 2006.
[30] Rapid7, "Metasploit penetration testing framework," 2003, https://www.metasploit.com.
[31] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the Military Communications and Information Systems Conference (MilCIS), pp. 1–6, Canberra, ACT, Australia, November 2015.


[32] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.
[33] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using Snort," Tech. Report CSE-2007-1, vol. 1, no. 2007, p. 22, UC Davis Department of Computer Science, Davis, CA, USA, 2007.
[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, p. 2, Berkeley, CA, USA, October 2005.
[35] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[36] Bro IDS, 2008, http://www.bro-ids.org.
[37] Argus, 2008, https://qosient.com/argus.
[38] Rapid7, "Metasploitable," 2012, https://information.rapid7.com/download-metasploitable-2017.html.
[39] S. L. Garfinkel and M. Shick, "Passive TCP reconstruction and forensic analysis with tcpflow," Technical Reports, Naval Postgraduate School, Monterey, CA, USA, 2013.
[40] K. Rieck and P. Laskov, "Language models for detection of unknown attacks in network traffic," Journal in Computer Virology, vol. 2, no. 4, pp. 243–256, 2007.
[41] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, Doha, Qatar, October 2014.
[42] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, https://arxiv.org/abs/1412.3555.
[45] PyTorch, "Negative log likelihood," 2016.
[46] M. Lotfollahi, M. J. Siavoshani, R. S. H. Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[47] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescape, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.
[48] H. Buchwald, "Shadow Daemon: a collection of tools to detect, record, and block attacks on web applications," Shadow Daemon Introduction, 2015, https://shadowd.zecure.org/overview/introduction.



Attackers could place the exploit code closer to the endof the application layer message Hoping that in doing so theattack would not be detected as Blatta reads merely the firstfew bytes However exploits with this kind of evasiontechnique would still be detected since this evasion tech-nique needs a padding to place the exploit code at the end ofthe message e padding itself will be detected as a sign ofmalicious attempts as it is most likely to be a byte sequencewhich rarely exist in the benign traffic

Table 7 Comparison to previous works using the UNSW-NB15 dataset as the testing set

MethodDetection rate ()

FPR ()Exploits in UNSW-NB15 Worms in UNSW-NB15

Blatta 9904 100 193PAYL 8712 2649 005OCPAD 1053 411 0Decanter 6793 9014 003Autoencoder 4751 8112 099

12 Security and Communication Networks

7 Conclusion and Future Work

is paper presents Blatta an early prediction system forexploit attacks which can detect exploit traffic by reading only400 bytes from the application layer message First Blattabuilds a dictionary containing most common n-grams in thetraining set en it is trained over benign and malicioussequences of n-grams to classify exploit traffic Lastly it con-tinuously reads a small chunk of an application layer messageand predicts whether the message will be a malicious one

Decreasing the number of bytes taken from applicationlayer messages only has a minor effect on Blattarsquos detectionrate erefore it does not need to read the whole appli-cation layer message like previously related works to detectexploit traffic creating a steep change in the ability of systemadministrators to detect attacks early and to block thembefore the exploit damages the system Extensive evaluationof the new exploit traffic dataset has clearly demonstrated theeffectiveness of Blatta

For future study we would like to train a model thatcan recognise malicious behaviour based on messagesexchanged between clients and a server since in this paperwe only consider messages from clients to a server but notthe other way around Detecting attacks on encryptedtraffic while retaining the early prediction capability couldbe a future research direction It also remains a questionwhether the approach in mobile traffic classification[46 47] would be applicable to the field of exploitdetection

Data Availability

e compressed file containing the dataset is available onhttpsbitlyblattasploit

Conflicts of Interest

e authors declare that they have no conflicts of interest

Figure 5 Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step It showshow the proposed system observes and analyses the traffic Yellow blocks show unknown n-grams Red blocks show the probability of thetraffic being malicious when reading an n-gram at that point

Security and Communication Networks 13

Acknowledgments

is work was supported by the Indonesia Endowment Fundfor Education (Lembaga Pengelola Dana PendidikanLPDP)under the grant number PRJ-968LPDP32016 of the LPDP

References

[1] McAfee ldquoMcAfee labs threat reportmdashaugust 2019rdquo ReportMcAfee Santa Clara CA USA 2019

[2] E DB ldquoOffensive securityrsquos exploit database archiverdquo EDBBedford MA USA 2010

[3] T-H Cheng Y-D Lin Y-C Lai and P-C Lin ldquoEvasiontechniques sneaking through your intrusion detectionpre-vention systemsrdquo IEEE Communications Surveys amp Tutorialsvol 14 no 4 pp 1011ndash1020 2012

[4] M Roesch ldquoSnort lightweight intrusion detection for net-worksrdquo in Proceedings of LISArsquo99 13th Systems Administra-tion Conference vol 99 pp 229ndash238 Seattle WashingtonUSA November 1999

[5] R Sommer and V Paxson ldquoOutside the closed world onusing machine learning for network intrusion detectionrdquo inProceedings of the 2010 IEEE Symposium on Security AndPrivacy pp 305ndash316 BerkeleyOakland CA USA May 2010

[6] J J Davis and A J Clark ldquoData preprocessing for anomalybased network intrusion detection a reviewrdquo Computers ampSecurity vol 30 no 6-7 pp 353ndash375 2011

[7] K Wang and S J Stolfo ldquoAnomalous payload-based networkintrusion detectionrdquo Lecture Notes in Computer ScienceRecent Advances in Intrusion Detection in Proceedings of the7th International Symposium Raid 2004 pp 203ndash222Gothenburg Sweden September 2004

[8] A Jamdagni Z Tan X He P Nanda and R P Liu ldquoRePIDSa multi tier real-time payload-based intrusion detectionsystemrdquo Computer Networks vol 57 no 3 pp 811ndash824 2013

[9] R Perdisci D Ariu P Fogla G Giacinto and W LeeldquoMcPAD a multiple classifier system for accurate payload-based anomaly detectionrdquo Computer Networks vol 53 no 6pp 864ndash881 2009

[10] D Ariu R Tronci and G Giacinto ldquoHMMPayl an intrusiondetection system based on hidden markov modelsrdquo Com-puters amp Security vol 30 no 4 pp 221ndash241 2011

[11] A Oza K Ross R M Low and M Stamp ldquoHTTP attackdetection using n-gram analysisrdquo Computers amp Securityvol 45 pp 242ndash254 2014

[12] M Swarnkar and N Hubballi ldquoOCPAD one class naive bayesclassifier for payload based anomaly detectionrdquo Expert Sys-tems with Applications vol 64 pp 330ndash339 2016

[13] K Bartos M Sofka and V Franc ldquoOptimized invariantrepresentation of network traffic for detecting unseen mal-ware variantsrdquo in Proceedings of the 25th USENIX SecuritySymposium pp 807ndash822 Austin TX USA August 2016

[14] H Zhang D Yao N Ramakrishnan and Z Zhang ldquoCausalityreasoning about network events for detecting stealthy mal-ware activitiesrdquo Computers amp Security vol 58 pp 180ndash1982016

[15] R Bortolameotti T van Ede M Caselli et al ldquoDECANTeRDEteCtion of anomalous outbound HTTP traffic by passiveapplication fingerprintingrdquo in Proceedings of the 33rd AnnualComputer Security Applications Conference pp 373ndash386Orlando FL USA December 2017

[16] D Golait and N Hubballi ldquoDetecting anomalous behavior invoip systems a discrete event system modelingrdquo IEEE

Transactions on Information Forensics and Security vol 12no 3 pp 730ndash745 2016

[17] P Duessel C Gehl U Flegel S Dietrich and M MeierldquoDetecting zero-day attacks using context-aware anomalydetection at the application-layerrdquo International Journal ofInformation Security vol 16 no 5 pp 475ndash490 2017

[18] E Min J Long Q Liu J Cui and W Chen ldquoTR-IDSanomaly-based intrusion detection through text-convolu-tional neural network and random forestrdquo Security andCommunication Networks vol 2018 Article ID 49435099 pages 2018

[19] X Jin B Cui D Li Z Cheng and C Yin ldquoAn improvedpayload-based anomaly detector for web applicationsrdquoJournal of Network and Computer Applications vol 106pp 111ndash116 2018

[20] Y Hao Y Sheng and J Wang ldquoVariant gated recurrent unitswith encoders to preprocess packets for payload-aware in-trusion detectionrdquo IEEE Access vol 7 pp 49985ndash49998 2019

[21] P Schneider and K Bottinger ldquoHigh-performance unsu-pervised anomaly detection for cyber-physical system net-worksrdquo in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCymdashCPS-SPCrsquo18 pp 1ndash12 Barcelona Spain January 2018

[22] T Hamed R Dara and S C Kremer ldquoNetwork intrusiondetection system based on recursive feature addition and bigramtechniquerdquo Computers amp Security vol 73 pp 137ndash155 2018

[23] B A Pratomo P Burnap and G eodorakopoulos ldquoUn-supervised approach for detecting low rate attacks on networktraffic with autoencoderrdquo in 2018 International Conference onCyber Security and Protection of Digital Services (Cyber Se-curity) pp 1ndash8 Glasgow UK June 2018

[24] Z-Q Qin X-K Ma and Y-J Wang ldquoAttentional payloadanomaly detector for web applicationsrdquo in Proceedings of theInternational Conference on Neural Information Processingpp 588ndash599 Siem Reap Cambodia December 2018

[25] H Liu B Lang M Liu and H Yan ldquoCNN and RNN basedpayload classification methods for attack detectionrdquo Knowl-edge-Based Systems vol 163 pp 332ndash341 2019

[26] Z Zhao and G-J Ahn ldquoUsing instruction sequence ab-straction for shellcode detection and attributionrdquo in Pro-ceedings of the Conference on Communications and NetworkSecurity (CNS) pp 323ndash331 National Harbor MD USAOctober 2013

[27] A Shabtai R Moskovitch C Feher S Dolev and Y ElovicildquoDetecting unknownmalicious code by applying classificationtechniques on opcode patternsrdquo Security Informatics vol 1no 1 p 1 2012

[28] XWang C-C Pan P Liu and S Zhu ldquoSigFree a signature-freebuffer overflow attack blockerrdquo IEEE Transactions on Depend-able and Secure Computing vol 7 no 1 pp 65ndash79 2010

[29] K Wang J J Parekh and S J Stolfo ldquoAnagram a contentanomaly detector resistant to mimicry attackrdquo Lecture Notesin Computer Science in Proceedings of the InternationalWorkshop on Recent Advances in Intrusion Detectionpp 226ndash248 Hamburg Germany September 2006

[30] R 7 ldquoMetasploit penetration testing frameworkrdquo 2003httpswwwmetasploitcom

[31] N Moustafa and J Slay ldquoUNSW-NB15 a comprehensive dataset for network intrusion detection systems (UNSW-NB15network data set)rdquo in Proceedings of the Military Commu-nications and Information Systems Conference (MiLCIS)pp 1ndash6 Canberra ACT Australia November 2015

14 Security and Communication Networks

[32] R Lippmann J W Haines D J Fried J Korba and K Dasldquoe 1999 DARPA off-line intrusion detection evaluationrdquoComputer Networks vol 34 no 4 pp 579ndash595 2000

[33] S T Brugger and J Chow ldquoAn assessment of the DARPA IDSevaluation dataset using snortrdquo vol 1 no 2007 Tech ReportCSE-2007-1 p 22 UCDAVIS Department of ComputerScience Davis CA USA 2007

[34] R Pang M Allman M Bennett J Lee V Paxson andB Tierney ldquoA first look at modern enterprise trafficrdquo inProceedings of the 5th ACM SIGCOMM Conference on In-ternet Measurement p 2 Berkeley CA USA October2005

[35] A Shiravi H Shiravi M Tavallaee and A A GhorbanildquoToward developing a systematic approach to generatebenchmark datasets for intrusion detectionrdquo Computers ampSecurity vol 31 no 3 pp 357ndash374 2012

[36] I Bro 2008 httpwwwbro-idsorg[37] Argus 2008 httpsqosientcomargus[38] R 7 ldquoMetasploitablerdquo 2012 httpsinformationrapid7com

download-metasploitable-2017html[39] S L Garfinkel andM Shick ldquoPassive TCP reconstruction and

forensic analysis with tcpflowrdquo Technical Reports NavalPostgraduate School Monterey CA USA 2013

[40] K Rieck and P Laskov ldquoLanguage models for detection ofunknown attacks in network trafficrdquo Journal in ComputerVirology vol 2 no 4 pp 243ndash256 2007

[41] J Pennington R Socher and C D Manning ldquoGloVe globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) pp 1532ndash1543 Doha Qatar October 2014

[42] I Goodfellow Y Bengio and A Courville Deep LearningMIT press Cambridge MA USA 2016

[43] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[44] J Chung C Gulcehre K Cho and Y Bengio ldquoEmpiricalevaluation of gated recurrent neural networks on sequencemodelingrdquo 2014 httpsarxivorgabs14123555

[45] PyTorch ldquoNegative log likelihoodrdquo 2016[46] M Lotfollahi M J Siavoshani R S H Zade andM Saberian

ldquoDeep packet a novel approach for encrypted traffic classi-fication using deep learningrdquo Soft Computing vol 24 no 3pp 1999ndash2012 2020

[47] G Aceto D Ciuonzo A Montieri and A Pescape ldquoMI-METIC mobile encrypted traffic classification using multi-modal deep learningrdquo Computer Networks vol 165 ArticleID 106944 2019

[48] H Buchwald ldquoShadow daemon a collection of tools to detectrecord and block attacks on web applicationsrdquo ShadowDaemon Introduction 2015 httpsshadowdzecureorgoverviewintroduction

Security and Communication Networks 15

Page 11: BLATTA:EarlyExploitDetectiononNetworkTrafficwith ...downloads.hindawi.com/journals/scn/2020/8826038.pdfAs the length varies, longer messages will take more time to analyse, during

54 Comparison with Previous Works We compare Blattaresults with other related previous works PAYL [7]OCPAD [12] Decanter [15] and the autoencoder model[23] were chosen due to their code availability and both canbe tested against the UNSW-NB15 dataset PAYL andOCPAD read an IP packet at a time while Decanter and [23]reconstruct TCP segments and process the whole applica-tion layer message None of them provides early detectionbut to show that Blatta also offers improvements in detectingmalicious traffic we compare the detection and false positiverates of those works with Blatta

We evaluated all methods with exploit and worm data inUNWS-NB15 as those match our threat model and the

dataset is already publicly availableus the result would becomparable e results are shown in Table 7

In general Blatta has the highest detection ratemdashalbeit italso comes with the cost of increasing false positives Al-though the false positives might make this approach un-acceptable in real life Blatta is still a significant step towardsa direction of early prediction a problem that has not beenexplored by similar previous works is early predictionapproach enables system administrators to react faster thusreducing the harm to protected systems

In the previous experiments as shown in Section 52Blatta needs to read 400 bytes (on average 3521 of ap-plication layer message size) to achieve 9757 detection

Table 5 Experiment results of using various parameters combination and various lengths of input to the model

Parameter No of bytes No of bytesAll 700 600 500 400 300 200 All 700 600 500 400 300 200

n

1 4722 4869 4986 5065 5177 5499 6532 118 119 121 121 7131 787 89433 9987 9951 9977 991 9959 9893 9107 251 251 251 251 7261 1029 20515 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 11087 9986 9947 9959 9937 9919 9853 9708 251 251 251 251 251 892 80929 9981 9959 9962 9957 9923 9816 8893 251 251 251 251 726 7416 906

Stride

1 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 11082 7339 7411 7401 7445 7469 7462 7782 181 181 181 181 7192 7246 19863 8251 8254 8307 8312 8325 835 8575 15 149 15 151 7162 7547 89634 996 9919 9926 9928 9861 9855 9837 193 193 193 193 193 7409 1055 9973 9895 9888 9865 98 9577 8829 193 192 193 193 193 5416 9002

Dictionary size

1000 4778 495 5036 5079 518 5483 5468 121 121 122 122 7133 7947 89422000 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 11085000 9987 9937 9975 9979 9962 9969 9966 251 251 251 251 7261 1003 906110000 9986 9944 9974 9955 9944 9855 9833 251 251 251 251 7261 7906 901520000 9984 9981 9969 9924 9921 9943 9891 251 251 251 251 7261 8046 8964

Embedding dimension

16 9989 9965 997 9967 9922 9909 9881 251 251 251 251 251 7677 809432 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 110864 9987 992 9941 9909 9861 9676 8552 251 251 251 251 251 451 8985128 9984 9933 996 9935 9899 9769 8678 251 251 251 251 726 427 1088256 9988 9976 998 9922 9938 9864 9034 251 251 251 251 726 8079 906

Recurrent layer LSTM 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 1108GRU 9988 9935 9948 9935 9906 9794 8622 251 251 251 251 251 7895 848

No of layers1 9987 9955 9978 9957 9929 9891 8875 251 251 251 251 251 726 11082 9986 9946 9946 9938 992 9972 8865 251 251 251 251 7259 7878 20293 9984 9938 9968 991 9918 9816 8735 251 251 251 251 251 7494 1083

Detection rate False positive rateBold values show the parameter value for each set of experiment which gives the highest detection rate and lowest false positive rate

Table 6 e effect of reducing the number of bytes to the detection speed

No of bytesNo of LSTM layers

1 2 3All 8366plusmn 0238327 5514plusmn 0004801 3698plusmn 0011428700 16486plusmn 0022857 10704plusmn 0022001 735plusmn 0044694600 1816plusmn 0020556 1197plusmn 0024792 821plusmn 0049584500 20432plusmn 002352 1365plusmn 0036668 9376plusmn 0061855400 2217plusmn 0032205 1494plusmn 0037701 10302plusmn 0065417300 24076plusmn 0022857 16368plusmn 0036352 11318plusmn 0083477200 26272plusmn 0030616 18138plusmn 0020927 12688plusmn 0063024e values are average (mean) detection speed in kbps with 95 confidence interval calculated from multiple experiments e detection speed increasedsignificantly (about three times faster than reading the whole message) allowing early prediction of malicious traffic

Security and Communication Networks 11

rate It reads fewer bytes than PAYL OCPAD Decanter and[23] while keeping the accuracy high

55Visualisation To investigate how Blatta has performedthe detection we took samples of both benign andmalicious traffic and observed the input and output Wewere particularly interested in the relation of n-grams thatare not stored in the dictionary to the decision (unknownn-grams) ose n-grams either did not exist in thetraining set or were not common enough to be included inthe dictionary

On Figure 5 we visualise three samples of traffic takenfrom different sets BlattaSploit and UNSW-NB15 datasetse first part of each sample shows n-grams that did notexist in the dictionary e yellow highlighted parts showthose n-grams e second part shows the output of therecurrent layer for each time step e darker the redhighlight the closer the probability of the traffic beingmalicious to one in that time step

As shown in Figure 5 (detection samples) malicioussamples tend to have more unknown n-grams It is evidentthat the existence of these unknown n-grams increases theprobability of the traffic being malicious As an examplethe first five bytes of the five samples have around 05probability of being malicious And then the probabilitywent up closer to one when an unknown n-gram isdetected

Similar behaviour also exists in the benign sample eprobability is quite low because there are many knownn-grams Despite the existence of unknown n-grams in thebenign sample the end result shows that the traffic is benignFurthermore most of the time the probability of the trafficbeing malicious is also below 05

6 Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet to tackle exploitattacks ere are evasion techniques which could beemployed by adversaries to evade the detection esetechniques open possibilities for future work erefore thissection talks about such evasion techniques and discuss whythey have not been covered by our current method

Since our proposed method works by analysing appli-cation layer messages it is safe to disregard evasion tech-niques on transport or network layer level eg IPfragmentation TCP delayed sending and TCP segmentfragmentation ey should be handled by the underlyingtool that reconstructs TCP session Furthermore those

evasion techniques can also be avoided by Snortpreprocessor

Two possible evasion techniques are compression andorencryption Both compression and encryption change thebytesrsquo value from the original and make the malicious codeharder to detect by Blatta or any previous work [7 12 23] onpayload-based detection Metasploit has a collection ofevasion techniques which include compression e com-pression evasion technique only works on HTTP and utilisesgzip is technique only compresses HTTP responses notHTTP requests While all HTTP-based exploits in UNSW-NB15 and BlattaSploit have their exploit code in the requestthus no data are available to analyse the performance if theadversary uses compression However gzip compressed datacould still be detected because they always start with themagic number 1f 8b and the decompression can be done in astreaming manner in which Blatta can do soere is also noneed to decompress the whole data since Blatta works wellwith partial input

Encryption is possibly the biggest obstacle in payload-based NIDS none of the previous works in our literature (seeTable 1) have addressed this challenge ere are otherstudies which deal with payload analysis in encrypted traffic[46 47] However these studies focus on classifying whichapplication generates the network traffic instead of detectingexploit attacks thus they are not directly relevant to ourresearch

On its own Blatta is not able to detect exploits hiding inencrypted traffic However Blattarsquos model can be exportedand incorporated with application layer firewalls such asShadowDaemon [48] ShadowDaemon is commonly in-stalled on a web server and intercepts HTTP requests beforebeing processed by a web server software It detects attacksbased on its signature database Since it is extensible andreads the same data as Blatta (ie application layer mes-sages) it is possible to use Blattarsquos RNN-based model toextend the capability of ShadowDaemon beyond rule-baseddetection More importantly this approach would enableBlatta to deal with encrypted traffic making it applicable inreal-life situations

Attackers could place the exploit code closer to the endof the application layer message Hoping that in doing so theattack would not be detected as Blatta reads merely the firstfew bytes However exploits with this kind of evasiontechnique would still be detected since this evasion tech-nique needs a padding to place the exploit code at the end ofthe message e padding itself will be detected as a sign ofmalicious attempts as it is most likely to be a byte sequencewhich rarely exist in the benign traffic

Table 7 Comparison to previous works using the UNSW-NB15 dataset as the testing set

MethodDetection rate ()

FPR ()Exploits in UNSW-NB15 Worms in UNSW-NB15

Blatta 9904 100 193PAYL 8712 2649 005OCPAD 1053 411 0Decanter 6793 9014 003Autoencoder 4751 8112 099

12 Security and Communication Networks

7 Conclusion and Future Work

is paper presents Blatta an early prediction system forexploit attacks which can detect exploit traffic by reading only400 bytes from the application layer message First Blattabuilds a dictionary containing most common n-grams in thetraining set en it is trained over benign and malicioussequences of n-grams to classify exploit traffic Lastly it con-tinuously reads a small chunk of an application layer messageand predicts whether the message will be a malicious one

Decreasing the number of bytes taken from applicationlayer messages only has a minor effect on Blattarsquos detectionrate erefore it does not need to read the whole appli-cation layer message like previously related works to detectexploit traffic creating a steep change in the ability of systemadministrators to detect attacks early and to block thembefore the exploit damages the system Extensive evaluationof the new exploit traffic dataset has clearly demonstrated theeffectiveness of Blatta

For future study we would like to train a model thatcan recognise malicious behaviour based on messagesexchanged between clients and a server since in this paperwe only consider messages from clients to a server but notthe other way around Detecting attacks on encryptedtraffic while retaining the early prediction capability couldbe a future research direction It also remains a questionwhether the approach in mobile traffic classification[46 47] would be applicable to the field of exploitdetection

Data Availability

e compressed file containing the dataset is available onhttpsbitlyblattasploit

Conflicts of Interest

e authors declare that they have no conflicts of interest

Figure 5 Visualisation of unknown n-grams in the application layer messages and outputs of the recurrent layer for each time step It showshow the proposed system observes and analyses the traffic Yellow blocks show unknown n-grams Red blocks show the probability of thetraffic being malicious when reading an n-gram at that point

Security and Communication Networks 13

Acknowledgments

is work was supported by the Indonesia Endowment Fundfor Education (Lembaga Pengelola Dana PendidikanLPDP)under the grant number PRJ-968LPDP32016 of the LPDP

References

[1] McAfee ldquoMcAfee labs threat reportmdashaugust 2019rdquo ReportMcAfee Santa Clara CA USA 2019

[2] E DB ldquoOffensive securityrsquos exploit database archiverdquo EDBBedford MA USA 2010

[3] T-H Cheng Y-D Lin Y-C Lai and P-C Lin ldquoEvasiontechniques sneaking through your intrusion detectionpre-vention systemsrdquo IEEE Communications Surveys amp Tutorialsvol 14 no 4 pp 1011ndash1020 2012

[4] M Roesch ldquoSnort lightweight intrusion detection for net-worksrdquo in Proceedings of LISArsquo99 13th Systems Administra-tion Conference vol 99 pp 229ndash238 Seattle WashingtonUSA November 1999

[5] R Sommer and V Paxson ldquoOutside the closed world onusing machine learning for network intrusion detectionrdquo inProceedings of the 2010 IEEE Symposium on Security AndPrivacy pp 305ndash316 BerkeleyOakland CA USA May 2010

[6] J J Davis and A J Clark ldquoData preprocessing for anomalybased network intrusion detection a reviewrdquo Computers ampSecurity vol 30 no 6-7 pp 353ndash375 2011

[7] K Wang and S J Stolfo ldquoAnomalous payload-based networkintrusion detectionrdquo Lecture Notes in Computer ScienceRecent Advances in Intrusion Detection in Proceedings of the7th International Symposium Raid 2004 pp 203ndash222Gothenburg Sweden September 2004

[8] A Jamdagni Z Tan X He P Nanda and R P Liu ldquoRePIDSa multi tier real-time payload-based intrusion detectionsystemrdquo Computer Networks vol 57 no 3 pp 811ndash824 2013

[9] R Perdisci D Ariu P Fogla G Giacinto and W LeeldquoMcPAD a multiple classifier system for accurate payload-based anomaly detectionrdquo Computer Networks vol 53 no 6pp 864ndash881 2009

[10] D Ariu R Tronci and G Giacinto ldquoHMMPayl an intrusiondetection system based on hidden markov modelsrdquo Com-puters amp Security vol 30 no 4 pp 221ndash241 2011

[11] A Oza K Ross R M Low and M Stamp ldquoHTTP attackdetection using n-gram analysisrdquo Computers amp Securityvol 45 pp 242ndash254 2014

[12] M Swarnkar and N Hubballi ldquoOCPAD one class naive bayesclassifier for payload based anomaly detectionrdquo Expert Sys-tems with Applications vol 64 pp 330ndash339 2016

[13] K Bartos M Sofka and V Franc ldquoOptimized invariantrepresentation of network traffic for detecting unseen mal-ware variantsrdquo in Proceedings of the 25th USENIX SecuritySymposium pp 807ndash822 Austin TX USA August 2016

[14] H Zhang D Yao N Ramakrishnan and Z Zhang ldquoCausalityreasoning about network events for detecting stealthy mal-ware activitiesrdquo Computers amp Security vol 58 pp 180ndash1982016

[15] R Bortolameotti T van Ede M Caselli et al ldquoDECANTeRDEteCtion of anomalous outbound HTTP traffic by passiveapplication fingerprintingrdquo in Proceedings of the 33rd AnnualComputer Security Applications Conference pp 373ndash386Orlando FL USA December 2017

[16] D Golait and N Hubballi ldquoDetecting anomalous behavior invoip systems a discrete event system modelingrdquo IEEE

Transactions on Information Forensics and Security vol 12no 3 pp 730ndash745 2016

[17] P Duessel C Gehl U Flegel S Dietrich and M MeierldquoDetecting zero-day attacks using context-aware anomalydetection at the application-layerrdquo International Journal ofInformation Security vol 16 no 5 pp 475ndash490 2017

[18] E Min J Long Q Liu J Cui and W Chen ldquoTR-IDSanomaly-based intrusion detection through text-convolu-tional neural network and random forestrdquo Security andCommunication Networks vol 2018 Article ID 49435099 pages 2018

[19] X Jin B Cui D Li Z Cheng and C Yin ldquoAn improvedpayload-based anomaly detector for web applicationsrdquoJournal of Network and Computer Applications vol 106pp 111ndash116 2018

[20] Y Hao Y Sheng and J Wang ldquoVariant gated recurrent unitswith encoders to preprocess packets for payload-aware in-trusion detectionrdquo IEEE Access vol 7 pp 49985ndash49998 2019

[21] P Schneider and K Bottinger ldquoHigh-performance unsu-pervised anomaly detection for cyber-physical system net-worksrdquo in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCymdashCPS-SPCrsquo18 pp 1ndash12 Barcelona Spain January 2018

[22] T Hamed R Dara and S C Kremer ldquoNetwork intrusiondetection system based on recursive feature addition and bigramtechniquerdquo Computers amp Security vol 73 pp 137ndash155 2018

[23] B A Pratomo P Burnap and G eodorakopoulos ldquoUn-supervised approach for detecting low rate attacks on networktraffic with autoencoderrdquo in 2018 International Conference onCyber Security and Protection of Digital Services (Cyber Se-curity) pp 1ndash8 Glasgow UK June 2018

[24] Z-Q Qin X-K Ma and Y-J Wang ldquoAttentional payloadanomaly detector for web applicationsrdquo in Proceedings of theInternational Conference on Neural Information Processingpp 588ndash599 Siem Reap Cambodia December 2018

[25] H Liu B Lang M Liu and H Yan ldquoCNN and RNN basedpayload classification methods for attack detectionrdquo Knowl-edge-Based Systems vol 163 pp 332ndash341 2019

[26] Z Zhao and G-J Ahn ldquoUsing instruction sequence ab-straction for shellcode detection and attributionrdquo in Pro-ceedings of the Conference on Communications and NetworkSecurity (CNS) pp 323ndash331 National Harbor MD USAOctober 2013

[27] A Shabtai R Moskovitch C Feher S Dolev and Y ElovicildquoDetecting unknownmalicious code by applying classificationtechniques on opcode patternsrdquo Security Informatics vol 1no 1 p 1 2012

[28] XWang C-C Pan P Liu and S Zhu ldquoSigFree a signature-freebuffer overflow attack blockerrdquo IEEE Transactions on Depend-able and Secure Computing vol 7 no 1 pp 65ndash79 2010

[29] K Wang J J Parekh and S J Stolfo ldquoAnagram a contentanomaly detector resistant to mimicry attackrdquo Lecture Notesin Computer Science in Proceedings of the InternationalWorkshop on Recent Advances in Intrusion Detectionpp 226ndash248 Hamburg Germany September 2006

[30] R 7 ldquoMetasploit penetration testing frameworkrdquo 2003httpswwwmetasploitcom

[31] N Moustafa and J Slay ldquoUNSW-NB15 a comprehensive dataset for network intrusion detection systems (UNSW-NB15network data set)rdquo in Proceedings of the Military Commu-nications and Information Systems Conference (MiLCIS)pp 1ndash6 Canberra ACT Australia November 2015

14 Security and Communication Networks

[32] R Lippmann J W Haines D J Fried J Korba and K Dasldquoe 1999 DARPA off-line intrusion detection evaluationrdquoComputer Networks vol 34 no 4 pp 579ndash595 2000

[33] S T Brugger and J Chow ldquoAn assessment of the DARPA IDSevaluation dataset using snortrdquo vol 1 no 2007 Tech ReportCSE-2007-1 p 22 UCDAVIS Department of ComputerScience Davis CA USA 2007

[34] R Pang M Allman M Bennett J Lee V Paxson andB Tierney ldquoA first look at modern enterprise trafficrdquo inProceedings of the 5th ACM SIGCOMM Conference on In-ternet Measurement p 2 Berkeley CA USA October2005

[35] A Shiravi H Shiravi M Tavallaee and A A GhorbanildquoToward developing a systematic approach to generatebenchmark datasets for intrusion detectionrdquo Computers ampSecurity vol 31 no 3 pp 357ndash374 2012

[36] I Bro 2008 httpwwwbro-idsorg[37] Argus 2008 httpsqosientcomargus[38] R 7 ldquoMetasploitablerdquo 2012 httpsinformationrapid7com

download-metasploitable-2017html[39] S L Garfinkel andM Shick ldquoPassive TCP reconstruction and

forensic analysis with tcpflowrdquo Technical Reports NavalPostgraduate School Monterey CA USA 2013

[40] K Rieck and P Laskov ldquoLanguage models for detection ofunknown attacks in network trafficrdquo Journal in ComputerVirology vol 2 no 4 pp 243ndash256 2007

[41] J Pennington R Socher and C D Manning ldquoGloVe globalvectors for word representationrdquo in Proceedings of the 2014Conference on Empirical Methods in Natural Language Pro-cessing (EMNLP) pp 1532ndash1543 Doha Qatar October 2014

[42] I Goodfellow Y Bengio and A Courville Deep LearningMIT press Cambridge MA USA 2016

[43] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[44] J Chung C Gulcehre K Cho and Y Bengio ldquoEmpiricalevaluation of gated recurrent neural networks on sequencemodelingrdquo 2014 httpsarxivorgabs14123555

[45] PyTorch ldquoNegative log likelihoodrdquo 2016[46] M Lotfollahi M J Siavoshani R S H Zade andM Saberian

ldquoDeep packet a novel approach for encrypted traffic classi-fication using deep learningrdquo Soft Computing vol 24 no 3pp 1999ndash2012 2020

[47] G Aceto D Ciuonzo A Montieri and A Pescape ldquoMI-METIC mobile encrypted traffic classification using multi-modal deep learningrdquo Computer Networks vol 165 ArticleID 106944 2019

[48] H Buchwald ldquoShadow daemon a collection of tools to detectrecord and block attacks on web applicationsrdquo ShadowDaemon Introduction 2015 httpsshadowdzecureorgoverviewintroduction

Security and Communication Networks 15

Page 12: BLATTA:EarlyExploitDetectiononNetworkTrafficwith ...downloads.hindawi.com/journals/scn/2020/8826038.pdfAs the length varies, longer messages will take more time to analyse, during

rate It reads fewer bytes than PAYL OCPAD Decanter and[23] while keeping the accuracy high

55Visualisation To investigate how Blatta has performedthe detection we took samples of both benign andmalicious traffic and observed the input and output Wewere particularly interested in the relation of n-grams thatare not stored in the dictionary to the decision (unknownn-grams) ose n-grams either did not exist in thetraining set or were not common enough to be included inthe dictionary

On Figure 5 we visualise three samples of traffic takenfrom different sets BlattaSploit and UNSW-NB15 datasetse first part of each sample shows n-grams that did notexist in the dictionary e yellow highlighted parts showthose n-grams e second part shows the output of therecurrent layer for each time step e darker the redhighlight the closer the probability of the traffic beingmalicious to one in that time step

As shown in Figure 5 (detection samples) malicioussamples tend to have more unknown n-grams It is evidentthat the existence of these unknown n-grams increases theprobability of the traffic being malicious As an examplethe first five bytes of the five samples have around 05probability of being malicious And then the probabilitywent up closer to one when an unknown n-gram isdetected

Similar behaviour also exists in the benign sample. The probability stays quite low because there are many known n-grams. Despite the existence of some unknown n-grams in the benign sample, the end result shows that the traffic is benign; furthermore, most of the time the probability of the traffic being malicious remains below 0.5.
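The per-time-step output plotted in Figure 5 can be illustrated with a small PyTorch sketch. The architecture and hyperparameters below are placeholders rather than Blatta's actual model; the point is only how a malicious-probability value can be read off the recurrent layer at every time step.

```python
import torch
import torch.nn as nn

class TinyRNNClassifier(nn.Module):
    """Hypothetical stand-in for an RNN-based payload classifier."""
    def __init__(self, vocab_size=5001, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)          # benign vs. malicious

    def forward(self, x):                            # x: (batch, seq_len)
        states, _ = self.rnn(self.embed(x))          # (batch, seq_len, hidden)
        return self.out(states)                      # logits at every time step

model = TinyRNNClassifier()                          # untrained, illustration only
dummy = torch.randint(0, 5001, (1, 60))              # a dummy encoded message
with torch.no_grad():
    probs = torch.softmax(model(dummy), dim=-1)[0, :, 1]
# probs[t] is the estimated probability that the message is malicious after
# reading the n-gram at time step t -- the quantity shaded red in Figure 5.
```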

6. Evasion Techniques and Adversarial Attacks

Our proposed approach is not a silver bullet against exploit attacks. There are evasion techniques which adversaries could employ to evade detection, and these techniques open possibilities for future work. This section therefore discusses such evasion techniques and why they have not been covered by our current method.

Since our proposed method works by analysing application layer messages, it is safe to disregard evasion techniques at the transport or network layer level, e.g., IP fragmentation, TCP delayed sending, and TCP segment fragmentation. They should be handled by the underlying tool that reconstructs the TCP session. Furthermore, those evasion techniques can also be avoided by the Snort preprocessor.

Two possible evasion techniques are compression and/or encryption. Both change the bytes' values from the original and make the malicious code harder to detect by Blatta or any previous work on payload-based detection [7, 12, 23]. Metasploit has a collection of evasion techniques which include compression. The compression evasion technique only works on HTTP and utilises gzip, and it only compresses HTTP responses, not HTTP requests. Since all HTTP-based exploits in UNSW-NB15 and BlattaSploit carry their exploit code in the request, no data are available to analyse the performance when the adversary uses compression. However, gzip-compressed data could still be detected because they always start with the magic number 1f 8b, and the decompression can be done in a streaming manner, which Blatta supports. There is also no need to decompress the whole data, since Blatta works well with partial input.
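To illustrate the point about the gzip magic number and streaming decompression, the following sketch (a hypothetical helper, not part of Blatta) recognises a gzip body from its 1f 8b prefix and inflates only a bounded prefix of it with zlib, which is enough for a detector that works on partial input.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"

def inflate_prefix(body, max_out=400):
    """Return at most max_out bytes of a (possibly gzip-compressed) body."""
    if not body.startswith(GZIP_MAGIC):
        return body[:max_out]                       # not compressed; use as-is
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)     # wbits offset accepts gzip headers
    return d.decompress(body, max_out)              # streaming: stops after max_out bytes

# Example: only the first few hundred decompressed bytes reach the detector.
compressed = gzip.compress(b"GET /index.php?q=" + b"A" * 2000)
prefix = inflate_prefix(compressed)
assert prefix.startswith(b"GET /") and len(prefix) <= 400
```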

Encryption is possibly the biggest obstacle for payload-based NIDS; none of the previous works in our literature review (see Table 1) have addressed this challenge. There are other studies which deal with payload analysis in encrypted traffic [46, 47]. However, these studies focus on classifying which application generates the network traffic instead of detecting exploit attacks; thus, they are not directly relevant to our research.

On its own, Blatta is not able to detect exploits hiding in encrypted traffic. However, Blatta's model can be exported and incorporated into application layer firewalls such as ShadowDaemon [48]. ShadowDaemon is commonly installed on a web server and intercepts HTTP requests before they are processed by the web server software. It detects attacks based on its signature database. Since it is extensible and reads the same data as Blatta (i.e., application layer messages), it is possible to use Blatta's RNN-based model to extend the capability of ShadowDaemon beyond rule-based detection. More importantly, this approach would enable Blatta to deal with encrypted traffic, making it applicable in real-life situations.

Attackers could place the exploit code closer to the end of the application layer message, hoping that, in doing so, the attack would not be detected since Blatta reads merely the first few bytes. However, exploits using this kind of evasion technique would still be detected, because the technique needs padding to push the exploit code to the end of the message. The padding itself will be detected as a sign of malicious activity, as it is most likely to be a byte sequence which rarely exists in benign traffic.

Table 7: Comparison to previous works using the UNSW-NB15 dataset as the testing set.

Method        Detection rate (%)                          FPR (%)
              Exploits in UNSW-NB15   Worms in UNSW-NB15
Blatta        99.04                   100                 1.93
PAYL          87.12                   26.49               0.05
OCPAD         10.53                   4.11                0
Decanter      67.93                   90.14               0.03
Autoencoder   47.51                   81.12               0.99


7. Conclusion and Future Work

This paper presents Blatta, an early prediction system for exploit attacks, which can detect exploit traffic by reading only 400 bytes from the application layer message. First, Blatta builds a dictionary containing the most common n-grams in the training set. Then, it is trained over benign and malicious sequences of n-grams to classify exploit traffic. Lastly, it continuously reads a small chunk of an application layer message and predicts whether the message will be a malicious one.
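For illustration, the chunk-by-chunk early prediction described above could look roughly like the sketch below. The layer sizes, chunk size, and threshold are assumptions rather than Blatta's actual settings, and the 400-position bound merely mirrors the figure quoted above.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(5001, 32)         # 5000 common n-grams + an unknown index
rnn = nn.GRU(32, 64, batch_first=True)
head = nn.Linear(64, 2)                # benign vs. malicious logits (untrained here)

def predict_early(indices, chunk_size=16, max_len=400, threshold=0.5):
    """Feed the encoded message in chunks, carrying the GRU state across chunks."""
    hidden = None
    with torch.no_grad():
        for start in range(0, min(len(indices), max_len), chunk_size):
            chunk = torch.tensor([indices[start:start + chunk_size]])
            out, hidden = rnn(embed(chunk), hidden)      # resume from previous state
            prob = torch.softmax(head(out[:, -1]), dim=-1)[0, 1].item()
            if prob >= threshold:
                return True, start + chunk.shape[1]      # early malicious verdict
    return False, min(len(indices), max_len)

# e.g. predict_early(encoded_message) -> (is_malicious, positions_read)
```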

Decreasing the number of bytes taken from application layer messages has only a minor effect on Blatta's detection rate. Therefore, unlike previous related works, it does not need to read the whole application layer message to detect exploit traffic, creating a step change in the ability of system administrators to detect attacks early and to block them before the exploit damages the system. Extensive evaluation on the new exploit traffic dataset has clearly demonstrated the effectiveness of Blatta.

For future study, we would like to train a model that can recognise malicious behaviour based on messages exchanged between clients and a server, since in this paper we only consider messages from clients to a server and not the other way around. Detecting attacks in encrypted traffic while retaining the early prediction capability could be another future research direction. It also remains an open question whether the approaches used in mobile traffic classification [46, 47] would be applicable to the field of exploit detection.

Data Availability

The compressed file containing the dataset is available at https://bit.ly/blattasploit.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 5: Visualisation of unknown n-grams in the application layer messages and the outputs of the recurrent layer for each time step. It shows how the proposed system observes and analyses the traffic. Yellow blocks show unknown n-grams. Red blocks show the probability of the traffic being malicious when reading an n-gram at that point.


Acknowledgments

This work was supported by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan/LPDP) under grant number PRJ-968/LPDP.3/2016 of the LPDP.

References

[1] McAfee, "McAfee labs threat report—August 2019," Report, McAfee, Santa Clara, CA, USA, 2019.
[2] E. DB, "Offensive security's exploit database archive," EDB, Bedford, MA, USA, 2010.
[3] T.-H. Cheng, Y.-D. Lin, Y.-C. Lai, and P.-C. Lin, "Evasion techniques: sneaking through your intrusion detection/prevention systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1011–1020, 2012.
[4] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of LISA '99: 13th Systems Administration Conference, vol. 99, pp. 229–238, Seattle, Washington, USA, November 1999.
[5] R. Sommer and V. Paxson, "Outside the closed world: on using machine learning for network intrusion detection," in Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 305–316, Berkeley/Oakland, CA, USA, May 2010.
[6] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353–375, 2011.
[7] K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," Lecture Notes in Computer Science: Recent Advances in Intrusion Detection, in Proceedings of the 7th International Symposium, RAID 2004, pp. 203–222, Gothenburg, Sweden, September 2004.
[8] A. Jamdagni, Z. Tan, X. He, P. Nanda, and R. P. Liu, "RePIDS: a multi tier real-time payload-based intrusion detection system," Computer Networks, vol. 57, no. 3, pp. 811–824, 2013.
[9] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, "McPAD: a multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864–881, 2009.
[10] D. Ariu, R. Tronci, and G. Giacinto, "HMMPayl: an intrusion detection system based on hidden Markov models," Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.
[11] A. Oza, K. Ross, R. M. Low, and M. Stamp, "HTTP attack detection using n-gram analysis," Computers & Security, vol. 45, pp. 242–254, 2014.
[12] M. Swarnkar and N. Hubballi, "OCPAD: one class naive Bayes classifier for payload based anomaly detection," Expert Systems with Applications, vol. 64, pp. 330–339, 2016.
[13] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in Proceedings of the 25th USENIX Security Symposium, pp. 807–822, Austin, TX, USA, August 2016.
[14] H. Zhang, D. Yao, N. Ramakrishnan, and Z. Zhang, "Causality reasoning about network events for detecting stealthy malware activities," Computers & Security, vol. 58, pp. 180–198, 2016.
[15] R. Bortolameotti, T. van Ede, M. Caselli et al., "DECANTeR: DEteCtion of anomalous outbound HTTP traffic by passive application fingerprinting," in Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 373–386, Orlando, FL, USA, December 2017.
[16] D. Golait and N. Hubballi, "Detecting anomalous behavior in VoIP systems: a discrete event system modeling," IEEE Transactions on Information Forensics and Security, vol. 12, no. 3, pp. 730–745, 2016.
[17] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, vol. 16, no. 5, pp. 475–490, 2017.
[18] E. Min, J. Long, Q. Liu, J. Cui, and W. Chen, "TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest," Security and Communication Networks, vol. 2018, Article ID 4943509, 9 pages, 2018.
[19] X. Jin, B. Cui, D. Li, Z. Cheng, and C. Yin, "An improved payload-based anomaly detector for web applications," Journal of Network and Computer Applications, vol. 106, pp. 111–116, 2018.
[20] Y. Hao, Y. Sheng, and J. Wang, "Variant gated recurrent units with encoders to preprocess packets for payload-aware intrusion detection," IEEE Access, vol. 7, pp. 49985–49998, 2019.
[21] P. Schneider and K. Bottinger, "High-performance unsupervised anomaly detection for cyber-physical system networks," in Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy—CPS-SPC'18, pp. 1–12, Barcelona, Spain, January 2018.
[22] T. Hamed, R. Dara, and S. C. Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol. 73, pp. 137–155, 2018.
[23] B. A. Pratomo, P. Burnap, and G. Theodorakopoulos, "Unsupervised approach for detecting low rate attacks on network traffic with autoencoder," in Proceedings of the 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8, Glasgow, UK, June 2018.
[24] Z.-Q. Qin, X.-K. Ma, and Y.-J. Wang, "Attentional payload anomaly detector for web applications," in Proceedings of the International Conference on Neural Information Processing, pp. 588–599, Siem Reap, Cambodia, December 2018.
[25] H. Liu, B. Lang, M. Liu, and H. Yan, "CNN and RNN based payload classification methods for attack detection," Knowledge-Based Systems, vol. 163, pp. 332–341, 2019.
[26] Z. Zhao and G.-J. Ahn, "Using instruction sequence abstraction for shellcode detection and attribution," in Proceedings of the Conference on Communications and Network Security (CNS), pp. 323–331, National Harbor, MD, USA, October 2013.
[27] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, "Detecting unknown malicious code by applying classification techniques on opcode patterns," Security Informatics, vol. 1, no. 1, p. 1, 2012.
[28] X. Wang, C.-C. Pan, P. Liu, and S. Zhu, "SigFree: a signature-free buffer overflow attack blocker," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 65–79, 2010.
[29] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," Lecture Notes in Computer Science, in Proceedings of the International Workshop on Recent Advances in Intrusion Detection, pp. 226–248, Hamburg, Germany, September 2006.
[30] Rapid7, "Metasploit: penetration testing framework," 2003, https://www.metasploit.com.
[31] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the Military Communications and Information Systems Conference (MilCIS), pp. 1–6, Canberra, ACT, Australia, November 2015.


[32] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.
[33] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using Snort," Tech. Report CSE-2007-1, vol. 1, no. 2007, p. 22, UC Davis Department of Computer Science, Davis, CA, USA, 2007.
[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, p. 2, Berkeley, CA, USA, October 2005.
[35] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[36] I. Bro, 2008, http://www.bro-ids.org.
[37] Argus, 2008, https://qosient.com/argus.
[38] Rapid7, "Metasploitable," 2012, https://information.rapid7.com/download-metasploitable-2017.html.
[39] S. L. Garfinkel and M. Shick, "Passive TCP reconstruction and forensic analysis with tcpflow," Technical Reports, Naval Postgraduate School, Monterey, CA, USA, 2013.
[40] K. Rieck and P. Laskov, "Language models for detection of unknown attacks in network traffic," Journal in Computer Virology, vol. 2, no. 4, pp. 243–256, 2007.
[41] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, Doha, Qatar, October 2014.
[42] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[44] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, https://arxiv.org/abs/1412.3555.
[45] PyTorch, "Negative log likelihood," 2016.
[46] M. Lotfollahi, M. J. Siavoshani, R. S. H. Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[47] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescape, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.
[48] H. Buchwald, "Shadow daemon: a collection of tools to detect, record, and block attacks on web applications," Shadow Daemon Introduction, 2015, https://shadowd.zecure.org/overview/introduction.

