predictive network anomaly detection and visualization

288 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 5, NO. 2, JUNE 2010

Predictive Network Anomaly Detection andVisualization

Mehmet Celenk, Member, IEEE, Thomas Conley, John Willis, Student Member, IEEE, andJames Graham, Student Member, IEEE

Abstract—Various approaches have been developed for quanti-fying and displaying network traffic information for determiningnetwork status and in detecting anomalies. Although many ofthese methods are effective, they rely on the collection of long-termnetwork statistics. Here, we present an approach that usesshort-term observations of network features and their respectivetime averaged entropies. Acute changes are localized in networkfeature space using adaptive Wiener filtering and auto-regres-sive moving average modeling. The color-enhanced datagramis designed to allow a network engineer to quickly capture andvisually comprehend at a glance the statistical characteristics ofa network anomaly. First, average entropy for each feature iscalculated for every second of observation. Then, the resultantshort-term measurement is subjected to first- and second-ordertime averaging statistics. These measurements are the basis of anovel approach to anomaly estimation based on the well-knownFisher linear discriminant (FLD). Average port, high port, serverports, and peered ports are some of the network features usedfor stochastic clustering and filtering. We empirically determinethat these network features obey Gaussian-like distributions. Theproposed algorithm is tested on real-time network traffic datafrom Ohio University’s main Internet connection. Experimenta-tion has shown that the presented FLD-based scheme is accuratein identifying anomalies in network feature space, in localizinganomalies in network traffic flow, and in helping network engi-neers to prevent potential hazards. Furthermore, its performanceis highly effective in providing a colorized visualization chart tonetwork analysts in the presence of bursty network traffic.

Index Terms—Auto-regressive moving average (ARMA) mod-eling, entropy, Fisher discriminant, network anomaly, Wiener fil-tering.

I. INTRODUCTION

F IREWALLS and intrusion detection devices are the pri-mary way of protecting today’s modern enterprise net-

works from a host of network anomalies such as viruses, worms,scanners, and denial of service (DoS) from botnets. Their de-fense mechanism relies on detection of attacks after they havebegun affecting the targeted network. Existing methods are ableto identify specific packets which match a known pattern or orig-inate from a specified location. However, these signature-based

Manuscript received November 19, 2009; revised October 25, 2009; acceptedDecember 15, 2009. Date of publication February 17, 2010; date of current ver-sion May 14, 2010. The associate editor coordinating the review of this manu-script and approving it for publication was Dr. Christian Cachin.

The authors are with School of Electrical Engineering and ComputerScience, Stocker Center, Ohio University, Athens, OH 45701 USA(e-mail: [email protected]; [email protected]; [email protected];[email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2010.2041808

systems fail to detect unknown anomalies. An anomaly mightbe an old attack that has changed in some way to avoid detec-tion, or it could be a completely new form of attack. To alle-viate these shortcomings, significant research has been devotedto the task of identifying network anomalies using methods fromstatistical signal analysis and pattern recognition theory. Kwittand Hoffman [1] and Shen et al. [2] dealt with anomaly de-tection using a robust principal component analysis (PCA) andmetrics of aggregated network behavior. Additionally, the ap-proach undertaken by Karasaridis et al. [3] deals primarily withthe detection of botnets. The work described in [4] considersthe detection of network intrusions in covariance space usingpattern-recognition methods. Sang and Li [5] describe how farinto the future one can predict network traffic by employingauto-regressive moving average (ARMA) as a model. Similarly,Cho et al. [6] present a method in which near-real-time networktraffic data can be measured and filtered utilizing the Patriciatree and least recently used (LRU) replacement policy. In thework of Pang et al. [7], they examine known anomalies and pos-sible ways for detecting them by filtering data to reduce loadon the system. Feldman et al. [8] proposed a cascade-based ap-proach that dealt with multifractal behavior in network traffic.The paper also describes a way of detecting network problemsusing their system. Yurcik and Li [9] and Plonka [10] demon-strate how NetFlows and FlowScans can be a more efficient wayto monitor network traffic. Gong [11], [12] suggests methods inwhich the NetFlows can be used to detect worms and other typesof intrusion into a network. Wavelet-based approaches have alsobeen proposed for detecting anomalies and predicting ethernettraffic (see, for example, [13]).

Entropy is another well-known measure for quantifying theinformation of network traffic and has been extensively studiedfor anomaly detection and prevention [14], [15]. Significant re-search has also been devoted to the task of studying traffic struc-ture and flows in conjunction with visual correlation of networkalerts [16]. Most of the approaches are devised based on thelong-term statistics of network traffic entropy [17]–[20]. Oneexample is the work by Eimann, et al. [17] which discuses anentropy-based approach to detect network events. Harrington’swork [19] is similar, but uses cross entropy and second-orderdistribution to detect changes in network behavior. Lall, et al.[18] employ the entropy of traffic distributions to aid in networkmonitoring, while Gu, et al. [14] utilize an entropy measure todetect anomalies in network traffic. More recently, Gianvecchioand Wang [20] introduced an entropy-based approach to de-tect the exploitation of covert timing channels in network trafficamong the large amount of regular traffic. In Kim, et al. [21],the data in packet headers are examined using aggregate anal-

1556-6013/$26.00 © 2010 IEEE

CELENK et al.: PREDICTIVE NETWORK ANOMALY DETECTION AND VISUALIZATION 289

ysis of correlation data and discrete wavelet transforms. Statis-tical data analysis of pattern recognition theory is also appliedto the same problem with varying degrees of success [22]. Asupervised statistical pattern recognition technique is proposedby Fu, et al. [23], which requires the complete statistics of net-work load and attack. Wagner and Plattner [24] have discusseda method based on changes in entropy content for IP addressesand ports but have not attempted to distinguish normal trafficfrom abnormal.

Other researchers have taken a variety of approaches.Thottan and Ji [25] apply signal processing techniques to theproblem of network anomaly detection using statistical dataanalysis. The IP network anomaly detection is defined in asingle class domain in conjunction with the types and sourcesof data available for analysis. They present a method basedon sudden change detection on signals from multiple metrics,each of which has different statistical properties. In Hajji’swork [26], the approach undertaken addresses the problem ofchange in characteristics of network traffic, and its relationwith anomalies in local area networks. The emphasis is onfast detection for reducing potential impact of problems onnetwork services’ users by finite Gaussian mixture traffic modeland a baseline of network normal operation as the asymptoticdistribution of the difference between successive estimates ofmultivariate Gaussian model parameters with mean zero undernormal operations, and sudden jumps in this mean in abnormalconditions. In [27], the researchers introduced a supervisedanomaly detection method by concatenating the -Meansclustering and the ID3 decision tree learning. In their work,

-Means clustering is carried out first on training instances todetermine number of distinct clusters, representing regions ofsimilar instances. An ID3 decision tree is then trained with theinstances in each -Means cluster so that the anomaly detectioncan be performed via a score matrix. Authors of [28] make useof the Tsallis (or nonextensive) entropy to deal with networktraffic anomalies. They have demonstrated the effectivenessof this measure over the traditional Shannon entropy-basedtechniques by detecting more network anomalies and reducingfalse negatives. In turn, a flexibility improvement in the detec-tion process is reported due to the finely tuned sensitivity ofthe anomaly detection system as opposed to the conventionalentropy measure. Kim and Reddy [29] consider the time seriesanalysis of different packet header data and propose simple andefficient mechanisms for collecting and analyzing aggregateddata in real-time. They demonstrate that their proposed signalseries have higher efficacy in detecting attacks than the analysisof traffic volume itself. Recently, Androulidakis, et al. [30] haveproposed a method of anomaly detection and classification viaopportunistic sampling which also makes use of port entropy.

None of the studies described above provides a method ofpredicting an attack before it occurs. In this work, we aim topredict network anomalies. We define an anomaly as any de-tected network behavior that is of interest to a network or se-curity engineer such as worm outbreaks, botnet command andcontrol traffic, misconfigured network devices, or DoS attacks.To this end, we statistically analyze network flow data and applyWeiner filtering to pass normal traffic. This, in turn, helps usidentify the signal corresponding to network anomalies in the

Fig. 1. Modeling of cyber attack progression in a network.

selected feature measurement, which characterizes the networkflow in that dimension. By estimating the auto-correlation func-tion of normal traffic, the ARMA predictor [31] is devised usingthe well-known Yule–Walker regression [32].

In the following sections, we describe the approach and theresults achieved in our attempt to predict network traffic anoma-lies. Section II explores the structure of the proposed algorithm.The experimental test-bed and computer results are described indetail in Section III. Remaining discussion is devoted to conclu-sions, future work, and potential applications.

II. OVERALL APPROACH

In this work, we characterize a network by considering itsfeatures (e.g., average port, traffic load, etc.), each of which isa continuous vector of dimension ; i.e.,

while a cyber attack is formulated by a continuousrandom function whose behavior on the network can bemodeled as in Fig. 1. More rigorously, the attack functionmay be written as a linear combination of two stochastic com-ponents; namely, initiation and exploit. Thus,

(1)

Here, is the initiation phase which proceeds the actualnetwork attack and is identified by its anomalous traffic. Forexample, in Fig. 1 the initiation phase of a network worm ischaracterized by infectious traffic, a BotNet is initiated by com-mand and control signals, and a network flaw (e.g., unpatchedoperating systems, open ports, infected servers, etc.) is firstexploited using reconnaissance traffic. Exploit phases of thesesample anomalies are also depicted in Fig. 1 and include datadestruction by network worms, DoS by BotNets, or a systemcompromise. Notice that the anomaly progression time periods

) and ( are not identical in length or behavior for thevarious attack types.

In this particular research, we have determined the initiationtime empirically by experimentally changing the windowsize starting from 3 s and gradually increasing by 1 s until ananomaly presents itself visually. Note that the length of timefor the exploit depends on such varying factors as net-work configuration, computer defenses, and individual exploitcharacteristics. Similar studies have also been done which re-port measured values of the aforementioned timing parametersfor different attack types and systems. For example, in [33]the author has elaborated on how to determine the timeframesof various phases of infection and exploitation using a proba-bilistic model for estimation of the times. Schneider [34] studied


the timing of propagation for various types of network wormsand determined theoretical initiation times between 1.42 and1.85 s for various kinds of worms. In [35], it is reported thatthe slammer worm was able to complete infection and replicatein 8.5 s. Note that the infection of a single machine may occurwithout generating any network traffic. Such events, consideredinstantaneous from a network perspective, may be part of the ini-tial phase of a widespread network anomaly which is detectedas the compromised machine begins to use network resourcesin an abnormal way.

Our research deals with varying time scales and with the peri-odic nature of events by changing the sampling rate and windowsize in the underlying statistical model. Other researchers havetaken a similar timing approach to the analysis of the worm out-breaks. In particular, Rajab, et al. [36] attempts to determine theprobability of detection based on the underlying statistics suchas sampling rate and window size.

In our case, shorter initialization phases as well as periodicpatterns in the network traffic are handled by a smaller windowsize, which is made feasible by increasing the sampling rate.A faster sampling rate allows the smaller window size withoutdegrading the statistics, since the sample size of a window willremain high. This sampling scheme works for any single eventas long as the window size is smaller than the initiation phase.In addition, any periodic event is detected if the window size issmaller than the period of the event in progress.

For example, a daily process will not be considered anoma-lous in statistics calculated for a window size of one week, sincethe multiple daily occurrences will be normal for the weeklystatistics. However, the initiation of such an event will be de-tected as an anomaly in a window size of one hour becausethat window’s statistics will vary from normal for that windowsize. Thus, observations from a network under attack will com-prise not only normal network traffic but also anomaloustraffic features , which is represented as . Thismodeling also explains the rationale behind predicting anoma-lies and is driven by the premise that if we sample the networktraffic fast enough, the initiation phase would be sufficient topredict the onset of the actual attack (i.e., exploit). Referring tothe model in Fig. 1, we can write

(2)

Here, denotes the discrete time variable index and is aninfinitesimal sampling interval. Since appears in all the terms,it is customarily ignored in discrete time signal representation.As a result, the discrete network feature vector can be expressedas

(3)

where the symbol denotes matrix transposition. Fig. 2 showsa sample screenshot of the real-tme appearance of as dis-played on a monitoring console currently used by network andsecurity engineers. This snapshot of raw network traffic data issampled under load from Ohio University network and showssource and destination IP, port, netflow statistics, and calculatedsuspicious activity (“EVIL”) score. Since this display containsa significant amount of information, it is difficult for a networkengineer to capture the overall picture of activity for even a shorttime period. In order to represent such large amounts of infor-mation in a compact form without loosing generality, we map itscontent into the pattern vector as defined in (3). As an example,a particular entry from Fig. 2 is shown below as the th featurerepresenting destination port

(4)

In this case, the corresponding th feature pattern vectorbecomes

. Notice thatsince the total number of ports is 65 535, the maximum dimen-sion for the feature vector is 65 535, respectively.Here, each port utilization is a random variable varying by time.The total number ( ) of distinct feature classes depends on thenetwork attributes being analyzed. In general, if we consider allthe possible network feature vectors and the discreet time vari-able, the stochastic process describing network behavior formsa hyper-space of dimension varying by time . Hence, ourtask is to find a suboptimal space that gives rise to an anomalyand to come up with an intelligible visual representation of thiscomplex stochastic feature space and associated data flow. Thebasic information measure selected as the discriminatory factorfor this purpose is the entropy of a stochastic process. Ingeneral, for a continuous stochastic process the entropy

is defined by [37]

(5)

For a discrete-space and discrete-time stochastic process ,entropy is computed as

(6)

For every second, the number of observations of a particularfeature is counted and linearly quantized. For discrete features,such as port id, no quantization is needed. From these data, wegenerate a feature space to be used in average entropy calcula-tions over time for each , as

(7)

where is the time averaging window size, is the timeaveraged entropy, is the time average of the mean en-


Fig. 2. Snapshot of textual display of raw traffic data from Ohio University network showing source and destination IP, port, netflow statistics, and calculatedsuspicious activity (called “EVIL”) score.

tropy, and is the variance average. They are calculatedas

(8)

(9)

(10)

where , , and are the entropy, mean, andvariance of the feature in time duration from to (see Fig. 3).The modified Fisher linear discriminant (FLD) function [31] isselected as a scatter measure for each network feature patternclass. For the purpose of discriminating anomaly patterns fromregular traffic, we define the modified FLD performance index

as a function of the window size and the discrete-time variable . Hence, we have [38]

(11)

Notice that the proper selection of window size enables theprediction scheme to identify slow initiation attacks as well asrapidly progressing exploits. Using a larger will be moreeffective in detecting changes in measured features that occurslowly as in attacks that are commonly referred to as “slow andlow,” whereas a smaller helps a network security analyst infinding explosive attacks such as DoS.

Fig. 3. Network entropy datagram in which the entropy �� is denoted byimpulses (right side) while the FLD measure �� is represented by rectan-gular shapes (left side) and the mean and variance of entropy are indicated withdotted lines.

Fig. 3 is indeed a new graphical representation of andthe associated network attributes. This visualization techniqueis similar to the datagram concept of network communicationas described in [39] and employed here to be an effective means


of one-dimensional traffic tracking and alerting. Section IV willexplore the aforementioned visualization concept to illustratethe complex text-based information graph currently used by net-work engineers as shown in Fig. 2. The above performanceindex assumes a large value when the entropy has high meanand low variance in varying-size time-window due to unusualactivity in the network. In experimentation, window size isdynamically adjusted in such a way that the range in which themaximum value of is reached. Hence, the maximum value of

indicates the occurrence of a possible anomaly. This corre-sponds to determining an -dimensional polyhedra of varyingsizes that show which network attributes are more likely cur-rently involved in a cyber-attack in the given -dimen-sional time-varying network feature space. For simplicity, wefirst consider a single feature which describes the network be-havior. In this case, solution to the anomaly detection problembecomes finding the solution to the second-order derivative of

for which the fourth-order derivative is negative, thus,indicating a maximum value of . Mathematically, we have

(12)

The roots, and , which make the fourth-order derivative of with respect to and negative, are thedesired time and duration parameters for detecting anomalies.Explicitly, this is defined by

(13)

Solutions of (12) and (13) must be sought continuously sincethe performance index is a function of the time variableand window size . To alleviate the high computational costassociated with the process of localizing and , we presentan optimization procedure by rewriting (12) as

(14)

We then search for a value of that maximizes ; i.e.,

(15)

which is subject to

(16)

In the computer implementation, the performance index func-tion is first smoothed along the discrete time axis by usingan adaptive Gaussian filter. Then, (15) and (16) are approx-imated by the first- and second-order derivative templates of

and , respectively. For optimal windowsize determination, which follows the function behavior closely,we present a prediction method based on Wiener filtering andARMA modeling in Section III.

III. OPTIMAL WINDOW SIZE DETERMINATION

A. Adaptive Digital Anomaly Predictor (ADAP)

First, we consider a single component of the -dimen-sional traffic feature set as defined above. This measured input is

Fig. 4. Block diagram of ADAP.

modeled similar to (2) as a linear combination of normal trafficsignal and anomalous traffic noise expressed mathe-matically by

(17)

As was done earlier with continuous processes in Section II,we discretize (17) by letting where is the samplinginterval which can be made as small as possible. Hence,

(18)

Since is common for all the terms, it can be ignored withoutloosing generality. Thus, we have

(19)

The approach undertaken herein is to extract the normal net-work flow and use it to predict anomalies as shown inFig. 4. The Wiener filter removes the noise from the signal andoutputs the estimate of the normal traffic flow. To achieve a bal-anced estimate, we adjust the window size and associated coef-ficients in the Wiener filter and the ARMA predictor therebytracking signal variation more closely. The feedback controlchannel shown by a dashed line in Fig. 4 lets the algorithm adaptto a changing network signal waveform. This is made in accor-dance with the adaptable Wiener filter implementation methodproposed in [40] as

(20)

where is the Wiener filter output in discrete time domain,is the mean value of the normal flow, and , are

the respective variances of the measured traffic and anomalycomputed in a window of size . These signal signatures arecalculated as

(21)

(22)

(23)

The estimated signal is then applied to the ARMA unit,which predicts the next value of . This is sim-ilar to what has been done in [5] with the exception that theypredict network traffic but ignore any anomaly that may existat the time of measurement. Their prediction is based on the


assumptions of stationary Gaussian white noise with unit vari-ance and the Gaussian nature of the network traffic without anyempirical justification. On the other hand, in this research wehave no restriction on network flow, nor do we have restrictionon noise. Noise is considered to be the combination of networkanomalies and white noise as described in [5]. ARMA starts theprocess of estimation by calculating the auto-correlation func-tion for as in (27). The auto-correlation function is thenused in the third-order predictor as in the following:

(24)

(25)

Here, represents the predictor coefficient and denotesthe auto-correlation function value at . Let be the powerspectrum (i.e., the Fourier transform) of the auto-correlationfunction of the network anomalous signal and bethe power spectrum (i.e., the Fourier transform) of the auto-cor-relation of the normal network flow . The measurement ofthese stochastic signal signatures is carried out using

(26)

(27)

(28)

(29)

where (26) and (27) denote the estimated auto-correlation func-tions of the noise signal and the normal traffic signal in a windowof size while (28) and (29) represent the power spectrum ofthe noise signal and normal traffic signal, respectively. Since di-rect measurement of the power spectrum is complex and costly,it is desirable to predict it using the periodogram approach de-scribed in [40] and summarized by

(30)

(31)

where (30) and (31) correspond to the estimated power spec-trums using periodograms of the noise and the normaltraffic signal , and is the number of measurements. No-tice that the auto-correlation function is even; i.e.,

, and has its maximum at the origin, that is,for all values of . Using (24) and (25), the value of

is predicted by the ARMA module and compared to

measured signal, producing the output , of the ADAPfunction. Hence, we have

(32)

(33)

(34)

(35)

where represents the prediction or estimation error associatedwith the ARMA unit. Equations (32) through (35) enable us tocome up with a decision predicate expressed as

1) if , then ; hence, noanomaly;

2) if , then ; hence,anomaly predicted;

3) if , then ; hence,anomaly plus error predicted;

for identifying any anomaly. This structured logic extends thework of [5] by directly predicting near-real-time network at-tacks.

B. ADAP Using Multiple Features

In this section, we extend the single ADAP functionality, toincorporate information from multiple features into the detec-tion process. It is important to consider a diverse feature set inorder to cover a wide range of complex anomaly signatures. Ma-licious activities are specifically designed to spread out their ef-fect over multiple features in order to avoid detection by nar-rowly focused detectors. A cyber-attack, for instance, may tryto reduce its visibility by using a common, well-known port andcause only a slight anomalous increase and no alarm. However,even the slightest increase in a single feature, when combinedwith anomalous activity in other features, can have a syner-gistic effect which makes it possible to predict that anomaly.By using a multiple feature space, the extended method beingdescribed is able to tap the additive power of the rich networktraffic feature set. As a way of dealing with complex anom-alies without loosing generality, we use the feature set

, where is “average port,” is “high ports,”is “server ports,” is “peered ports,” as a representative ex-

periment. Since these individual characteristics may be corre-lated, this feature space can lead to a wrong conclusion, unlessthe underlying features are first uncorrelated and the resultantfeature space has a Euclidean metric. This is achieved using thewell-known Karhunen–Love (PCA or discrete Hotelling) trans-formation [31]. Let be the cross-correlation matrix forgiven by

(36)where the diagonal elements are the auto-correlation functionsof the features and off-diagonals are cross-correlations of re-spective feature pairs. Notice that is a symmetric squarematrix leading to a fourth-order characteristic equation and re-sults in four corresponding eigenvectors. By normalizing theseeigenvectors, we generate a four-dimensional space in which


Fig. 5. Detecting network anomalies with ADAP and weighted majorityvoting.

the selected features are uncorrelated. Hence, the ADAP func-tion can filter features individually by adaptively computing themean and variances as the predictor output starts to deviate sig-nificantly from actual measured values. Here, we refer to theuncorrelated ’s by ’s, and the respective probability densityfunction of the uncorrelated version of , represented by

, via

(37)

By empirical data analysis, we have verified that each feature inis normally distributed. Hence, we write

(38)

where mean and are the mean and variance of . Forthe ease of implementation, we only consider auto-correla-tion for the diagonal elements of the correlation matrix of(36). As a result, the anomaly prediction decision is madein the space using a weightedmajority-voting scheme as shown in Fig. 5. The ADAP func-tion processes each single feature in parallel and provides aprediction output. We present the overall result asa linear combination of the individual channels with weightscorresponding to the maximum value of each feature’s auto-cor-relation function. Thus,

(39)

If it exceeds a predetermined empirical anomaly threshold,then the activity represented by the feature set is deemedan anomaly. The system can then be directed to respond at thetime instant of . The anomaly detector’s effectiveness isdriven by a set of parameters chosen in accordance with theirsignificance. By fine-tuning the set of features, the Wiener filterand auto-correlation window sizes and the majority votingthresholds, we find an optimal feature space in which character-istics are uncorrelated and yet still contain all the informationassociated with network traffic and attacks.

C. Normal Density Approximation for Network Traffic

There has been considerable research in the statistical anal-ysis of network traffic data. However, previous work has as-sumed the data to be normally distributed without supportingthis assumption [4]. In our observations, the Gaussian nature ofthe traffic is determined by the bell-shaped frequency histogram

TABLE IMSE SORTED FEATURE SET USED FOR NORMAL DENSITY APPROXIMATION

AND ANOMALY DETECTION

of the features used in this study. Additionally, we examinethe periodogram graph for similarity with graphs of generatedGaussian data with a matching mean and variance. The peri-odogram is also an indicator of correlation [41], [42]. In additionto visual verification, the mean square error (MSE) between themeasured and synthetically generated Gaussian-shaped densityis computed using

(40)

where “measured data( )” denotes the normalized histogram ofthe feature , “generated data( )” is the value of the respectivenormal density, and “mean(measured data)” is the mean valueof the histograms all the features used. We have experimen-tally shown that the random Gaussian nature of these featuresdoes not adversely affect their ability to discriminate networkanomalies. In fact, the feature with the lowest MSE, as shownin Table I, is “average port” which turns out to be the most dis-cerning data feature. The average port is an average of all theport numbers seen on the network and should not change drasti-cally under normal conditions. On the other hand, “peer factor”is the feature with the highest MSE in terms of Gaussian den-sity. Peer factor is a measurement of connections to the sameport on both the source and destination side and indicates an apriori agreement between the two peer computers. The prob-ability of two computers picking the same port at random is

, which is negligibly small.

IV. IMPLEMENTATION AND RESULTS

A. Network Data

Network flow data is loosely defined as a collection of in-formation about the bidirectional transfer of packets betweena source IP address and port and a destination IP address andport. In this research, we consider an 11-tuple record consistingof protocol, source and destination rates, source and destinationload, source and destination bytes, source and destination port,and source and destination IP addresses. Monitoring devices atthe Ohio University Internet backbone collect this information


Fig. 6. Entropy mean, entropy variance, and entropy FLD, averaged over 60 s.

in real-time and save it to disk for near-real-time statistical anal-ysis and visualization. In order to process data as a time series,we collect all the records for the same time period into a singlerecord representing one second in time. Cumulative statisticsare gathered using a C++ program and the data are analyzedin MatLab functions based on [40, eqs. (9.44)–(9.46)]. Multipleexperiments are run on various feature values and parameter set-tings in order to identify a system configuration which has themost discriminating power.

B. Data Processing

The approach, implemented in C++, first calculatesper-second average entropy for each feature as defined in(6). Different window sizes are applied to the entropy datastream in order to calculate the time averaged second-orderstatistics described in (8), (9), and (10). These calculations areused in the FLD (11) and shown on the left-hand side of theentropy graph. The resultant entropy graphs are subjected toFLD optimization as defined in (12) and (13) to determine theoptimal window size. Fig. 6 shows one-dimensional plots ofFLD in comparison with time-averaged mean and variance.Notice that this type of illustration is not as informative asthe one that is developed in this research and subsequentlyexplained.

C. Visualization and Detection Accuracy

The network entropy and the FLD graphs are designed tovisually alert analysts about a network traffic anomaly. For-merly, engineers relied on a slow and detailed examination ofthe text-based display to make these determinations. With theproposed visualization, a network engineer can quickly capture,and comprehend at a glance, the statistical characteristics of theevent while the software performs the first-pass examination oflarge amounts of network data without the need for human in-teraction. When an anomaly is visually indicated then a networkengineer can decide whether the event warrants further labor-in-tensive investigation.

Current practice still involves security analysts when decidingwhat actions to take for any given anomaly, hence there are no

Fig. 7. Network entropy and FLD datagrams derived for destination port en-tropy over 60 s with sliding window sizes of� � � (left) and� � � (right).

automated responses built-in to the analysis tool. As engineersbecome comfortable with the results, then more automated in-teraction procedures may be incorporated.

The graphs themselves are formulated to create a visual dis-tinction between two different (normal and anomaly) networktraffic classes. The illustration of the mean is important to indi-cate the magnitude of the event and the variance shows the rateof change in events. High time-average mean and low variancesupports the hypothesis stated that the FLD is an indicator ofnetwork anomalies. The importance of the FLD value is visu-ally emphasized by the height of the bar and the intensity of thecolor red while the mean and variance indicators are highlightedwith blue and green of varying intensity, respectively.

The colorization of the graphs was also designed to aid in therapid analysis of the stochastic event. An investigator, seeingbrighter red knows that FLD is high, indicating a high level of in-terest in the anomaly. Seeing brighter blue tells the engineer thatthe information is highly variable and indicates a need for moredetailed traffic analysis. The amount and brightness of green inthe visualization tells the responder how much traffic informa-tion is being processed. Further enhancement of this coloriza-tion-based visualization is the subject of future research.

Changing window sizes can enhance the visibility of theanomaly as shown in Fig. 7, where a window size of 4 makesthe anomaly more noticeable by increasing the size and bright-ness of the red FLD indicator. For example, an examination ofFigs. 7 and 8 shows how network anomalies may be more orless visible, in a given time frame, depending on the systemsettings. The visibility of anomalies is made clearer by com-paring visualizations of different data types over the same timeperiod. For example, the anomaly that appears in Fig. 7 whenexamining the single feature “port” is nonevident in Fig. 8where the total entropy of all features is considered for thesame set of data. This illustration suggests that measuring the


Fig. 8. Network entropy and FLD datagrams derived for total entropy over 60 swith sliding window sizes of� � � (left) and� � � (right).

entropy of multiple features in comparison with total entropy isessential for increasing detection accuracy .

These figures also show that window size plays a big role inthe efficacy of the proposed solution. Different window sizesare more or less effective at different times. In practice, sev-eral varying window sizes are run simultaneously and the op-timal is selected for any given moment in time. The votingscheme described in Fig. 5 assists in finding not only the op-timal window size but also the optimal feature set to be used.For example, if we consider Fig. 7, the window size ismore informative at time 63 while window size 4 is more de-scriptive at time 76. Further, at time 90 and forward ap-pears to convey more valuable information. Although we showdatagrams for varying window sizes (e.g., 3 and 4) for demon-stration purposes, more widely varying widow sizes have beenfound to offer better results when converging on optimal set-tings as explained in Section IV. Use of multiple features andwindows of varying sizes contributes to the event driven scala-bility and usability.

Detection accuracy is typically measured in terms of true pos-itive and true negative results by the ratio of the number of cor-rectly detected anomalies over the total number of anomalieswhich occurred during the observation period; i.e.,

(41)

To further clarify, we define an anomaly as a pattern of trafficwhich varies from Gaussian distribution in the manner describedin this research. Even expected, nonmalicious network eventssuch as backups, maintenance outages, or scheduled downloadsare identified as anomalies if the traffic patterns “vary fromnormal” for the given window size. These activities are also con-sidered anomalies “by design,” and as such, they are of interestto engineers and subject to justification by security analysts.

Hence, the definition of anomaly as a “variation fromGaussian density” also guides our analysis of false positivesand false negatives. Since all anomalous traffic must be justifiedby network analysts, there is no false positive from that perspec-tive. Further, since the definition of anomaly is purely statistical,the actual false positive and false negative error rates are de-termined using the underlying error rates of measured trafficwhen compared to Gaussian traffic. This normal traffic data isa statistical construct described in previous research by Celenk,et al. [22], [38] which serves as the ground truth for error ratecalculations. The recording of an actual ground truth of trafficdata for any significantly complex and unique network is notpossible since the exact nature of normal and anomalous trafficis not known. The unknown nature of network traffic patterns isthe underlying reason for applying a statistics-based approachto anomaly detection. This research shows that the measureof accuracy is inversely proportional to the mean squarederror measurements of Table I; i.e., .For example, by selecting “Average port” as a feature fromTable I, the detection accuracy can reach a value of 0.69 orapproximately 70% whereas the feature “Peer factor” from thesame table yields almost 0.03 or 3% accuracy. Thus, a featurethat has a small MSE yields results that have better accuracybecause the data statistics are closer to a normal distribution.On the other hand, a feature with a high MSE results in a lesseraccuracy and, therefore, it is not selected. The presentation,management and logging of all network anomalies is a matterfor developers of the final software application. Since errorrates are determined by the underlying statistics and not byindividuals, all events are monitored by the system, althoughoperators can manage the visibility of anomalies by changingthe underlying configuration parameters such as window size.Furthermore, administrators can provide feedback during theinitial configuration of the network.

D. Anomaly Prediction

This section describes the results of the normal density ap-proximation study and the prediction algorithm results. Fig. 9shows the periodogram, correlation results, and histogram ofa selected feature set; namely “Average port,”“High ports,” “Server factor,” and “Peered factor.”Similarity between the measured feature values and generatedGaussian data for all features are clearly visible except forthe attribute “Peered factor.” The significant variation maybe attributed to the highly stochastic nature of the traffic. InFig. 10, we show the auto- and cross-correlation plots of theselected features in order to determine the level of correlationfor this measured feature set. Notice that, while and

possess high correlation, the remaining pairs do not.This implies that features , , and should be uncorrelatedbefore they are used individually or collectively. Fig. 11 depicts100 s of collected real-time data and illustrates the predictionresults at different stages in the ADAP algorithm. In subplots(A) and (B), a dashed line shows the boundary of the Wienerand auto-correlation windows, respectively. It appears that theinitiation time of the attack is 2 s starting from time 261 sthrough 263 s. Once the attack is detected at time 263 s, pre-ventive measures are taken by the network analyst to prevent


Fig. 9. Snapshots of (A) measured signal, (B) Wiener output, (C) ARMA pre-diction, (D) difference between Wiener input and output, (E) difference betweenmeasured signal and ARMA output, and (F) difference between Wiener andARMA outputs.

Fig. 10. Auto- and cross-correlation plots for features � (average port), �(high ports), � (server factor), and � (peered factor) showing varying degreesof correlation among them.

damage to the network. Part (C) represents the ARMA pre-dictor signal, and (D) shows the difference between measuredvalues and ARMA output. The solid vertical line indicates apredicted anomaly at time (263 s). Notice that the peakin predictor measurements (E) and (F) occurs just before themaximum value of the actual anomaly in (A). This supportsour conclusion that the algorithm predicts network anomalies.In Fig. 12, we demonstrate the affect of varying parameterssuch as window size (shaded area). The solid vertical linesindicate locations in time of predicted anomalies. Part (A)corresponds to smaller sized windows while part (B) represents

Fig. 11. Snapshots of (a) measured signal � ��, (b) Wiener output �� ,(c) ARMA prediction �� , (d) magnitude of the difference betweenWiener input and output �� , (e) magnitude of the differencebetween measured signal and ARMA output �� as anomalymeasure, and (F) magnitude of the difference between Wiener and ARMAoutputs �� as anomaly predictor.

Fig. 12. Experimental results of sample runs on � , � , � , and, � , showingthe effect of changing parameters. The solid vertical lines indicate locations intime of predicted anomalies. Part (A) corresponds to smaller sized windowswhile part (B) represents larger window sizes.

larger window sizes. The system predicts anomalies of variousmagnitudes by changing parameters. This robust performanceis due to the fact that the ADAP unit has a feedback control,which allows it to adjust to the changing signal waveform.

V. CONCLUSIONS AND DISCUSSIONS

We have presented an approach that is based on relativelyshort-term observations of network features and their respec-tive time averaged entropy values. This approach helps in de-


termining acute and long-term changes in the network featurespace and presents system status in a visually compact informa-tion graph (called datagrams). First, average entropy for eachfeature is calculated for every second of observation. Then, theresultant short-term information measurement is subjected tofirst- and second-order time averaging statistics. The processthen applies the modified FLD to the underlying time averagednetwork entropy to identify the exact time of the security inci-dent or attack on the network. The colorized dual-entropy-typedatagram visualization is devised to help network engineers in-teract with the staggering amounts of network data. The pro-posed method has been tested under load on real-time networktraffic data from Ohio University’s main Internet connection.Experimentation has shown that the presented algorithm is ableto identify anomalies in selected network features including av-erage port, high port, server port, and peered port. Although thepresent work is currently used to detect such diverse network at-tacks as BotNets, worm outbreaks, and DoS’s, it really identifiesany anomaly of interest to network and security analysts. We areproceeding to evaluate this research in conjunction with existingnetwork defense systems. Furthermore, additional research isneeded to implement the proposed method in a time-varyingmultidimensional feature space by considering the correlationsof additional network features.

REFERENCES

[1] R. Kwitt and U. Hofmann, “Unsupervised anomaly detection in net-work traffic by means of robust PCA,” in Proc. Int. Conf. Computingin the Global Information Technology (ICCGI), 2007, pp. 37–37.

[2] G. Shen, D. Chen, and Z. Qin, “Anomaly detection based on aggregatednetwork behavior metrics,” in Proc. Wireless Communications, Net-working and Mobile Computing, 2007 (WiCom 2007), pp. 2210–2213.

[3] A. Karasaridis, B. Rexroad, and D. Hoeflin, “Wide-scale botnet detec-tion and characterization,” in Proc. First Conf. First Workshop on HotTopics in Understanding Botnets (HotBots’07), Berkeley, CA, 2007, p.7, USENIX Association.

[4] S. Jin, D. S. Yeung, and X. Wang, “Network intrusion detectionin covariance feature space,” Pattern Recognit., vol. 40, no. 8, pp.2185–2197, 2007.

[5] A. Sang and S. Li, “A predictability analysis of network traffic,” inProc. INFOCOM (1), 2000, pp. 342–351.

[6] K. Cho, R. Kaizaki, and A. Kato, “An aggregation technique for trafficmonitoring,” in Symp. Applications and the Internet (SAINT) Work-shops, 2002, p. 74.

[7] R. Pang, V. Yegneswaran, P. Barford, V. Paxson, and L. Peterson,“Characteristics of internet background radiation,” in Proc. ACM In-ternet Measurement Conf., Oct. 2004, pp. 27–40.

[8] A. Feldmann, A. C. Gilbert, and W. Willinger, “Data networks as cas-cades: Investigating the multifractal nature of internet WAN traffic,” inProc. SIGCOMM, 1998, pp. 42–55.

[9] W. Yurcik and Y. Li, “Internet security visualization case study: Instru-menting a network for netflow security visualization tools,” in Proc.Annual Computer Security Applications Conf. (ACSAC 05), Tucson,AZ, Dec. 5–9, 2005.

[10] D. Plonka, “A network traffic flow reporting and visualization tool,” inProc. 14th USENIX Conf. System Administration, New Orleans, LA,2000, pp. 305–318.

[11] Y. Gong, Security Focus Article: Detecting Worms and Abnormal Ac-tivities With NetFlows, Part 1 Aug. 2004 [Online]. Available: http://www.securityfocus.com/infocus/1796

[12] Y. Gong, Security Focus Article: Detecting Worms and Abnormal Ac-tivities With NetFlows, Part 2 Sep. 2004 [Online]. Available: http://www.securityfocus.com/infocus/1796

[13] V. Alarcon-Aquino and J. A. Barria, “Multiresolution FIR neural-net-work-based learning algorithm applied to network traffic prediction,”IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 2, pp.208–220, Mar. 2006.

[14] Y. Gu, A. McCallum, and D. Towsley, “Detecting anomalies in net-work traffic using maximum entropy estimation,” in Proc. 5th ACMSIGCOMM Conf. Internet Measurement (IMC ’05), New York, 2005,pp. 1–6, ACM.

[15] A. Lakhina, M. Crovella, and C. Diot, Mining Anomalies Using TrafficDistributions CS Department, Boston University, Tech. Rep. 2005-002,Feb. 2005.

[16] S. Foresti, J. Agutter, Y. Livnat, S. Moon, and R. F. Erbacher, “Visualcorrelation of network alerts,” IEEE Comput. Graphics Applicat., vol.26, no. 2, pp. 48–59, Mar./Apr. 2006.

[17] R. Eimann, U. Speidel, N. Brownlee, and J. Yang, Network Event De-tection With T-Entropy Centre for Discrete Mathematics and Theoret-ical Computer Science, University of Auckland, New Zealand, Rep.CDMTCS-266, May 2005.

[18] A. Lall, V. Sekar, M. Ogihara, J. J. Xu, and H. Zhang, Data StreamingAlgorithms for Estimating Entropy of Network Traffic Computer Sci-ence Department, University of Rochester, Tech. Rep. TR886, Nov.2005.

[19] E. F. Harrington, “Measuring network change: Rényi cross entropy andthe second order degree distribution,” in Proc. Passive and Active Mea-surement (PAM) Conf., Adelaide, Australia, Mar. 2006.

[20] S. Gianvecchio and H. Wang, “Detecting covert timing channels: Anentropy-based approach,” in Proc. ACM Conf. Computer and Commu-nications Security, 2007, pp. 307–316.

[21] S. S. Kim, A. L. N. Reddy, and M. Vannucci, “Detecting trafficanomalies through aggregate analysis of packet header data,” inNetworking. New York: Springer, 2004, vol. 3042, pp. 1047–1059.

[22] M. Celenk, T. Conley, J. Graham, and J. Willis, “Anomaly prediction innetwork traffic using adaptive Wiener filtering and ARMA modeling,”in Proc. IEEE Int. Conf. Systems, Man, and Cybernetics, Oct. 2008, pp.3548–3553.

[23] X. Fu, B. Graham, R. Bettati, and W. Zhao, “On effectiveness of linkpadding for statistical traffic analysis attacks,” in Proc. 23rd IEEE Int.Conf. Distributed Computing Systems (ICDCS ’03), Washington, DC,2003, p. 340.

[24] A. Wagner and B. Plattner, “Entropy based worm and anomaly de-tection in fast IP networks,” in Proc. 14th IEEE Int. Workshops onEnabling Technologies: Infrastructure for Collaborative Enterprise,Washington, DC, 2005, pp. 172–177.

[25] M. Thottan and C. Ji, “Anomaly detection in IP networks,” IEEE Trans.Signal Process., vol. 51, no. 8, pp. 2191–2204, Aug. 2003.

[26] H. Hajji, “Statistical analysis of network traffic for adaptive faults de-tection,” IEEE Trans. Neural Netw., vol. 16, no. 5, pp. 1053–1063, Sep.2005.

[27] S. R. Gaddam and K. S. Balagani, “K-Means+ID3: A novel method forsupervised anomaly detection by cascading K-Means clustering andID3 decision tree learning methods,” IEEE Trans. Knowl. Data Eng.,vol. 19, no. 3, pp. 345–354, Mar. 2007.

[28] A. Ziviani, M. L. Monsores, P. S. S. Rodrigues, and A. T. A. Gomes,“Network anomaly detection using nonextensive entropy,” IEEECommun. Lett., vol. 11, no. 12, pp. 1034–1036, Dec. 2007.

[29] S. S. Kim and A. L. N. Reddy, “Statistical techniques for detectingtraffic anomalies through packet header data,” IEEE/ACM Trans.Netw., vol. 16, no. 3, pp. 562–575, Jun. 2008.

[30] G. Androulidakis, V. Chatzigiannakis, and S. Papavassiliou, “Networkanomaly detection and classification via opportunistic sampling,” IEEENetwork, vol. 23, no. 1, pp. 6–12, Jan./Feb. 2009.

[31] S. Theodoridis and K. Koutroumbas, Pattern Recognition, 3rd ed.New York: Academic, Feb. 2006.

[32] B. Porat, Digital Processing of Random Signals Theory andMethods. Englewood Cliffs, NJ: Prentice-Hall, 1994.

[33] S. M. Asmoredjo, “A Probabilistic Model for Cyber Attacks and Pro-tection,” Master’s thesis, Delft University of Technology, Delft, TheNetherlands, Jun. 2005.

[34] A. Schneider, “Methods of Internet Worm Propagation,” Master’sthesis, Wake Forest University, Winston-Salem, NC, 2009, 27109.

[35] G. Travis, E. Balas, D. A. J. Ripley, and S. Wallace, Analysis of the“SQL Slammer” Worm and Its Effects on Indiana University and Re-lated Institutions Feb. 2003 [Online]. Available: http://paintsquirrel.ucs.indiana.edu/pdf/SLAMMER.pdf, pp. 1–18

[36] M. Abu Rajab, F. Monrose, and A. Terzis, “Worm evolution trackingvia timing analysis,” in Proc. 2005 ACM Workshop on Rapid Malcode(WORM ’05), New York, 2005, pp. 52–59.

[37] B. C. Bag, S. K. Banik, and D. S. Ray, “The noise properties of sto-chastic processes and entropy production,” Phys. Rev. E, vol. 64, p.026110, 2001.


[38] M. Celenk, T. Conley, J. Willis, and J. Graham, “Anomaly detectionand visualization using Fisher discriminant clustering of network en-tropy,” in Proc. IEEE Third Int. Conf. Digital Information Management(ICDIM 2008), Nov. 2008, pp. 216–220.

[39] W. Stallings, Data and Computer Communications (7th Edition).Englewood Cliffs, NJ: Prentice-Hall, 2004.

[40] J. S. Lim, Two-Dimensional Signal and Image Processing. Engle-wood Cliffs, NJ: Prentice-Hall, 1990.

[41] W. L. Crum, “Tests for serial correlation in regression analysis basedon the periodogram of least-squares residuals,” J. Amer. Statist. Assoc.,vol. 18, no. 143, pp. 889–899, Sep. 1923.

[42] J. Durbin, “The resemblance between the ordinate of the periodogramand the correlation coefficient,” Biometrika, vol. 56, no. 1, pp. 1–15,Mar. 1969.

Mehmet Celenk (S’84–M’85) received the B.S. andM.S. degrees from Istanbul Technical University, in1974 and 1976, in electrical and communicationsengineering, and the Ph.D. degree from StevensInstitute of Technology in electrical engineering andcomputer science, in 1983, where he was the RobertCrook Stanley Graduate Fellow in 1985.

He served on the Turkish Army from 1984 to1985 as a lieutenant. He joined Ohio University in1985, where he is currently an Associate Professorof Electrical Engineering and Computer Science.

His research interests include image and video procession, computer vision,multimedia, data fusion, pattern recognition, and biomedical engineering. Hehas published over 200 articles in these fields and received over $0.5 M hyper-cube processor equipment grant from Symult and Ametek Co. He supervisedtwo visiting scholars, one of whom was awarded $20 K from TUBITAK, and31 M.S./Ph.D. theses.

Dr. Celenk received the distinguished service award from the Signal Schoolin Ankara for his R&D work and his associate editorial services for the Com-munications Journal. He was the recipient of the 1988 Fritz & Dolores RussResearch Award of the College of Engineering and Technology of Ohio Uni-versity. He was awarded the Avionics Academic Challenge Faculty Fellowshipat Ohio University from 1988 to 1992. He has been an active reviewer for var-ious professional societies, journals, publishers, and the NSF. He has been areview panelist for the New York State Office of Science, Technology & Aca-demic Research (NYSTAR) in 2002–2007. He has been an Associate Editorof the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART

A: SYSTEMS AND HUMANS, since 2005, a member of the Editorial Board ofthe Journal of Recent Patents on Signal Processing, since 2008, and servedas session-chair/organizing-committee-member of ICDP 2009, ICSMC 2008,IASTED ICSIP 2008, VISAPP 2008, ICIP 2006, ICPR 2006, SSST 2005–1992,CGIP 2000, and CVPR 1999 conferences. He is a member of Eta Kappa Nu, andformer member of SPIE, IS&T, ACM, ASEE, and OE.

Thomas Conley received the B.S. degree in envi-ronmental resources management from The Pennsyl-vania State University, in 1983. He is working towardthe Master’s degree in the School of Electrical En-gineering and Computer Science at Ohio University,specializing in computer networking and informationsecurity.

He then joined the Peace Corps as a Fisheries Ex-tension Officer and served in Kenya from 1983 to1986. He is a Certified Information Systems SecurityProfessional (CISSP). He has worked as a computer

consultant for various government organizations including the U.S. State De-partment, the U.S. Department of Education, and the Office of Research andDevelopment for the Central Intelligence Agency. In addition, he was a re-search and development engineer for the wireless industry before joining OhioUniversity as a Senior Network Security Analyst where he is responsible for nu-merous information security-related security monitoring, computer and networkforensics, advising network architecture and engineering efforts, and developingcustom technical solutions in support of information security. His research inter-ests include data mining, statistical data analysis, and visualization for networktraffic applications. He recently published two papers; one in IEEE ICDIM 2008and the other in IEEE ICSMC 2008.

John Willis (S’10) received the B.S. degree in elec-trical and computer engineering from Ohio Univer-sity, Athens, OH, in 2009, where he was a memberof the men’s varsity track team running the 400 meterdash. He is working toward the M.S. degree in elec-trical engineering and computer science from OhioUniversity, specializing in data fusion and image pro-cessing.

He started his research career when he was a juniorstudent in his B.S. program as an integral member of ateam seeking a potential application of pattern recog-

nition research in network traffic. He participated in publishing papers and pre-senting at IEEE ICDIM 2008 and IEEE ICSMC 2008. In addition, he is workingon a cost estimation project for General Electric’s Jet Engine division.

James Graham (S’07) received the B.S. degreein electrical engineering from the University ofMichigan, Ann Arbor, in 2001, and the M.S. degreein electrical engineering from Ohio University,Athens, OH, in 2007. He is currently workingtoward the Ph.D. degree in electrical engineeringand computer science at Ohio University.

He worked in industry as a research contractor withKelly Scientific at Dow Chemical for a few years. Hespecializes in image processing, intelligent machinesand systems, in which he has contributed to several

publications.

predictive network anomaly detection and visualization

Documents