

Page 1: WS97-07-013

Intrusion Detection with Neural Networks

Jake Ryan
Department of Computer Sciences
The University of Texas at Austin
Austin, TX 78712 USA
[email protected]

Meng-Jang Lin
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX 78712 USA
[email protected]

Risto Miikkulainen
Department of Computer Sciences
The University of Texas at Austin
Austin, TX 78712 USA
[email protected]

Abstract

With the rapid expansion of computer networks during the past few years, security has become a crucial issue for modern computer systems. A good way to detect illegitimate use is through monitoring unusual user activity. Methods of intrusion detection based on hand-coded rule sets or predicting commands on-line are laborious to build or not very reliable. This paper proposes a new way of applying neural networks to detect intrusions. We believe that a user leaves a 'print' when using the system; a neural network can be used to learn this print and identify each user much like detectives use thumbprints to place people at crime scenes. If a user's behavior does not match his/her print, the system administrator can be alerted of a possible security breach. A backpropagation neural network called NNID (Neural Network Intrusion Detector) was trained in the identification task and tested experimentally on a system of 10 users. The system was 96% accurate in detecting unusual activity, with a 7% false alarm rate. These results suggest that learning user profiles is an effective way for detecting intrusions.

Introduction

Intrusion detection schemes can be classified into two categories: misuse and anomaly intrusion detection. Misuse refers to known attacks that exploit the known vulnerabilities of the system. Anomaly means unusual activity in general that could indicate an intrusion. If the observed activity of a user deviates from the expected behavior, an anomaly is said to occur.

Misuse detection can be very powerful on those attacks that have been programmed into the detection system. However, it is not possible to anticipate all the different attacks that could occur, and even the attempt is laborious. Some kind of anomaly detection is ultimately necessary. One problem with anomaly detection is that it is likely to raise many false alarms. Unusual but legitimate use may sometimes be considered anomalous. The challenge is to develop a model of legitimate behavior that would accept novel legitimate use.

It is difficult to build such a model for the same reason that it is hard to build a comprehensive misuse detection system: it is not possible to anticipate all possible variations of such behavior. The task can be made tractable in three ways: (1) Instead of general legitimate use, the behavior of individual users in a particular system can be modeled. Characterizing the regular patterns in the behavior of an individual user is easier than trying to do it for all users simultaneously. (2) The patterns of behavior can be learned from examples of legitimate use, instead of having to describe them by hand-coding possible behaviors. (3) Detecting an intrusion in real time, as the user is typing commands, is very difficult because the order of commands can vary a lot. In many cases it is enough to recognize that the distribution of commands over the entire login session, or even the entire day, differs from the usual.

The system presented in this paper, NNID (Neural Network Intrusion Detector), is based on these three ideas. NNID is a backpropagation neural network trained to identify users based on what commands they use during a day. The system administrator runs NNID at the end of each day to see if the users' sessions match their normal pattern. If not, an investigation can be launched. The NNID model is implemented in a UNIX environment and consists of keeping logs of the commands executed, forming command histograms for each user, and learning the users' profiles from these histograms. NNID provides an elegant solution to off-line monitoring utilizing these user profiles. In a system of 10 users, NNID was 96% accurate in detecting anomalous behavior (i.e. random usage patterns), with a false alarm rate of 7%. These results show that a learning off-line monitoring system such as NNID can achieve better performance than systems that attempt to detect anomalies on-line in the command sequences, and with computationally much less effort.

The rest of the paper outlines other approaches to intrusion detection and motivates the NNID approach in more detail, presents the implementation and an evaluation on a real-world computer system, and outlines some open issues and avenues for future work.


From: AAAI Technical Report WS-97-07. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.


Intrusion Detection Systems

Many misuse and anomaly intrusion detection systems (IDSs) are based on the general model proposed by Denning (1987). This model is independent of the platform, system vulnerability, and type of intrusion. It maintains a set of historical profiles for users, matches an audit record with the appropriate profile, updates the profile whenever necessary, and reports any anomalies detected. Another component, a rule set, is used for detecting misuse.
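The profile-matching loop of such a model can be sketched as below. This is only an illustration of the idea, not Denning's specification: the per-record activity measure, the running-mean profile, and the fixed threshold are all assumptions made for the sketch.

```python
from collections import defaultdict

# Per-user profile: a running mean of some per-record activity measure.
# The measure and the fixed deviation threshold are illustrative assumptions.
profiles = defaultdict(lambda: {"mean": 0.0, "n": 0})

def process_record(user, value, threshold=3.0):
    """Match an audit record against the user's profile, update the
    profile, and report whether the record looks anomalous."""
    p = profiles[user]
    anomalous = p["n"] > 0 and abs(value - p["mean"]) > threshold
    p["n"] += 1
    p["mean"] += (value - p["mean"]) / p["n"]  # incremental mean update
    return anomalous
```

Each audit record both is checked against the profile and refines it, so the profile tracks the user's history over time.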

Actual systems implement the general model with different techniques (see Frank (1994) and Mukherjee et al. (1994) for an overview). Often statistical methods are used to measure how anomalous the behavior is, that is, how different, e.g., the commands used are from normal behavior. Such approaches require that the distribution of subjects' behavior is known. The behavior can be represented as a rule-based model (Garvey and Lunt 1991), in terms of predictive pattern generation (Teng et al. 1990), or using state transition analysis (Porras et al. 1995). Pattern matching techniques are then used to determine whether the sequence of events is part of normal behavior, constitutes an anomaly, or fits the description of a known attack.

IDSs also differ in whether they are on-line or off-line. Off-line IDSs are run periodically and they detect intrusions after the fact based on system logs. On-line systems are designed to detect intrusions while they are happening, thereby allowing for quicker intervention. On-line IDSs are computationally very expensive because they require continuous monitoring. Decisions need to be made quickly with less data and therefore they are not as reliable.

Several IDSs that employ neural networks for on-line intrusion detection have been proposed (Debar et al. 1992; Fox et al. 1990). These systems learn to predict the next command based on a sequence of previous commands by a specific user. Through a shifting window, the network receives the w most recent commands as its input. The network is recurrent, that is, part of the output is fed back as the input for the next step; thus, the network is constantly observing the new trend and "forgets" old behavior over time. The size of the window is an important parameter: if w is too small, there will be many false positives; if it is too big, the network may not generalize well to novel sequences. The most recent of such systems (Debar et al. 1992) can predict the next command correctly around 80% of the time, and accept a command as predictable (among the three most likely next commands) 90% of the time.
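The windowed prediction idea can be illustrated with a much simpler stand-in than the recurrent networks these systems actually use: an n-gram frequency table over windows of w commands. The class and method names are hypothetical; this only demonstrates the "predictable among the top k" criterion mentioned above.

```python
from collections import defaultdict, Counter

class WindowPredictor:
    """Simplified stand-in for the windowed next-command predictors:
    an n-gram frequency table rather than a recurrent network."""
    def __init__(self, w=2):
        self.w = w                       # window size
        self.table = defaultdict(Counter)

    def train(self, commands):
        # Record which command follows each window of w commands.
        for i in range(len(commands) - self.w):
            ctx = tuple(commands[i:i + self.w])
            self.table[ctx][commands[i + self.w]] += 1

    def predictable(self, ctx, cmd, k=3):
        """Is cmd among the k most likely commands after this window?"""
        top = [c for c, _ in self.table[tuple(ctx)].most_common(k)]
        return cmd in top
```

A command that falls outside the top-k predictions would count as unexpected, which is exactly where the false positives of a too-small window come from.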

One problem with the on-line approach is that most of the effort goes into predicting the order of commands. In many cases, the order does not matter much, but the distribution of commands that are used is revealing. A possibly effective approach could therefore be to collect statistics about the users' command usage over a period of time, such as a day, and try to recognize the distribution of commands as legitimate or anomalous off-line. This is the idea behind the NNID system.

The NNID System

The NNID anomaly intrusion detection system is based on identifying a legitimate user from the distribution of commands she or he executes. This is justifiable because different users tend to exhibit different behavior, depending on their needs of the system. Some use the system to send and receive e-mail only, and do not require services such as programming and compilation. Some engage in all kinds of activities including editing, programming, e-mail, Web browsing, and so on. However, even two users that do the same thing may not use the same application program. For example, some may prefer the "vi" editor to "emacs", favor "pine" over "elm" as their mail utility program, or use "gcc" more often than "cc" to compile C programs. Also, the frequency with which a command is used varies from user to user. The set of commands used and their frequency, therefore, constitutes a 'print' of the user, reflecting the task performed and the choice of application programs, and it should be possible to identify the user based on this information.

It should be noted that this approach works even if some users have aliases set up as shorthands for long commands they use frequently, because the audit log records the actual commands executed by the system. Users' privacy is not violated, since the arguments to a command do not need to be recorded. That is, we may know that a user sends e-mail five times a day, but we do not need to know to whom the mail is addressed.

Building NNID for a particular computer system consists of the following three phases:

1. Collecting training data: Obtain the audit logs for each user for a period of several days. For each day and user, form a vector that represents how often the user executed each command.

2. Training: Train the neural network to identify the user based on these command distribution vectors.

3. Performance: Let the network identify the user for each new command distribution vector. If the network's suggestion is different from the actual user, or if the network does not have a clear suggestion, signal an anomaly.
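The log-to-vector step of phase 1 can be sketched as follows. The five-command list is a stand-in for the 100 tracked commands, and the one-command-per-line log format is an assumption for illustration.

```python
from collections import Counter

# Stand-in for the 100 tracked commands (the real list is in Table 1).
COMMANDS = ["ls", "vi", "gcc", "mail", "netscape"]

def day_vector(log_lines):
    """Phase 1: count how often each tracked command was executed in one
    user's daily audit log (assumed format: the command is the first
    whitespace-separated token on each line)."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return [counts[c] for c in COMMANDS]
```

One such vector per user per day forms the training data for phase 2.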

The particular implementation of NNID and the environment where it was tested are described in the next section.

Experiments

The NNID system was built and tested on a machine that serves a particular research group at the Department of Electrical and Computer Engineering at the University of Texas at Austin. This machine has a total of 10 users; some are regular users, with several other



as awk bc bibtex calendar cat chmod comsat cc cut cvs date df diff du dvips egrep elm cpp emacs expr fgrep filter find finger fmt from ftp gcc gdb ghostview gzip hostname id ifconfig ispell less lpr lprm ls machine mail make man mesg metamail mkdir more movemail mpage mt mv netscape netstat nm objdump perl pgp ping ps pwd rcp resize rm rsh sed sendmail sort strip stty tail tar tcsh tee test tgif top tput tr tty uname vacation vi virtex w wc whereis xcalc xdvi xhost xbiff++ xterm

Table 1: The 100 commands used to describe user behavior. The number of times the user executed each of these commands during the day was recorded, mapped into a nonlinear scale of 11 intervals, and concatenated into a 100-dimensional input vector, representing the usage pattern for that user for that day.

users logging in intermittently. This platform was chosen for three reasons:

1. The operating system (NetBSD) provides audit trail logging for accounting purposes and this option had been enabled on this system.

2. The number of users and the total number of commands executed per day are of an order of magnitude that is manageable. Thus, the feasibility of the approach could be tested with real-world data without getting into scalability issues.

3. The system is relatively unknown to outsiders and the users are all known to us, so that it is likely that the data collected on it consists of normal user behavior (free of intrusions).

Data was collected on this system for 12 days, resulting in 89 user-days. Instead of trying to optimize the selection of features (commands) for the input, we decided to simply use the 100 most common commands in the logs (listed in Table 1), and let the network figure out what information was important and what superfluous. Intelligent selection of features might improve the results somewhat, but the current approach is easy to implement and proves the point.

In order to introduce more overlap between input vectors, and therefore better generalization, the number of times a command was used was divided into intervals. There were 11 intervals, non-linearly spaced, so that the representation is more accurate at lower frequencies where it is most important. The first interval meant the command was never used; the second that it was used once or twice, and so on until the last interval where the command was used more than 500 times. The intervals were represented by values from 0.0 to 1.0 in 0.1 increments. These values, one for each command, were then concatenated into a 100-dimensional command distribution vector (also called user vector below) to be used as input to the neural network.
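A mapping of this shape could look like the sketch below. Only the endpoints are specified in the text (never used, once or twice, more than 500 times); the intermediate cut-offs here are illustrative assumptions.

```python
# Upper bounds of the first 10 intervals; counts above 500 fall in the 11th.
# Only "never used", "once or twice", and "more than 500 times" are stated
# in the text; the intermediate cut-offs are illustrative assumptions.
BOUNDS = [0, 2, 5, 10, 25, 50, 100, 200, 350, 500]

def scale(count):
    """Map a raw daily command count to one of 11 values 0.0, 0.1, ..., 1.0."""
    for i, bound in enumerate(BOUNDS):
        if count <= bound:
            return round(i * 0.1, 1)
    return 1.0  # more than 500 uses
```

The non-linear spacing devotes most of the 11 levels to low counts, where small differences between users matter most.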

The standard three-layer backpropagation architecture was chosen for the neural network. The idea was to get results on the most standard and general architecture so that the feasibility of the approach could be demonstrated and the results would be easily replicable. More sophisticated architectures could be used and they would probably lead to slightly better results. The input layer consisted of 100 units, representing the user vector; the hidden layer had 30 units and the output layer 10 units, one for each user. The network was implemented in the PlaNet Neural Network simulator (Miyata 1991).
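The 100-30-10 architecture amounts to the following forward pass. This is a bare sketch with assumed small random weights and sigmoid units; biases and the backpropagation training step are omitted for brevity.

```python
import math, random

random.seed(0)
N_IN, N_HID, N_OUT = 100, 30, 10   # layer sizes used by NNID

def make_weights(n_in, n_out):
    """Small random initial weights (biases omitted for brevity)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(n_in)]
            for _ in range(n_out)]

W1, W2 = make_weights(N_IN, N_HID), make_weights(N_HID, N_OUT)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x):
    """Forward pass: 100-dimensional user vector -> 30 hidden units ->
    10 output units, one per user."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in W2]
```

Each output unit's activation can be read as the network's confidence that the input vector belongs to the corresponding user.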

Results

To avoid overtraining, several training sessions were run prior to the actual experiments to see how many training cycles would give the highest performance. The network was trained on 8 randomly chosen days of data (65 user vectors), and its performance was tested on the remaining 3 days (24 vectors) after epochs 30, 50, 100, 200, and 300, of which 100 gave the best performance. Four splits of the data into training and testing sets were created by randomly picking 8 days for training.
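Splitting by day, rather than by individual vector, keeps each user-day intact on one side of the split. A sketch of such day-level splits (the seeding scheme is an assumption; the paper does not specify how the random days were drawn):

```python
import random

def make_splits(days, n_splits=4, n_train=8):
    """Split the data by day: n_train randomly chosen training days,
    the remaining days held out for testing."""
    splits = []
    for seed in range(n_splits):
        rng = random.Random(seed)          # illustrative seeding
        train = set(rng.sample(days, n_train))
        test = [d for d in days if d not in train]
        splits.append((sorted(train), test))
    return splits
```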

The resulting four networks were tested in two tasks:

1. Identifying the user vectors of the remaining 3 days. If the activation of the output unit representing the correct user was higher than those of all other units, and also higher than 0.5, the identification was counted as correct. Otherwise, a false positive was counted.

2. Identifying 100 randomly-generated user vectors. If all output units had an activation less than 0.5, the network was taken to correctly identify the vector as an anomaly (i.e. not any of the known users in the system). Otherwise, the most highly active output unit identifies the network's suggestion. Since all intrusions occur under one of the 10 user accounts, there is a 1/10 chance that the suggestion would accidentally match the compromised user account and the intrusion would not be detected. Therefore, 1/10 of all such cases were counted as false negatives.
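The decision rule shared by both tasks can be written compactly; this is a plain restatement of the criterion described above, with a hypothetical function name.

```python
def classify(outputs, threshold=0.5):
    """NNID's decision rule: if no output unit is active above the
    threshold, flag the vector as an anomaly; otherwise the most
    active unit is the network's suggested user."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    if outputs[best] <= threshold:
        return ("anomaly", None)
    return ("user", best)
```

An identification then counts as correct when the suggested user index matches the account the vector actually came from.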

The second test is a suggestive measure of the accuracy of the system. It is not possible to come up with vectors that would represent a good sampling of actual intrusions; the idea here was to generate vectors where the values for each command were randomly drawn from the distribution of values for that command in the entire data set. In other words, the random test



Figure 1: User identification with the NNID network. The output layer of NNID is shown for each of the 24 test vectors in one of the 4 splits tested. The output units are lined up from left to right, and their activations are represented by the size of the squares. In this split there were two false alarms: one is displayed in the top right with activation 0.01, and one in the second row from the bottom, second column from the left, with 0.35. All the other test vectors are identified correctly with activation higher than 0.5.



vectors had the same first-order statistics as the legitimate user vectors, but had no higher-order correlations. Therefore they constitute a neutral but realistic sample of unusual behavior.
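This generation scheme can be sketched directly from the description: each command's value is drawn independently from that command's empirical values, so first-order statistics survive but correlations between commands do not. The function name and fixed seed are illustrative.

```python
import random

def random_user_vector(user_vectors, rng=random.Random(0)):
    """Build a random test vector: each command's value is drawn from
    that command's empirical distribution over all collected vectors,
    preserving first-order statistics but destroying correlations."""
    n = len(user_vectors[0])
    return [rng.choice([v[i] for v in user_vectors]) for i in range(n)]
```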

All four splits led to similar results. On average, the networks rejected 63% of the random user vectors outright. Of the 37% that were accepted, 9/10 pointed to a user other than the nominally compromised account and would therefore still be flagged, giving an overall anomaly detection rate of 63% + 37% x 0.9, or about 96%. The networks correctly identified the legitimate user vectors 93% of the time, giving a false alarm rate of 7%.

Figure 1 shows the output of the network for one of the splits. Out of 24 legitimate user vectors, the network identified 22. Most of the time the correct output unit is very highly activated, indicating high certainty of identification. However, the activation of the highest unit was below 0.5 for two of the inputs, resulting in two false alarms.

Interestingly, in all false alarms in all splits, the falsely-accused user was always the same. A closer look at the data set revealed that there were only 3 days of data on this user. He used the system very infrequently, and the network could not learn a proper profile for him. While it would be easy to fix this problem by collecting more data in this case, we believe this is a problem that would be difficult to rule out in general. No matter how much data one collects, there may still not be enough for some extremely infrequent user. Therefore, we believe the results obtained in this rather small data set give a realistic picture of the performance of the NNID system.

Discussion and Future Work

An important question is, how well does the performance of NNID scale with the number of users? Although there are many computer systems that have no more than a dozen users, most intrusions occur in larger systems with hundreds of users. With more users, the network would have to make finer distinctions, and it would be difficult to maintain the same low level of false alarms. However, the rate of detecting anomalies may not change much, as long as the network has learned the patterns of the actual users well. Any activity that differs from the user's normal behavior would still be detected as an anomaly.

Training the network to represent many more users may take longer and require a larger network, but it should be possible because the user profiles share a lot of common structure, and neural networks in general are good at learning such data. Optimizing the set of commands included in the user vector, and the size of the value intervals, might also have a large impact on performance. It would be interesting to determine the curve of performance versus the number of users, and also see how the size of the input vector and the granularity of the value intervals affect that curve. This is the most important direction of future work.

Another important issue is, how much does a user's behavior change over time? If behavior changes dramatically, NNID must be recalibrated often or the number of false positives would increase. Fortunately such retraining is easy to do. Since NNID parses the daily activity of each user into a user vector, the user profile can be updated daily. NNID could then be retrained periodically. In the current system retraining takes only about 90 seconds and would not be a great burden on the system.

Conclusion

Experimental evaluation on real-world data shows that NNID can learn to identify users simply by what commands they use and how often, and such an identification can be used to detect intrusions in a network computer system. The order of commands does not need to be taken into account. NNID is easy to train and inexpensive to run because it operates off-line on daily logs. As long as real-time detection is not required, NNID constitutes a promising, practical approach to anomaly intrusion detection.

Acknowledgements

Special thanks to Mike Dahlin and Tom Ziaja for feedback on an earlier version of this paper, and to Jim Bednar for help with the PlaNet simulator. This research was supported in part by DOD-ARPA contract F30602-96-1-0313, NSF grant IRI-9504317, and the Texas Higher Education Coordinating Board grant ARP-444.

References

Debar, H., Becker, M., and Siboni, D. (1992). A neural network component for an intrusion detection system. In Proceedings of the 1992 IEEE Computer Society Symposium on Research in Computer Security and Privacy, 240-250.

Denning, D. E. (1987). An intrusion detection model. IEEE Transactions on Software Engineering, SE-13:222-232.

Fox, K. L., Henning, R. R., Reed, J. H., and Simonian, R. (1990). A neural network approach towards intrusion detection. In Proceedings of the 13th National Computer Security Conference, 125-134.

Frank, J. (1994). Artificial intelligence and intrusion detection: Current and future directions. In Proceedings of the 17th National Computer Security Conference.

Garvey, T. D., and Lunt, T. F. (1991). Model-based intrusion detection. In Proceedings of the 14th National Computer Security Conference.

Miyata, Y. (1991). A User's Guide to PlaNet Version 5.6 - A Tool for Constructing, Running, and Looking into a PDP Network. Computer Science Department, University of Colorado, Boulder, CO.



Mukherjee, B., Heberlein, L. T., and Levitt, K. N. (1994). Network intrusion detection. IEEE Network, 26-41.

Porras, P. A., Ilgun, K., and Kemmerer, R. A. (1995). State transition analysis: A rule-based intrusion detection approach. IEEE Transactions on Software Engineering, SE-21:181-199.

Teng, H. S., Chen, K., and Lu, S. C. (1990). Adaptive real-time anomaly detection using inductively generated sequential patterns. In Proceedings of the 1990 IEEE Symposium on Research in Computer Security and Privacy, 278-284.
