Intrusion Detection Using Sequences of System Calls
Post on 02-Jan-2016
Overview
- Focus: privileged processes
- Discriminator: system call sequences
- Building a database: defining “normal”
- Detecting anomalies: how to measure
- Results: promising numbers
- Concerns: remaining doubts
- Extensions of research: Jones, Li & Lin
Inspiration
- Human immune system
  - Recognition of self
  - Rejection of nonself
How would we describe “self” for a software system, or a program?
Focus and Motivation
- Focus on privileged processes
  - Exploitation can give a user root access
  - They provide a natural boundary
  - e.g. telnet daemon, login daemon
- Privileged processes are easier to track
  - Specific, limited function
  - Stable over time
  - Contrast with the diversity of user actions
Where do we look?
- Need to distinguish when:
  - a privileged process runs normally
  - a privileged process exhibits an anomaly
- The discriminator is the observable entity used to distinguish between these two cases
- Use sequences of system calls as the discriminator, the signature
How much detail?
- Discriminator is sequences of system calls
  - Simple temporal ordering is chosen
  - Ignore parameters
  - Ignore specific timing information
  - Ignore everything else!
- Why? As much as possible, work with simple assumptions
Is it enough detail?
- Does the discriminator include enough detail for this hypothesis to hold? The answer seems to be yes!
- Extra complication: due to variability in the configuration and use of individual systems, the set of “normal” sequences of system calls will differ from system to system
Design Decisions
- Remember the temporal ordering of calls
  - Not the total sequence, but sequences of length k
- What size should k be?
  - Long enough to detect anomalies, as short as possible
  - Empirical observation: length 6 to 10 is sufficient
- So “self” is a database of (unordered) short call sequences
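The design above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it slides a window of length k over a trace of call names and records each unique length-k sequence as the “self” database.

```python
# Sketch: build the "self" database by sliding a window of length k over a
# trace of system call names and collecting each unique length-k sequence.

def build_normal_db(trace, k):
    """Return the set of unique length-k call sequences in a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

# Toy trace with k=3 for readability; the slides report that k of 6 to 10
# is sufficient in practice.
trace = ["fopen", "fread", "strcmp", "strcmp", "fopen", "fread", "strcmp"]
db = build_normal_db(trace, k=3)
print(len(db))  # 4 unique sequences
```

Note that a repeated window (here, `fopen fread strcmp` occurs twice) is stored only once, which is why the database stays small relative to the trace.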
Building the “normal” database
- Synthetic
  - Assurance that the normal database contains no intrusions; reproducible
  - But does not reflect any particular real user activity
- Actual use
  - Necessary to generate from actual use in order to have a unique “self”
  - How long to accumulate? Is it clean?
The normal database
- The database of normal sequences does not contain all legal sequences
  - If it did, anomalies would not be detected
  - Some rare sequences will not be used during database initialization
- The database is stored as a forest to save space
Signature Database Structure (length 3)

Example trace: fopen fread strcmp strcmp fopen fread strcmp

Sliding a window of length 3 over the trace yields the unique sequences:
  fopen  fread  strcmp
  fread  strcmp strcmp
  strcmp strcmp fopen
  strcmp fopen  fread

[Figure: the four sequences stored as a forest of call trees]
Derive Robust Signature Database

[Figure: database size (y-axis, 0 to 600 sequences) vs. total sequences scanned (x-axis, 0 to 10000); growth levels off as the database converges]
Detecting anomalies
- A call sequence not in the database is an anomalous sequence
- The strength of an anomalous sequence is measured by the Hamming distance to the closest normal sequence (called dmin)
- Any call trace containing an anomalous sequence is an anomalous trace
Detecting anomalies
- The strength of an anomalous trace is the maximum dmin of the trace, normalized by k (the length of sequences in the database):
  ŜA = max{dmin values for the trace} / k
  The value is between 0 and 1
- By adjusting the threshold value for ŜA, false positives can be reduced
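The anomaly measure above can be sketched directly. This is an illustrative implementation, not the authors' code: dmin is the Hamming distance from a window to its closest database sequence, and the trace's signal is the maximum dmin divided by k.

```python
# Sketch: score a trace against the normal database. dmin is the Hamming
# distance to the closest normal sequence; the trace's anomaly signal
# (S_A hat) is max(dmin) / k, a value between 0 and 1.

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def anomaly_signal(trace, db, k):
    dmins = []
    for i in range(len(trace) - k + 1):
        window = tuple(trace[i:i + k])
        dmins.append(0 if window in db
                     else min(hamming(window, s) for s in db))
    return max(dmins) / k

# Toy database of length-3 sequences (hypothetical example data).
db = {("fopen", "fread", "strcmp"), ("fread", "strcmp", "strcmp"),
      ("strcmp", "strcmp", "fopen"), ("strcmp", "fopen", "fread")}
normal = ["fopen", "fread", "strcmp", "strcmp", "fopen"]
attack = ["fopen", "fread", "exec", "strcmp", "fopen"]
print(anomaly_signal(normal, db, 3))            # 0.0
print(round(anomaly_signal(attack, db, 3), 2))  # 0.33
```

The single foreign call `exec` taints every window it falls in, but each such window is still within Hamming distance 1 of some normal sequence, so the signal is 1/3 rather than 1.0 — which illustrates the later concern that one extra system call can dominate ŜA.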
Efficiency
- Complexity of computing dmin: O(k(RA·N + 1))
  - k is the sequence length, RA is the ratio of anomalous to normal sequences, N is the number of sequences in the database
- dmin is calculated after every system call
- The constant associated with this algorithm is very important
- Not yet running in real time
Results (synthetic)
- Sanity test: if different programs are not distinguishable, anomalies within one program certainly will not be either
- Easy to distinguish between programs; mismatches on well more than 50% of the call sequences (and ŜA ≥ 0.6)
- All intrusions (both attempted and successful) produced anomalies of varying strengths
Results (real environment)
- The conjecture of unique normal databases: experiments in two configurations (at UNM and MIT) produced very different databases for the same program (lpr)
- Is this typical?
Closing concerns
- False positives vs. false negatives
  - If forced to choose, UNM prefers false negatives, because layering can mitigate them
  - Saw 1 false positive per 100 print jobs (lpr), due to system problems
- Is ŜA a good measure?
  - It could help generate false positives: a single extra system call might make ŜA = 0.5
Signature Length Has Little Effect
- Illustrated by two attacks on Apache
- Varied sequence length from 2 to 30
- We chose length 10 to have a margin of error
[Figure: normalized anomaly signal (y-axis, 0 to 1.2) vs. sequence length (x-axis, 0 to 40)]
Effectiveness: Buffer Overflow
- Successfully detected buffer overflow attacks against wu-ftpd
- Works well because attacker code adds new sequences of library calls

  Attack                    #Mismatches   %Mismatches   Normalized Anomaly Signal
  Stack Overwrite           467           3.5           0.7
  Realpath Vulnerability    569           2.7           0.6

- High normalized anomaly signals indicate attacks
Effectiveness: Denial of Service
- Simulated DoS attack that uses up all available memory
- As the attack progresses, library calls requesting memory return abnormally and are re-issued
- The DoS attack caused the application to invoke a new library call, fsync

  Program: vi   #Mismatches   %Mismatches   Normalized Anomaly Signal
  Normal Run    0             0             0     (no intrusion detected)
  DoS Attack    101           2.6           0.6   (high signal indicates attack)