Effective Anomaly Detection with Scarce Training Data

Presenter: 葉倚任
Authors: W. Robertson, F. Maggi, C. Kruegel, and G. Vigna
NDSS 2010


Outline

• Introduction
• Training Data Scarcity
• Exploiting Global Knowledge
• Evaluation


Properties of Anomaly Detection

• Pros
– Unknown attacks can be identified automatically
– No a priori knowledge about the application is required
– No need to manually analyze applications composed of hundreds of components

• Cons
– Tendency to produce a non-negligible number of false positives
– Critically dependent on the quality and quantity of the training data used to construct the models


Motivation

• Web application component invocations are non-uniformly distributed

• For rarely invoked components, it is often impossible to gather enough training data to model their normal behavior accurately

• No existing proposal satisfactorily addresses this problem


Contributions

• Provide evidence that web application traffic is distributed in a non-uniform fashion

• Propose an approach to address the problem of undertraining by using global knowledge

• Evaluate the proposed approach on a large data set of real-world traffic from many web applications



Summary of Notation

• Notation
– A: a set of web applications
– R: a set of resource paths or components
– P: parameters
– Q: requests

• Each request is represented by a tuple


Summary of Notation (cont’d)

• The set of models associated with each unique parameter instance can be represented as a tuple

• The knowledge base of an anomaly detection system trained on a web application stores these tuples


Multi-model Approach

• A profile for a given parameter is a tuple of models, each capturing one feature of the parameter's values:
– normal intervals for integers and string lengths
– character strings, modeled as a ranked frequency histogram, or Idealized Character Distribution (ICD)
– sets of character strings, modeled by inducing a Hidden Markov Model (HMM)
– parameter values, modeled as a set of legal tokens
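As a rough illustration of the character-distribution model, the sketch below computes an ICD-style ranked frequency histogram from a few training strings. The function name `idealized_char_distribution`, the `bins` parameter, and the sample values are assumptions made for this example and do not come from the slides.

```python
from collections import Counter

def idealized_char_distribution(samples, bins=8):
    """Average the ranked (descending) relative character frequencies of the samples.

    Hypothetical helper: each per-string histogram is truncated to `bins`
    entries; shorter histograms are implicitly padded with zeros.
    """
    totals = [0.0] * bins
    used = 0
    for s in samples:
        if not s:
            continue  # empty strings carry no character information
        used += 1
        ranked = sorted(Counter(s).values(), reverse=True)
        for rank, count in enumerate(ranked[:bins]):
            totals[rank] += count / len(s)
    return [t / used for t in totals] if used else totals

# Toy training values for a single parameter (illustrative only).
icd = idealized_char_distribution(["alice", "bob", "carol"])
```

Because each string's frequencies are sorted before averaging, the result describes how skewed the character usage is, not which characters occur.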


The Problem: Non-uniform training data

• In the case of low-traffic applications
– the rate of client requests is inadequate to allow models to train in a timely manner

• In the case of high-traffic applications
– a large subset of resource paths might fail to receive enough requests


Non-uniform training data



Exploiting Global Knowledge

• Parameters of the same type tend to induce model compositions that are similar to each other

• The goal is to substitute, for an undertrained profile, the well-trained profile of a similar parameter of the same type

• The proposed method is composed of three phases
– Enhanced training
– Building profile knowledge bases
– Mapping undertrained profiles to well-trained profiles


Phase I: Enhanced training

• Generate undertrained profiles
– Consider the sequence of client requests containing parameter p for application a_i
– Undertrained profiles are trained on randomly sampled κ-sequences of these requests, for several values of κ

• Each of the resulting profiles is then added to a knowledge base

• Each model monitors its stability during the training phase

• A well-trained, or stable, profile is stored in a knowledge base
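The enhanced training phase can be sketched roughly as follows. This is a toy illustration under stated assumptions: the values in `KAPPAS`, the token-set stand-in for a profile, and the `is_stable` criterion are all hypothetical, since the slides do not preserve the actual κ values or the stability test.

```python
import random

KAPPAS = (1, 2, 4, 8)  # hypothetical sample sizes; the actual values are not in the slides

def train_token_model(values):
    """Toy stand-in for a profile: the set of tokens seen during training."""
    return frozenset(values)

def undertrained_profiles(values, kappas=KAPPAS, seed=0):
    """Train one toy profile per kappa on a random sample of kappa observed values."""
    rng = random.Random(seed)
    profiles = {}
    for kappa in kappas:
        if kappa <= len(values):
            profiles[kappa] = train_token_model(rng.sample(values, kappa))
    return profiles

def is_stable(values, window=5):
    """Hypothetical stability check: no new token appeared in the last `window` values."""
    if len(values) <= window:
        return False
    seen = set(values[:-window])
    return all(v in seen for v in values[-window:])

# Values of one parameter observed across ten requests (illustrative only).
observed = ["asc", "desc", "asc", "asc", "desc", "asc", "desc", "asc", "asc", "desc"]
knowledge_base = undertrained_profiles(observed)
```

Each entry of `knowledge_base` plays the role of an undertrained profile built from κ samples, while `is_stable(observed)` indicates when the fully trained profile could be treated as well-trained.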


Phase II: Building profile knowledge bases

• Merge a set of knowledge bases into the undertrained profile database

• Profile clustering is performed on this database in order to reduce query execution time

• The result is a set of clusters of the profiles in this database

• An agglomerative hierarchical clustering algorithm using group average linkage was applied
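A minimal sketch of agglomerative clustering with group average linkage is given below. It is written in pure Python for illustration. The `threshold` stopping rule and the reduction of each profile to a single number are assumptions of this example, as the slides do not say how the cluster hierarchy is cut.

```python
from itertools import combinations

def average_linkage(points, dist, threshold):
    """Merge the two closest clusters until no pair is closer than `threshold`.

    Cluster-to-cluster distance is the mean pairwise distance between members
    (group average linkage). Returns a list of clusters, each a list of points.
    """
    clusters = [[p] for p in points]

    def group_avg(a, b):
        return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (i, j), best = min(
            (((i, j), group_avg(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break  # assumed stopping rule: remaining clusters are too far apart
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Toy profiles reduced to a single number each (illustrative only).
toy_profiles = [1.0, 1.2, 0.9, 8.0, 8.3]
clusters = average_linkage(toy_profiles, lambda a, b: abs(a - b), threshold=2.0)
```

With real profiles, `dist` would be a profile distance function rather than an absolute difference.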


Distance Measure

• More formally, the distance between two profiles c_i and c_j is defined using a distance function
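One plausible way to build such a profile distance is sketched here. It assumes that the distance averages per-model distances over the model types both profiles contain. The slides do not preserve the actual formula, so the averaging, the two toy model distances, and the names `MODEL_DISTANCES` and `profile_distance` are illustrative assumptions.

```python
def jaccard_distance(a, b):
    """Distance between two token sets: 1 - |a ∩ b| / |a ∪ b|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def interval_distance(a, b):
    """Distance between two (low, high) intervals: 1 - overlap / span, in [0, 1]."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    span = max(a[1], b[1]) - min(a[0], b[0])
    return 1.0 - overlap / span if span else 0.0

# Hypothetical registry: one distance function per model type.
MODEL_DISTANCES = {"tok": jaccard_distance, "len": interval_distance}

def profile_distance(ci, cj):
    """Mean per-model distance over the model types both profiles contain."""
    shared = [u for u in MODEL_DISTANCES if u in ci and u in cj]
    if not shared:
        return 1.0  # assumption: profiles with nothing comparable are maximally distant
    return sum(MODEL_DISTANCES[u](ci[u], cj[u]) for u in shared) / len(shared)

# Two toy profiles with a token model and a string-length interval model.
ci = {"tok": {"asc", "desc"}, "len": (3, 4)}
cj = {"tok": {"asc"}, "len": (3, 5)}
d = profile_distance(ci, cj)
```

Keeping one distance function per model type lets profiles with different model compositions still be compared on whatever models they share.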


Distance Functions


Phase III: Mapping undertrained profiles to well-trained profiles

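• The mapping is implemented as follows
– A nearest-neighbor match is performed between the undertrained profile and the clusters of undertrained profiles
– A nearest-neighbor match is performed between the undertrained profile and the members of the matched cluster, to discover the undertrained profile at minimum distance from it
– The well-trained profile associated with that match is substituted for the undertrained profile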
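The two-stage lookup can be sketched as below. The data structures are assumptions of this example: `clusters` maps a cluster representative to its member undertrained profiles, `source_of` maps each member to the well-trained profile it was derived from, and profiles are reduced to single numbers for brevity.

```python
def nearest(query, candidates, dist):
    """Return the candidate at minimum distance from `query`."""
    return min(candidates, key=lambda c: dist(query, c))

def map_profile(query, clusters, source_of, dist):
    """Two-stage lookup: nearest cluster first, then nearest member of that cluster.

    Comparing against a single representative per cluster is an assumption of
    this sketch; the slides only state that a nearest-neighbor match is used.
    """
    representative = nearest(query, list(clusters), dist)
    member = nearest(query, clusters[representative], dist)
    return source_of[member]

# Toy data (illustrative only): profiles are numbers, well-trained profiles are labels.
clusters = {1.0: [0.8, 1.0, 1.2], 8.0: [7.5, 8.5]}
source_of = {0.8: "wt-login", 1.0: "wt-login", 1.2: "wt-search",
             7.5: "wt-upload", 8.5: "wt-upload"}
substitute = map_profile(1.3, clusters, source_of, lambda a, b: abs(a - b))
```

Searching clusters before members limits the number of distance computations per query, which is the stated purpose of clustering the knowledge base.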


Mapping Quality


Mapping Quality

• Consider a mapping from each undertrained cluster to the maximum number of elements in that cluster that map to the same cluster in C

• The robustness metric ρ is then defined in terms of this mapping

• A minimum robustness threshold is set for ρ
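A small sketch of how such a robustness score could be computed follows. It assumes ρ is the sum of the per-cluster maxima divided by the total number of undertrained profiles. The exact formula is not preserved in the slides, so this normalization and the value of `RHO_MIN` are assumptions.

```python
from collections import Counter

def max_same_target(cluster, target_cluster_of):
    """Largest number of members of `cluster` that map to one well-trained cluster."""
    counts = Counter(target_cluster_of[p] for p in cluster)
    return max(counts.values())

def robustness(undertrained_clusters, target_cluster_of):
    """Assumed form of rho: sum of per-cluster maxima over the number of profiles."""
    total = sum(len(h) for h in undertrained_clusters)
    agree = sum(max_same_target(h, target_cluster_of) for h in undertrained_clusters)
    return agree / total

# Toy mapping (illustrative only): undertrained profiles u1..u5, target clusters A..C.
undertrained_clusters = [["u1", "u2", "u3"], ["u4", "u5"]]
target_cluster_of = {"u1": "A", "u2": "A", "u3": "B", "u4": "C", "u5": "C"}
rho = robustness(undertrained_clusters, target_cluster_of)

RHO_MIN = 0.75  # hypothetical minimum robustness threshold
robust_enough = rho >= RHO_MIN
```

Under this reading, ρ approaches 1 when profiles that cluster together while undertrained also land in the same well-trained cluster.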



Experimental Setting

• HTTP connections were observed over a period of approximately three months

• A portion of the resulting flows was then filtered using Snort to remove known attacks

• The data set contains 823 distinct web applications, 36,392 unique components, 16,671 unique parameters, and 58,734,624 HTTP requests


Profile clustering quality


Profile mapping robustness


Detection accuracy

• 100,000 attacks


Conclusion

• Identified that non-uniform web client access distributions cause model undertraining

• Proposed the use of global knowledge bases of well-trained profiles to remediate a local scarcity of training data
