Effective Anomaly Detection with Scarce Training Data
Presenter: 葉倚任
Authors: W. Robertson, F. Maggi, C. Kruegel, and G. Vigna
NDSS 2010



Outline

• Introduction
• Training Data Scarcity
• Exploiting Global Knowledge
• Evaluation


Properties of Anomaly Detection

• Pros
  – Unknown attacks can be identified automatically
  – No a priori knowledge about the application is required
  – There is no need to manually analyze applications composed of hundreds of components
• Cons
  – Tendency to produce a non-negligible number of false positives
  – Critical reliance on the quality and quantity of the training data used to construct the models


Motivation

• Web application component invocations are non-uniformly distributed

• For rarely invoked components, it is often impossible to gather enough training data to accurately model their normal behavior

• No existing proposal satisfactorily addresses this problem


Contributions

• Provide evidence that web application traffic is distributed in a non-uniform fashion

• Propose an approach to address the problem of undertraining by using global knowledge

• Evaluate the proposed approach on a large data set of real-world traffic from many web applications


Summary of Notation

• Notation
  – A: a set of web applications
  – R: a set of resource paths, or components
  – P: a set of parameters
  – Q: a set of requests
• Each request is represented by a tuple of its web application, resource path, and parameters


Summary of Notation (cont’d)

• The set of models associated with each unique parameter instance can be represented as a tuple

• The knowledge base of an anomaly detection system trained on a web application is the collection of these tuples


Multi-model Approach

• A profile for a given parameter is the tuple of models trained on its values
  – Integer and length models describe normal intervals for integers and string lengths
  – A character distribution model represents character strings as a ranked frequency histogram, or Idealized Character Distribution (ICD)
  – A structural model represents sets of character strings by inducing a Hidden Markov Model (HMM)
  – A token model represents parameter values as a set of legal tokens
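As a rough illustration of how a profile combines several models, the following Python sketch trains three simplified stand-ins (a length interval model, a token model, and a ranked character-frequency model) on the values of one parameter. The class names, the sample values, and the hypothetical `lang` parameter are assumptions made for illustration. The HMM-based model is omitted, and this is not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LengthModel:
    """Learns the interval [low, high] of observed string lengths."""
    low: float = float("inf")
    high: float = float("-inf")

    def train(self, value: str) -> None:
        self.low = min(self.low, len(value))
        self.high = max(self.high, len(value))

    def is_normal(self, value: str) -> bool:
        return self.low <= len(value) <= self.high


@dataclass
class TokenModel:
    """Learns the set of legal tokens seen during training."""
    tokens: set = field(default_factory=set)

    def train(self, value: str) -> None:
        self.tokens.add(value)

    def is_normal(self, value: str) -> bool:
        return value in self.tokens


@dataclass
class CharDistributionModel:
    """Averages ranked relative character frequencies (a simplified ICD)."""
    sums: list = field(default_factory=list)
    samples: int = 0

    def train(self, value: str) -> None:
        if not value:
            return
        ranked = sorted((n / len(value) for n in Counter(value).values()), reverse=True)
        # Grow the histogram if this value has more distinct characters than seen so far.
        self.sums.extend([0.0] * (len(ranked) - len(self.sums)))
        for rank, freq in enumerate(ranked):
            self.sums[rank] += freq
        self.samples += 1

    def icd(self) -> list:
        return [s / self.samples for s in self.sums] if self.samples else []


@dataclass
class Profile:
    """A profile is the tuple of models trained on one parameter's values."""
    length: LengthModel = field(default_factory=LengthModel)
    tokens: TokenModel = field(default_factory=TokenModel)
    chars: CharDistributionModel = field(default_factory=CharDistributionModel)

    def train(self, value: str) -> None:
        for model in (self.length, self.tokens, self.chars):
            model.train(value)


profile = Profile()
for value in ["en", "de", "fr"]:  # made-up values of a hypothetical `lang` parameter
    profile.train(value)

print(profile.length.is_normal("it"))           # length check against the learned interval
print(profile.tokens.is_normal("it"))           # token check against the learned set
print(profile.length.is_normal("' OR 1=1 --"))  # a much longer, injection-like value
```

With only three training values the token model rejects any unseen value, which is exactly the kind of false positive that undertraining produces.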


The Problem: Non-uniform training data

• In the case of low-traffic applications
  – the rate of client requests is inadequate to allow models to train in a timely manner
• In the case of high-traffic applications
  – a large subset of resource paths might fail to receive enough requests
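To make the second case concrete, a minimal Python sketch counts training requests per resource path and flags the paths that fall below a sample threshold. The request log, the paths, and the `min_samples` threshold are made up for illustration and do not come from the talk.

```python
from collections import Counter


def undertrained_paths(request_paths, min_samples):
    """Return the resource paths that received fewer than min_samples requests."""
    counts = Counter(request_paths)
    return sorted(path for path, n in counts.items() if n < min_samples)


# Hypothetical log of one busy application: most traffic hits a few paths.
log = (
    ["/index"] * 50
    + ["/search"] * 12
    + ["/admin/export"] * 2
    + ["/profile/edit"] * 3
)

print(undertrained_paths(log, min_samples=10))
```

Even though the application as a whole receives plenty of traffic, two of its four paths never reach the threshold.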


Non-uniform training data


Exploiting Global Knowledge

• Parameters of the same type tend to induce model compositions that are similar to each other

• The goal is to substitute an undertrained profile with a well-trained profile for a similar parameter of the same type

• The proposed method is composed of three phases
  – Enhanced training
  – Building profile knowledge bases
  – Mapping undertrained profiles to well-trained profiles


Phase I: Enhanced training

• Generate undertrained profiles
  – Take the sequence of client requests containing parameter p for web application a_i
  – Randomly sample κ-sequences from it for a range of values of κ
• Each of the resulting undertrained profiles is then added to a knowledge base

• Each model monitors its stability during the training phase

• A well-trained, or stable, profile is also stored in a knowledge base
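The sampling step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a κ-sequence is treated as a random sample of κ parameter values, the list of κ values is hypothetical, and the "profile" is reduced to a toy interval of value lengths.

```python
import random


def train_profile(values):
    """Toy profile (hypothetical stand-in): the interval of observed value lengths."""
    lengths = [len(v) for v in values]
    return (min(lengths), max(lengths))


def undertrained_profiles(values, kappas, rng):
    """Train one deliberately undertrained profile per kappa on kappa sampled values."""
    profiles = {}
    for kappa in kappas:
        if kappa <= len(values):
            profiles[kappa] = train_profile(rng.sample(values, kappa))
    return profiles


# Made-up parameter values with lengths 1..100.
values = ["x" * n for n in range(1, 101)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

undertrained = undertrained_profiles(values, kappas=[1, 2, 4, 8], rng=rng)
well_trained = train_profile(values)  # trained on every available sample

for kappa, interval in undertrained.items():
    print(kappa, interval, "vs well-trained", well_trained)
```

Each undertrained interval is contained in the well-trained one, and small κ generally gives a narrower interval, which is why such profiles reject legitimate values.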


Phase II: Building profile knowledge bases

• The set of knowledge bases is merged into the undertrained profile database

• Profiles in this database are clustered in order to speed up query execution, yielding a set of profile clusters

• An agglomerative hierarchical clustering algorithm using group average linkage is applied
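Agglomerative clustering with group average linkage can be sketched in pure Python as below. The stopping threshold `max_distance`, the scalar stand-ins for profiles, and the absolute-difference distance are assumptions for illustration; the talk does not state how the hierarchy is cut.

```python
from itertools import combinations


def average_linkage(a, b, dist):
    """Group average linkage: mean pairwise distance between two clusters."""
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))


def agglomerative(items, dist, max_distance):
    """Repeatedly merge the two closest clusters until none is within max_distance."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        (i, j), d = min(
            (
                ((i, j), average_linkage(clusters[i], clusters[j], dist))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if d > max_distance:
            break
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Scalars stand in for profiles; a real system would use a profile distance.
points = [1.0, 1.2, 0.9, 8.0, 8.3]
clusters = agglomerative(points, dist=lambda x, y: abs(x - y), max_distance=1.0)
print(sorted(sorted(c) for c in clusters))
```

This naive version recomputes all pairwise linkages on every merge, which is fine for a sketch but too slow for a large profile database.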


Distance Measure

• More formally, the distance between the profiles c_i and c_j is defined in terms of a distance function between their models
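One plausible way to write such a profile distance is to average per-model distances over the model types that both profiles contain. The symbols U (shared model types), m_i^{(u)} (the model of type u in profile c_i), and δ_u, as well as the averaging itself, are assumptions for illustration and not necessarily the definition used in the talk.

```latex
d(c_i, c_j) \;=\; \frac{1}{|U|} \sum_{u \in U} \delta_u\!\left(m_i^{(u)},\, m_j^{(u)}\right),
\qquad
\delta_u : M_u \times M_u \to [0, 1]
```

Under this form, two profiles whose shared models are pairwise identical have distance 0, and the result stays in [0, 1] as long as every δ_u does.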


Distance Functions


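Phase III: Mapping undertrained profiles to well-trained profiles

• The mapping is implemented as follows
  – A nearest-neighbor match is performed between the undertrained profile to be mapped and the clusters of undertrained profiles
  – A nearest-neighbor match is performed between that profile and the members of the matching cluster to discover the undertrained profile at minimum distance from it
  – The well-trained profile corresponding to that nearest undertrained profile is substituted for the profile being mapped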
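The two-stage lookup can be sketched in Python as follows. The data layout is an assumption: each cluster is a list of (undertrained profile, well-trained profile) pairs, profiles are toy (min length, max length) intervals, and the distance to a cluster is taken as the mean distance to its members.

```python
def cluster_distance(profile, cluster, dist):
    """Mean distance from a profile to the undertrained members of a cluster."""
    return sum(dist(profile, member) for member, _ in cluster) / len(cluster)


def map_profile(profile, clusters, dist):
    """Two-stage nearest-neighbor match; returns the well-trained substitute."""
    # Stage 1: find the closest cluster of undertrained profiles.
    nearest = min(clusters, key=lambda c: cluster_distance(profile, c, dist))
    # Stage 2: find the closest undertrained member inside that cluster.
    _, well_trained = min(nearest, key=lambda pair: dist(profile, pair[0]))
    return well_trained


def interval_distance(a, b):
    """Hypothetical distance between two (min, max) interval profiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Made-up knowledge base: (undertrained, well-trained) pairs grouped in clusters.
knowledge_base = [
    [((2, 3), (2, 5)), ((2, 4), (1, 6))],          # short values
    [((30, 40), (20, 64)), ((32, 32), (32, 32))],  # long values
]

print(map_profile((31, 33), knowledge_base, interval_distance))
print(map_profile((3, 3), knowledge_base, interval_distance))
```

Matching against clusters first keeps the number of distance computations small compared with scanning every undertrained profile in the database.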


Mapping Quality

• Consider a mapping from an undertrained cluster to the maximum number of elements in that cluster that map to the same cluster of well-trained profiles

• The robustness metric ρ is then defined in terms of this mapping

• A mapping is regarded as robust when ρ is at least a minimum robustness threshold
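A minimal Python sketch of a robustness check of this kind follows. It assumes ρ is the fraction of an undertrained cluster's members that map to the most common target cluster. That normalization, the cluster labels, and the threshold value are illustrative assumptions, not the definition from the talk.

```python
from collections import Counter


def max_agreement(targets):
    """Largest number of members of one undertrained cluster sharing a target cluster."""
    return max(Counter(targets).values())


def robustness(targets):
    """Assumed normalization: share of members that agree on the most common target."""
    return max_agreement(targets) / len(targets)


def is_robust(targets, rho_min):
    """Compare the metric against a minimum robustness threshold."""
    return robustness(targets) >= rho_min


# Hypothetical target-cluster labels for the four members of one undertrained cluster.
targets = ["A", "A", "B", "A"]

print(max_agreement(targets))
print(robustness(targets))
print(is_robust(targets, rho_min=0.7))
```

A cluster whose members scatter across many target clusters scores low, signalling that substituting a profile from it would be unreliable.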


Experimental Setting

• HTTP connections were observed over a period of approximately three months

• A portion of the resulting flows was then filtered using Snort to remove known attacks

• The data set contains 823 distinct web applications, 36,392 unique components, 16,671 unique parameters, and 58,734,624 HTTP requests


Profile clustering quality


Profile mapping robustness


Detection accuracy

• 100,000 attacks


Conclusion

• Identified that non-uniform web client access distributions cause model undertraining

• Proposed the use of global knowledge bases of well-trained profiles to remediate a local scarcity of training data
