A Hybrid Anomaly Detection Model using G-LDA Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I.Sumaiya Thaseen, Ch.Aswani Kumar. VIT University – Chennai

Upload: shubham-saini

Post on 15-Apr-2017


TRANSCRIPT

Page 1: A hybrid anomaly detection model using G-LDA

A Hybrid Anomaly Detection Model using G-LDA

Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar

VIT University – Chennai

Page 2: A hybrid anomaly detection model using G-LDA

Typical IDS

Data Collection

Data Pre-Processing

Intrusion Identification

Response

This work mainly focused on Intrusion Identification

Page 3: A hybrid anomaly detection model using G-LDA

Architecture

Page 4: A hybrid anomaly detection model using G-LDA

Attribute Selection

“With more data, the simpler solution can be more accurate than the sophisticated solution.”

Selection process based on the means and modes of numeric attributes

An attribute is selected when the mode values of anomaly and normal patterns contrast, with their corresponding means inclined towards the modes
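The slides do not give the exact selection rule, so the following is only a minimal sketch of one possible reading: an attribute is kept when the anomaly and normal modes differ and each class mean lies closer to its own mode than to the other class's mode. The function name `select_attributes`, the closeness test, and the toy rows are assumptions made for illustration.

```python
from statistics import mean, mode


def select_attributes(anomaly_rows, normal_rows, names):
    """Return attribute names whose class modes contrast (assumed rule)."""
    selected = []
    for i, name in enumerate(names):
        a_vals = [row[i] for row in anomaly_rows]
        n_vals = [row[i] for row in normal_rows]
        a_mode, n_mode = mode(a_vals), mode(n_vals)
        if a_mode == n_mode:
            continue  # no contrast between the two modes
        a_mean, n_mean = mean(a_vals), mean(n_vals)
        # Assumed meaning of "means inclined towards the modes":
        # each mean is nearer to its own class mode than to the other one.
        anomaly_ok = abs(a_mean - a_mode) < abs(a_mean - n_mode)
        normal_ok = abs(n_mean - n_mode) < abs(n_mean - a_mode)
        if anomaly_ok and normal_ok:
            selected.append(name)
    return selected


# Toy data (hypothetical): column 0 contrasts, column 1 does not.
anomaly_rows = [(0, 5), (0, 5), (1, 7)]
normal_rows = [(1, 5), (1, 5), (0, 9)]
print(select_attributes(anomaly_rows, normal_rows, ["logged_in", "src_bytes"]))
```

In this toy case only the first column survives, because both classes share the same mode in the second column.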

Page 5: A hybrid anomaly detection model using G-LDA

Selected Attributes

logged_in

serror_rate

srv_serror_rate

same_srv_rate

diff_srv_rate

dst_host_serror_rate

dst_host_srv_serror_rate

A strong contrast is visible between the trends of a selected attribute and a discarded one

Page 6: A hybrid anomaly detection model using G-LDA

Training Set Selection (using LDA)

Latent Dirichlet Allocation is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.

Apply LDA (separately on anomaly and normal packets) to obtain 200 sets of 10 packets each. Each set is dominated by a particular packet type.
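The slides do not say which LDA implementation was used or how packets were encoded. As a rough sketch only, assume each packet record is treated as one token and that some LDA implementation has already produced a weight for every packet under every topic. Extracting the sets then amounts to taking the highest-weighted packets per topic. The function name `top_packets_per_topic` and the weights below are hypothetical.

```python
import heapq


def top_packets_per_topic(topic_weights, k=10):
    """topic_weights: one dict per topic, mapping packet record -> weight.

    Returns, for each topic, the k packet records with the largest weight.
    """
    return [heapq.nlargest(k, weights, key=weights.get) for weights in topic_weights]


# Hypothetical weights for two topics over three abbreviated packet records.
topic_weights = [
    {"icmp,eco_i,SF": 0.5, "tcp,telnet,S0": 0.3, "tcp,finger,S0": 0.2},
    {"icmp,eco_i,SF": 0.1, "tcp,telnet,S0": 0.2, "tcp,finger,S0": 0.7},
]
for topic_id, packets in enumerate(top_packets_per_topic(topic_weights, k=2)):
    print(topic_id, packets)
```

With 200 topics and `k=10`, this would yield the 200 sets of 10 packets described above.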

Page 7: A hybrid anomaly detection model using G-LDA

Sample LDA Output

Topic 0th:
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly
0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly
0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly

Topic 1th:
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0.02,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0,0,anomaly
………………

Page 8: A hybrid anomaly detection model using G-LDA

Genetic Algorithm

Page 9: A hybrid anomaly detection model using G-LDA

Genetic Algorithm

Applied on Normal and Anomaly packets separately

Threshold value taken for providing a negative weight

Run for 3 generations

Top 3 values for anomaly and normal packets used
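The slides leave the encoding, fitness function, and genetic operators unspecified, so the sketch below is an illustrative guess rather than the authors' algorithm. It assumes an individual is one value per selected attribute, that fitness sums the relative frequency of each value in the class, that values rarer than the threshold contribute a negative amount, and that uniform crossover with occasional mutation is used. All names and default parameters other than the 3 generations and top 3 values are hypothetical. It would be run once on anomaly packets and once on normal packets.

```python
import random
from collections import Counter


def evolve_top_values(packets, generations=3, pop_size=30, top_k=3,
                      threshold=0.05, mutation_rate=0.1, seed=0):
    """packets: list of equal-length tuples, one value per selected attribute."""
    rng = random.Random(seed)
    n_attrs = len(packets[0])
    total = len(packets)
    # Frequency of every value, per attribute position.
    freqs = [Counter(p[i] for p in packets) for i in range(n_attrs)]

    def gene_score(i, value):
        share = freqs[i][value] / total
        # Assumed use of the threshold: rare values score negatively.
        return share if share >= threshold else share - threshold

    def fitness(individual):
        return sum(gene_score(i, v) for i, v in enumerate(individual))

    # Initial population: random packets drawn from the training set.
    population = [rng.choice(packets) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < mutation_rate:
                i = rng.randrange(n_attrs)
                child[i] = rng.choice(packets)[i]  # mutation: resample one gene
            children.append(tuple(child))
        population = parents + children

    # Per attribute, keep up to top_k distinct surviving values by score.
    result = []
    for i in range(n_attrs):
        survivors = {individual[i] for individual in population}
        ranked = sorted(survivors, key=lambda v: gene_score(i, v), reverse=True)
        result.append(ranked[:top_k])
    return result


# Toy training set (hypothetical) with two attributes.
packets = [(1, 0.0)] * 60 + [(0, 1.0)] * 30 + [(1, 0.5)] * 10
print(evolve_top_values(packets))
```

The returned lists play the role of the per-attribute value sets consulted when an incoming packet is scored.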

Page 10: A hybrid anomaly detection model using G-LDA

Identifying nature of incoming packet

For each selected attribute value Fi in the incoming packet:

◦ If Fi ∈ Vi: Si = (A × Frequency of Fi in Anomaly) - (Frequency of Fi in Normal)

◦ Else: Si = 0

C = Σ Si

If C > 0

◦ Then Anomaly

◦ Else Normal
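The scoring rule above can be sketched in a few lines. This assumes Vi is the set of values retained for attribute i and that the frequencies are stored as one dictionary per attribute; the function name `classify_packet` and the toy numbers are hypothetical.

```python
def classify_packet(packet, value_sets, anomaly_freq, normal_freq, weight):
    """Label a packet using C = sum(Si), where
    Si = weight * anomaly frequency - normal frequency if Fi is in Vi, else 0."""
    score = 0.0
    for i, value in enumerate(packet):
        if value in value_sets[i]:
            score += weight * anomaly_freq[i].get(value, 0.0) - normal_freq[i].get(value, 0.0)
        # values outside Vi contribute Si = 0
    return "anomaly" if score > 0 else "normal"


# Toy frequencies for two selected attributes (hypothetical numbers).
value_sets = [{0, 1}, {0.0, 1.0}]
anomaly_freq = [{0: 0.8, 1: 0.2}, {1.0: 0.7, 0.0: 0.1}]
normal_freq = [{0: 0.3, 1: 0.7}, {1.0: 0.1, 0.0: 0.9}]

print(classify_packet((0, 1.0), value_sets, anomaly_freq, normal_freq, weight=1.5))
print(classify_packet((1, 0.0), value_sets, anomaly_freq, normal_freq, weight=1.5))
```

A packet whose values all fall outside the retained sets scores exactly 0 and is therefore labelled normal, since the rule requires C > 0 for an anomaly.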

Page 11: A hybrid anomaly detection model using G-LDA

Additional Weight

Multiplied to the anomaly frequency

Why?

Generic anomalies have diverse values, unlike normal packets, which contain values in a particular range

• A trade-off between the accuracy and the false positive rate is required
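To illustrate the trade-off, here is a small sketch that sweeps the weight A over toy data. Because C = A × (summed anomaly frequencies) - (summed normal frequencies), each packet is reduced to a pair of those two sums. The pairs below are invented for illustration and are not taken from KDDCup'99; `evaluate` is a hypothetical helper.

```python
def evaluate(weight, anomalies, normals):
    """Each packet is a pair (a, n): its summed anomaly and normal frequencies.

    Returns (accuracy, false positive rate) for the given weight.
    """
    true_pos = sum(1 for a, n in anomalies if weight * a - n > 0)
    false_pos = sum(1 for a, n in normals if weight * a - n > 0)
    true_neg = len(normals) - false_pos
    accuracy = (true_pos + true_neg) / (len(anomalies) + len(normals))
    fpr = false_pos / len(normals)
    return accuracy, fpr


# Hypothetical packets: anomalies have weaker, more scattered frequencies.
anomalies = [(0.9, 0.2), (0.4, 0.5), (0.3, 0.7), (0.2, 0.5)]
normals = [(0.1, 0.9), (0.3, 0.5), (0.2, 0.8), (0.4, 0.7)]

for weight in (1.0, 1.5, 2.0, 3.0):
    accuracy, fpr = evaluate(weight, anomalies, normals)
    print(f"A={weight}: accuracy={accuracy:.3f}, FPR={fpr:.2f}")
```

On this toy data a larger weight catches more anomalies but also flags more normal packets, which is the tension the slide describes.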

Page 12: A hybrid anomaly detection model using G-LDA

Additional Weight

Page 13: A hybrid anomaly detection model using G-LDA

Results

Tested against 50000 anomaly and 50000 normal packets from the KDDCup’99 dataset.

88.5% accuracy with a 6% false positive rate (FPR)
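The two reported figures imply approximate confusion-matrix counts, worked out below. This assumes accuracy is measured over all 100000 packets and FPR is the fraction of normal packets flagged as anomalies; the slides do not state either definition, so the derived numbers are only indicative.

```python
# Figures reported on the slide.
anomaly_total = 50_000
normal_total = 50_000
accuracy = 0.885
fpr = 0.06

# Assumed definitions: FPR = FP / normal packets, accuracy = (TP + TN) / all packets.
false_positives = round(fpr * normal_total)
true_negatives = normal_total - false_positives
correct = round(accuracy * (anomaly_total + normal_total))
true_positives = correct - true_negatives
detection_rate = true_positives / anomaly_total

print(f"false positives: {false_positives}")
print(f"true negatives:  {true_negatives}")
print(f"true positives:  {true_positives}")
print(f"detection rate:  {detection_rate:.1%}")
```

Under these assumptions the implied detection rate on anomaly packets is 83%.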

Page 14: A hybrid anomaly detection model using G-LDA

Future Work

Focus on specific anomaly types

Better attribute selection algorithm?

◦ oneR

◦ Entropy based

◦ Chi-squared

◦ randomForest

Better classification technique?

◦ Clustering – Hierarchical, K-Means

◦ Decision Trees

Page 15: A hybrid anomaly detection model using G-LDA

QUESTIONS?