a hybrid anomaly detection model using g-lda

Post on 15-Apr-2017


A Hybrid Anomaly Detection Model using G-LDA

Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar

VIT University – Chennai

Typical IDS

◦ Data Collection
◦ Data Pre-Processing
◦ Intrusion Identification
◦ Response

This work focuses mainly on Intrusion Identification.

Architecture

Attribute Selection

“With more data, the simpler solution can be more accurate than the sophisticated solution.”

◦ Selection process based on the means and modes of numeric attributes
◦ Selected attributes show a contrast between the mode values of anomaly and normal patterns, with the corresponding means inclined towards the modes

Selected Attributes

◦ logged_in
◦ serror_rate
◦ srv_serror_rate
◦ same_srv_rate
◦ diff_srv_rate
◦ dst_host_serror_rate
◦ dst_host_srv_serror_rate

A strong contrast is visible between the trends of a selected attribute and a discarded attribute.
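The selection criterion above can be sketched in a few lines. This is only one possible reading of the slide: an attribute is kept when the anomaly and normal classes have different modes and each class mean lies closer to its own mode than to the other class's mode. The function name `is_discriminative` and the toy columns are hypothetical, not taken from the slides.

```python
from statistics import mean, mode


def is_discriminative(anomaly_values, normal_values):
    """Keep an attribute when the class modes differ and each class mean
    is closer to its own mode than to the other class's mode (assumed rule)."""
    a_mode, n_mode = mode(anomaly_values), mode(normal_values)
    if a_mode == n_mode:
        return False
    a_mean, n_mean = mean(anomaly_values), mean(normal_values)
    return (abs(a_mean - a_mode) < abs(a_mean - n_mode)
            and abs(n_mean - n_mode) < abs(n_mean - a_mode))


# Hypothetical toy data: (anomaly column, normal column) per attribute.
columns = {
    "serror_rate": ([1.0, 1.0, 1.0, 0.9, 0.0], [0.0, 0.0, 0.0, 0.1, 0.0]),
    "duration": ([0, 0, 0, 2, 5], [0, 0, 0, 1, 30]),
}

selected = [name for name, (a, n) in columns.items() if is_discriminative(a, n)]
print(selected)
```

In this toy data `duration` is discarded because both classes share the same mode, while `serror_rate` is kept because its modes differ and the means follow them.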

Training Set Selection (using LDA)

Latent Dirichlet Allocation is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.

Apply LDA (separately on anomaly and normal packets) to obtain 200 sets of 10 packets each. Each set is dominated by a particular packet type.
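The slides do not say how LDA was configured. One reading that matches the sample output (ten packets listed under each topic) is that every distinct packet record is treated as a "word" and each training set is the ten most probable words of one topic. The sketch below assumes a topic-word probability matrix has already been fitted by some LDA implementation. The function `top_packets_per_topic`, the shortened packet strings, and the probabilities are all hypothetical.

```python
import heapq


def top_packets_per_topic(topic_word_probs, vocabulary, n=10):
    """Return, for each topic, the n packet records with the highest
    probability under that topic, most probable first."""
    sets = []
    for probs in topic_word_probs:
        best = heapq.nlargest(n, range(len(probs)), key=probs.__getitem__)
        sets.append([vocabulary[i] for i in best])
    return sets


# Hypothetical vocabulary of (shortened) packet records and a made-up
# 2-topic matrix; a real run would use 200 topics and n=10.
vocabulary = ["icmp,eco_i,SF", "tcp,telnet,S0", "tcp,finger,S0", "icmp,ecr_i,SF"]
topic_word_probs = [
    [0.70, 0.20, 0.05, 0.05],
    [0.05, 0.05, 0.30, 0.60],
]

training_sets = top_packets_per_topic(topic_word_probs, vocabulary, n=2)
for k, packets in enumerate(training_sets):
    print(k, packets)
```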

Sample LDA Output

Topic 0th:
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly
0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly
0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly

Topic 1th:
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0.02,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0,0,anomaly
………………

Genetic Algorithm

◦ Applied on normal and anomaly packets separately
◦ Threshold value taken for providing a negative weight
◦ Run for 3 generations
◦ Top 3 values for anomaly and normal packets used

Identifying nature of incoming packet

For each selected attribute value Fi in incoming packet

◦ If Fi ∈ Vi

Si = (A * Frequency of Fi in Anomaly) – (Frequency of Fi in Normal)

◦ Else Si = 0

C = Σ Si

If C > 0

◦ Then Anomaly
◦ Else Normal
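The decision rule above can be sketched as follows. The slide does not define Vi, so this sketch assumes it is the set of values retained for attribute i (for example, the values kept by the genetic algorithm). The dictionaries, the frequencies in them, and the weight A = 2.0 are hypothetical illustration values, not figures from the slides.

```python
def classify(packet, tracked_values, anomaly_freq, normal_freq, weight):
    """Sum Si over the selected attributes and label the packet.

    Si = weight * freq_in_anomaly(Fi) - freq_in_normal(Fi) when Fi is a
    tracked value for attribute i, and 0 otherwise.
    """
    total = 0.0
    for attr, value in packet.items():
        if value in tracked_values.get(attr, ()):
            total += (weight * anomaly_freq[attr].get(value, 0.0)
                      - normal_freq[attr].get(value, 0.0))
    return "anomaly" if total > 0 else "normal"


# Hypothetical tracked values (Vi) and per-class relative frequencies.
tracked_values = {"logged_in": {0, 1}, "serror_rate": {0.0, 1.0}}
anomaly_freq = {"logged_in": {0: 0.9, 1: 0.1}, "serror_rate": {0.0: 0.3, 1.0: 0.6}}
normal_freq = {"logged_in": {0: 0.3, 1: 0.7}, "serror_rate": {0.0: 0.9, 1.0: 0.01}}

suspicious = {"logged_in": 0, "serror_rate": 1.0}
ordinary = {"logged_in": 1, "serror_rate": 0.0}

print(classify(suspicious, tracked_values, anomaly_freq, normal_freq, weight=2.0))
print(classify(ordinary, tracked_values, anomaly_freq, normal_freq, weight=2.0))
```

A value that is not in the tracked set contributes nothing, so a packet whose attributes all fall outside Vi scores C = 0 and is labelled normal.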

Additional Weight

◦ Multiplied to the anomaly frequency
◦ Why? Generic anomalies have diverse values, unlike normal packets, which contain values in a particular range
◦ A trade-off between accuracy and false positive rate is required
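To illustrate the trade-off, note that C = Σ Si can be written as A times the summed anomaly frequencies minus the summed normal frequencies. The sketch below sweeps A over a handful of made-up packets and reports the detection rate and false positive rate at each setting. The packet tuples and weight values are hypothetical and chosen only to make the effect visible.

```python
def rates(packets, weight):
    """packets: list of (anomaly_freq_sum, normal_freq_sum, is_anomaly).

    Returns (detection_rate, false_positive_rate) when a packet is flagged
    if weight * anomaly_freq_sum - normal_freq_sum > 0.
    """
    tp = fp = positives = negatives = 0
    for a_sum, n_sum, is_anomaly in packets:
        flagged = weight * a_sum - n_sum > 0
        if is_anomaly:
            positives += 1
            tp += flagged
        else:
            negatives += 1
            fp += flagged
    return tp / positives, fp / negatives


# Hypothetical per-packet frequency sums with ground-truth labels.
packets = [
    (0.9, 0.2, True),
    (0.4, 0.6, True),
    (0.2, 0.7, True),
    (0.1, 0.9, False),
    (0.3, 0.8, False),
    (0.2, 0.5, False),
]

for weight in (1.0, 2.0, 4.0):
    detection, fpr = rates(packets, weight)
    print(f"A={weight}: detection={detection:.2f}, FPR={fpr:.2f}")
```

On this toy data a larger A catches more anomalies but eventually starts flagging normal packets as well, which is the trade-off the slide refers to.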


Results

Tested against 50,000 anomaly and 50,000 normal packets from the KDDCup’99 dataset.

88.5% accuracy with a 6% false positive rate (FPR)
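The slides give only accuracy and FPR, but with a balanced test set those two numbers imply a full confusion matrix. The sketch below works through the arithmetic, assuming the reported figures are exact and that FPR means the share of normal packets flagged as anomalies. The function name `confusion_counts` is hypothetical.

```python
def confusion_counts(n_anomaly, n_normal, accuracy, fpr):
    """Derive (TP, FP, TN, FN) from accuracy and false positive rate,
    treating 'anomaly' as the positive class."""
    fp = round(fpr * n_normal)                          # normal flagged as anomaly
    tn = n_normal - fp                                  # normal correctly passed
    tp = round(accuracy * (n_anomaly + n_normal)) - tn  # anomalies caught
    fn = n_anomaly - tp                                 # anomalies missed
    return tp, fp, tn, fn


tp, fp, tn, fn = confusion_counts(50_000, 50_000, accuracy=0.885, fpr=0.06)
detection_rate = tp / (tp + fn)
print(tp, fp, tn, fn, detection_rate)
```

Under these assumptions, about 41,500 of the 50,000 anomaly packets are detected (a detection rate of 83%), and 3,000 normal packets are misclassified.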

Future Work

Focus on specific anomaly types

Better attribute selection algorithm?

◦ oneR
◦ Entropy based
◦ Chi-squared
◦ randomForest

Better classification technique?

◦ Clustering – Hierarchical, K-Means
◦ Decision Trees

QUESTIONS?
