A Hybrid Anomaly Detection Model using G-LDA
Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar
VIT University – Chennai
Typical IDS
• Data Collection
• Data Pre-Processing
• Intrusion Identification
• Response
This work mainly focused on Intrusion Identification
Architecture
Attribute Selection
“With more data, the simpler solution can be more accurate than the sophisticated solution.”
• Selection process based on the means and modes of the numeric attributes
• Criterion: a contrast between the mode values of anomaly and normal patterns, with their corresponding means inclined towards the modes
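The slides do not spell out the exact test, so the following is only a minimal sketch of one way to read the criterion. It assumes that "contrast" means the two class modes differ, and that "inclined towards the modes" means each class mean lies closer to its own mode than to the other class's mode. The function name, the row format and the toy values are hypothetical.

```python
from statistics import mean, mode

def select_attributes(anomaly_rows, normal_rows, attributes):
    """Keep attributes whose class modes differ and whose class means lean towards their own mode.

    This reading of the criterion is an assumption, not the authors' stated rule.
    """
    selected = []
    for name in attributes:
        a_vals = [row[name] for row in anomaly_rows]
        n_vals = [row[name] for row in normal_rows]
        a_mode, n_mode = mode(a_vals), mode(n_vals)
        if a_mode == n_mode:
            continue  # no contrast between the two modes
        a_mean, n_mean = mean(a_vals), mean(n_vals)
        # Each mean must be closer to its own class mode than to the other class mode.
        a_ok = abs(a_mean - a_mode) < abs(a_mean - n_mode)
        n_ok = abs(n_mean - n_mode) < abs(n_mean - a_mode)
        if a_ok and n_ok:
            selected.append(name)
    return selected

# Toy, made-up records (not taken from KDD Cup '99).
anomaly = [{"serror_rate": 1.0, "duration": 0},
           {"serror_rate": 1.0, "duration": 0},
           {"serror_rate": 0.9, "duration": 5}]
normal = [{"serror_rate": 0.0, "duration": 0},
          {"serror_rate": 0.0, "duration": 0},
          {"serror_rate": 0.1, "duration": 2}]

chosen = select_attributes(anomaly, normal, ["serror_rate", "duration"])
```

In this toy data `duration` is discarded because both classes share the mode 0, while `serror_rate` is kept.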
Selected Attributes
• logged_in
• serror_rate
• srv_serror_rate
• same_srv_rate
• diff_srv_rate
• dst_host_serror_rate
• dst_host_srv_serror_rate
A strong contrast is visible between the trends of a selected attribute and a discarded attribute
Training Set Selection (using LDA)
• Latent Dirichlet Allocation is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.
• Apply LDA (separately on anomaly and normal packets) to obtain 200 sets of 10 packets each, each set dominated by a particular packet type.
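How the packets were fed to LDA is not stated in the slides. The sketch below is therefore an illustration under assumptions, not the authors' pipeline: each packet record is treated as one "word", consecutive chunks of the packet stream are treated as "documents", and inference uses a plain collapsed Gibbs sampler. The function name, the chunking and the hyperparameters `alpha` and `beta` are hypothetical. With `n_topics=200` and `n_top=10` it would yield 200 sets of up to 10 packets, matching the shape described above.

```python
import random
from collections import Counter

def lda_top_words(docs, n_topics, n_top, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the n_top most assigned words per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # tokens of word w in topic k
    topic_total = [0] * n_topics                        # tokens in topic k
    assign = []
    for d, doc in enumerate(docs):                      # random initial assignment
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]                        # remove the token's current topic
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k                        # record the resampled topic
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [[w for w, c in tw.most_common(n_top) if c > 0] for tw in topic_word]

# Toy stream of shortened, made-up packet records, chunked into "documents" of 4.
packets = ["icmp,eco_i,SF"] * 6 + ["tcp,telnet,S0"] * 6
docs = [packets[i:i + 4] for i in range(0, len(packets), 4)]
sets = lda_top_words(docs, n_topics=2, n_top=10, iters=20)
```

Running this once on the anomaly packets and once on the normal packets would mirror the "separately" step on the slide.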
Sample LDA Output
Topic 0th:
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly
0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly
0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly
Topic 1th:
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0.02,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0,0,anomaly
………………
Genetic Algorithm
Genetic Algorithm
• Applied on normal and anomaly packets separately
• Threshold value taken for providing a negative weight
• Run for 3 generations
• Top 3 values for anomaly and normal packets used
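The slides give only this outline of the genetic algorithm, so the sketch below is one possible reading rather than the authors' implementation. It assumes a chromosome holds 3 candidate values of a single attribute, fitness is the summed relative frequency of those values in one class's training packets, and values rarer than `threshold` receive a negative weight `-penalty`. The population size, the crossover and mutation scheme, and all names are hypothetical.

```python
import random
from collections import Counter

def top_values_ga(values, threshold=0.05, penalty=1.0, generations=3,
                  pop_size=20, k=3, seed=0):
    """Evolve a set of up to k frequent values for one attribute within one class."""
    rng = random.Random(seed)
    freq = {v: c / len(values) for v, c in Counter(values).items()}
    pool = sorted(freq)  # candidate genes: every observed value

    def weight(v):
        # Values rarer than the threshold get a negative weight (assumed meaning).
        return freq[v] if freq[v] >= threshold else -penalty

    def fitness(chrom):
        return sum(weight(v) for v in set(chrom))  # duplicates earn nothing extra

    population = [[rng.choice(pool) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]      # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            if rng.random() < 0.2:                 # mutation: swap in a random value
                child[rng.randrange(k)] = rng.choice(pool)
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return sorted(set(best), key=weight, reverse=True)

# Made-up serror_rate values for one class; run once per class and per attribute.
values = [0.0] * 50 + [1.0] * 30 + [0.5] * 15 + [0.25] * 4 + [0.75] * 1
top = top_values_ga(values)
```

The returned values could then serve as the per-attribute value set for that class in the classification step.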
Identifying nature of incoming packet
• For each selected attribute value Fi in the incoming packet
◦ If Fi ∈ Vi
  Si = (A * Frequency of Fi in Anomaly) – (Frequency of Fi in Normal)
◦ Else Si = 0
• C = Σ Si
• If C > 0
◦ Then Anomaly
◦ Else Normal
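The rule above can be sketched in a few lines. This is an illustration under assumptions: Vi is taken to be the set of values retained for attribute i, the frequencies are taken to be relative frequencies in the anomaly and normal training sets, and the function name, the dictionary layout and the toy numbers are hypothetical.

```python
def classify(packet, value_sets, anomaly_freq, normal_freq, A=1.0):
    """Return (label, C) where C is the sum of S_i over the selected attributes."""
    C = 0.0
    for attr, allowed in value_sets.items():
        value = packet.get(attr)
        if value in allowed:
            # S_i = A * (frequency of F_i in anomaly) - (frequency of F_i in normal)
            C += A * anomaly_freq[attr].get(value, 0.0) - normal_freq[attr].get(value, 0.0)
        # else: S_i = 0, nothing is added
    return ("anomaly" if C > 0 else "normal"), C

# Made-up value sets and frequencies for two of the selected attributes.
value_sets = {"logged_in": {0, 1}, "serror_rate": {0.0, 1.0}}
anomaly_freq = {"logged_in": {0: 0.9, 1: 0.1}, "serror_rate": {0.0: 0.3, 1.0: 0.6}}
normal_freq = {"logged_in": {0: 0.3, 1: 0.7}, "serror_rate": {0.0: 0.9, 1.0: 0.01}}

label_a, _ = classify({"logged_in": 0, "serror_rate": 1.0},
                      value_sets, anomaly_freq, normal_freq)
label_b, _ = classify({"logged_in": 1, "serror_rate": 0.0},
                      value_sets, anomaly_freq, normal_freq)
label_c, _ = classify({"logged_in": 1, "serror_rate": 0.5},
                      value_sets, anomaly_freq, normal_freq)
```

Raising `A` above 1 shifts borderline packets towards the anomaly label.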
Additional Weight
• The weight A is multiplied with the anomaly frequency
• Why?
◦ Generic anomalies have diverse values, unlike normal packets, which contain values in a particular range
• The weight sets the required trade-off between the accuracy and the false positive rate
Additional Weight
Results
• Tested against 50000 anomaly and 50000 normal packets from the KDD Cup ’99 dataset.
• 88.5% accuracy with a 6% false positive rate (FPR)
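The slides do not define the two metrics, so the sketch below assumes the standard definitions: accuracy = (TP + TN) / total and FPR = FP / (FP + TN), with anomaly as the positive class. The confusion counts in the usage lines are not reported in the slides. They are back-computed from the stated percentages and test-set sizes under these assumed definitions.

```python
def accuracy_and_fpr(tp, fn, fp, tn):
    """Accuracy and false positive rate from confusion counts (anomaly = positive class)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    fpr = fp / (fp + tn)   # share of normal packets flagged as anomaly
    return accuracy, fpr

# Back-computed, assumed counts: 6% of 50000 normal packets gives 3000 false positives;
# 88.5% of 100000 packets gives 88500 correct, of which 47000 are true negatives.
accuracy, fpr = accuracy_and_fpr(tp=41500, fn=8500, fp=3000, tn=47000)
```

Under these assumptions, 41500 of the 50000 anomaly packets would have been detected.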
Future Work
• Focus on specific anomaly types
• Better attribute selection algorithm?
◦ oneR
◦ Entropy based
◦ Chi-squared
◦ randomForest
• Better classification technique?
◦ Clustering – Hierarchical, K-Means
◦ Decision Trees
QUESTIONS?