A Hybrid Anomaly Detection Model using G-LDA
Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar
VIT University – Chennai
Typical IDS
• Data Collection
• Data Pre-Processing
• Intrusion Identification
• Response
This work mainly focused on Intrusion Identification
Architecture
Attribute Selection
“With more data, the simpler solution can be more accurate than the sophisticated solution.”
• Selection process based on the means and modes of numeric attributes
• Selected attributes show a contrast between the mode values of anomaly and normal patterns, with the corresponding means inclined towards the modes
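The slide does not state the exact selection rule, so the following is only a minimal sketch of one possible reading: keep an attribute when its mode differs between the two classes and each class mean lies closer to its own class mode than to the other one. The function name `select_attributes` and the toy rows are hypothetical.

```python
from statistics import mean, mode

def select_attributes(normal_rows, anomaly_rows, attributes):
    """Assumed criterion (not confirmed by the slides): the class modes
    differ, and each class mean is closer to its own mode than to the
    mode of the other class."""
    selected = []
    for attr in attributes:
        normal_vals = [row[attr] for row in normal_rows]
        anomaly_vals = [row[attr] for row in anomaly_rows]
        n_mode, a_mode = mode(normal_vals), mode(anomaly_vals)
        n_mean, a_mean = mean(normal_vals), mean(anomaly_vals)
        modes_differ = n_mode != a_mode
        means_follow_modes = (
            abs(n_mean - n_mode) < abs(n_mean - a_mode)
            and abs(a_mean - a_mode) < abs(a_mean - n_mode)
        )
        if modes_differ and means_follow_modes:
            selected.append(attr)
    return selected

# Toy rows for illustration only; these are not taken from the dataset.
normal = [
    {"logged_in": 1, "duration": 0},
    {"logged_in": 1, "duration": 0},
    {"logged_in": 0, "duration": 5},
]
anomaly = [
    {"logged_in": 0, "duration": 0},
    {"logged_in": 0, "duration": 0},
    {"logged_in": 0, "duration": 7},
]
selected = select_attributes(normal, anomaly, ["logged_in", "duration"])
print(selected)
```

In this toy case `duration` is dropped because both classes share the same mode, while `logged_in` is kept.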
Selected Attributes
• logged_in
• serror_rate
• srv_serror_rate
• same_srv_rate
• diff_srv_rate
• dst_host_serror_rate
• dst_host_srv_serror_rate
A strong contrast is visible between the trends of a selected and a discarded attribute
Training Set Selection (using LDA)
• Latent Dirichlet Allocation is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.
• Apply LDA (separately on anomaly and normal packets) to obtain 200 sets of 10 packets each, with each set dominated by a particular packet type.
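The slides do not say which LDA implementation or packet encoding was used, so the following is only a rough sketch under stated assumptions: each packet is treated as a "document" whose "words" are `field_index=value` tokens, topics are fitted with a small collapsed Gibbs sampler, and each topic contributes the packets it dominates most. All names (`lda_gibbs`, `packets_to_docs`, `top_packets_per_topic`), the hyperparameters, and the four-field toy records are hypothetical.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns per-document topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic

def packets_to_docs(packets):
    """Turn each CSV record into tokens such as '0=icmp' (field index = value)."""
    return [[f"{i}={v}" for i, v in enumerate(p.split(","))] for p in packets]

def top_packets_per_topic(packets, num_topics, per_topic=10, **kwargs):
    """For each topic, keep the packets with the most tokens assigned to it."""
    doc_topic = lda_gibbs(packets_to_docs(packets), num_topics, **kwargs)
    sets = []
    for k in range(num_topics):
        ranked = sorted(range(len(packets)),
                        key=lambda d: doc_topic[d][k], reverse=True)
        sets.append([packets[d] for d in ranked[:per_topic]])
    return sets

# Toy records with only four fields, for illustration only.
packets = [
    "icmp,eco_i,SF,18", "icmp,eco_i,SF,18", "icmp,eco_i,SF,18",
    "tcp,telnet,S0,0", "tcp,uucp,S0,0", "tcp,finger,S0,0",
]
sets = top_packets_per_topic(packets, num_topics=2, per_topic=3)
```

In this sketch a packet can appear in more than one set; the slides do not say whether the original sets overlap, nor how the 200 sets were split between the anomaly and normal runs.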
Sample LDA Output
Topic 0th:
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly
0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly
0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly
Topic 1th:
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0.02,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0,0,anomaly
………………
Genetic Algorithm
• Applied on Normal and Anomaly packets separately
• Threshold value taken for providing a negative weight
• Run for 3 generations
• Top 3 values for anomaly and normal packets used
Identifying nature of incoming packet
• For each selected attribute value Fi in incoming packet
◦ If Fi ∈ Vi
Si = (A * Frequency of Fi in Anomaly) - (Frequency of Fi in Normal)
◦ Else Si = 0
• C = Σ Si
• If C > 0
◦ Then Anomaly
◦ Else Normal
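The scoring rule above can be sketched as follows. This is an illustrative reading, not the authors' code: the function name `classify_packet`, the dictionary layout, and the toy frequencies are all hypothetical, and `top_values` is assumed to hold the value set Vi for each selected attribute.

```python
def classify_packet(packet, top_values, anomaly_freq, normal_freq, weight):
    """Return (label, C) for one packet.

    packet:       dict attribute -> value (F_i)
    top_values:   dict attribute -> set of values (V_i, assumed layout)
    anomaly_freq: dict (attribute, value) -> frequency in anomaly training data
    normal_freq:  dict (attribute, value) -> frequency in normal training data
    weight:       the additional weight A
    """
    score = 0.0
    for attr, allowed in top_values.items():
        value = packet.get(attr)
        if value in allowed:
            # S_i = A * freq in anomaly - freq in normal
            score += (weight * anomaly_freq.get((attr, value), 0)
                      - normal_freq.get((attr, value), 0))
        # Otherwise S_i = 0, so nothing is added.
    label = "anomaly" if score > 0 else "normal"
    return label, score

# Toy frequencies for illustration only.
top_values = {"logged_in": {0, 1}, "serror_rate": {0.0, 1.0}}
anomaly_freq = {("logged_in", 0): 0.9, ("logged_in", 1): 0.1,
                ("serror_rate", 1.0): 0.7, ("serror_rate", 0.0): 0.2}
normal_freq = {("logged_in", 0): 0.3, ("logged_in", 1): 0.7,
               ("serror_rate", 1.0): 0.05, ("serror_rate", 0.0): 0.9}

suspicious = {"logged_in": 0, "serror_rate": 1.0}
ordinary = {"logged_in": 1, "serror_rate": 0.0}
unseen = {"logged_in": 2, "serror_rate": 0.5}

label_suspicious, _ = classify_packet(suspicious, top_values, anomaly_freq, normal_freq, 1.5)
label_ordinary, _ = classify_packet(ordinary, top_values, anomaly_freq, normal_freq, 1.5)
label_unseen, score_unseen = classify_packet(unseen, top_values, anomaly_freq, normal_freq, 1.5)
```

A packet whose values all fall outside Vi gets C = 0 and is therefore labelled normal, which follows directly from the strict C > 0 test.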
Additional Weight
• Multiplied to the anomaly frequency
• Why?
◦ Generic anomalies have diverse values, unlike normal packets, which contain values in a particular range
• Trade-off between the accuracy and the false positive rate required
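To make the trade-off concrete, here is a small sketch on made-up numbers. Each packet is summarised by two assumed totals over its matched attribute values, the anomaly-frequency sum a and the normal-frequency sum n, so that C = A * a - n. The function name `rates_for_weight` and the toy values are hypothetical.

```python
def count_flagged(packets, weight):
    """Number of packets with C = weight * a - n > 0."""
    return sum(1 for a, n in packets if weight * a - n > 0)

def rates_for_weight(weight, anomaly_packets, normal_packets):
    """Return (detection rate, false positive rate) for a given weight A."""
    detection_rate = count_flagged(anomaly_packets, weight) / len(anomaly_packets)
    false_positive_rate = count_flagged(normal_packets, weight) / len(normal_packets)
    return detection_rate, false_positive_rate

# Toy (a, n) pairs for illustration only.
anomaly_packets = [(0.9, 0.2), (0.5, 0.6), (0.3, 0.5), (0.2, 0.5)]
normal_packets = [(0.1, 0.9), (0.2, 0.8), (0.4, 0.6), (0.5, 0.7)]

for weight in (1, 2, 3):
    print(weight, rates_for_weight(weight, anomaly_packets, normal_packets))
```

On this toy data a larger A flags more anomalies but also more normal packets, which is the trade-off the slide refers to.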
Results
• Tested against 50000 anomaly and 50000 normal packets from the KDDCup’99 dataset.
• 88.5% Accuracy with 6% FPR
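As a rough cross-check, the reported figures can be turned into approximate confusion counts. This sketch assumes the percentages are exact (they are probably rounded) and that FPR means false positives divided by the number of normal packets; the function name `implied_counts` is hypothetical.

```python
def implied_counts(accuracy, fpr, n_anomaly, n_normal):
    """Derive approximate confusion counts from accuracy and FPR."""
    false_positives = round(fpr * n_normal)
    true_negatives = n_normal - false_positives
    correct = round(accuracy * (n_anomaly + n_normal))
    true_positives = correct - true_negatives
    false_negatives = n_anomaly - true_positives
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "true_negatives": true_negatives,
        "false_positives": false_positives,
        "detection_rate": true_positives / n_anomaly,
    }

counts = implied_counts(accuracy=0.885, fpr=0.06, n_anomaly=50000, n_normal=50000)
print(counts)
```

Under these assumptions, about 3000 normal packets are flagged and about 8500 anomalies are missed, which implies a detection rate of roughly 83% on the anomaly packets.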
Future Work
• Focus on specific anomaly types
• Better attribute selection algorithm?
◦ oneR
◦ Entropy based
◦ Chi-squared
◦ randomForest
• Better classification technique?
◦ Clustering – Hierarchical, K-Means
◦ Decision Trees
QUESTIONS?