predicting organization attacks via mining crowdsourcing data · 2018-09-12 · predicting attacks...
![Page 1: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/1.jpg)
Predicting Organization Attacks via Mining Crowdsourcing Data
Neil Gong and Ratnesh Kumar
Iowa State University
![Page 2: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/2.jpg)
Attack Prediction: Introduction, Motivation & Goal
• Vulnerability: a software bug
– Listed in public vulnerability databases, e.g., CVE
• Exploit: code that leverages a vulnerability
– Not every vulnerability is exploitable
• Motivation:
– Accurate and early prediction enables preventive actions before attacks
• Goal:
– Predict whether a vulnerability is exploitable
– Predict the exploit time (day, week, month) of an exploitable vulnerability
[Diagram labels: Predicting attacks; Predicting exploitable vulnerability and expected exploit time]
![Page 3: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/3.jpg)
Existing Works
• Heuristics-based approaches
– FIRST's Common Vulnerability Scoring System
– Microsoft's exploitability index
– Adobe's priority ratings
• Machine-learning-based approaches
– Leverage either a public vulnerability database OR a social-media data stream, but not BOTH
– Rely on a conventional machine learning classifier, e.g., SVM
• Limitations:
– Inaccurate (many false positives/negatives)
– Vulnerable to fake social-media data
![Page 4: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/4.jpg)
Our Proposed Work
• Leverage both public vulnerability databases AND social-media data
• Detect and filter fake social-media data
• Leverage deep learning based classifier (as opposed to Support Vector Machines) for accuracy
![Page 5: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/5.jpg)
Our Framework – Learning Phase
[Diagram: learning-phase pipeline. Boxes: Vulnerabilities from CVE; Twitter; Fake-data filter; Vulnerability-related tweets; Feature extractor; Deep learning engine; Groundtruth from multiple sources; Classifier for exploitability prediction; Classifier for exploit-time prediction]
![Page 6: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/6.jpg)
Features from CVE
• Bag-of-words features from the text of a vulnerability
An example vulnerability in CVE
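As an illustration of the bag-of-words step, here is a minimal sketch using only the Python standard library. The two descriptions are made up for the example and are not real CVE entries; the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`) are hypothetical, and the deck does not say which tokenizer or weighting the authors use.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(descriptions):
    """Map each distinct word in the corpus to a fixed column index."""
    words = sorted({w for d in descriptions for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}


def bag_of_words(text, vocabulary):
    """Return a word-count vector; words outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, n in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = n
    return vector


# Hypothetical descriptions, written for illustration only.
descriptions = [
    "Buffer overflow allows remote attackers to execute arbitrary code",
    "Information disclosure allows remote attackers to read files",
]
vocab = build_vocabulary(descriptions)
vectors = [bag_of_words(d, vocab) for d in descriptions]
```

Each vulnerability becomes one row with one column per vocabulary word, which is the form a downstream classifier expects.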
![Page 7: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/7.jpg)
Features from Twitter
• Tweets about a vulnerability with ID CVE-2016-3298
• Features:
– Bag-of-words features from tweets
– # users who tweet about the vulnerability
– # retweets
– …
• Each vulnerability represented as a vector
– Followed by feature selection/dimension reduction
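The sketch below shows one way the per-vulnerability vector could be assembled and how a feature might be scored for selection. The tweet dictionaries, their keys (`user`, `text`, `retweets`), and the vector layout are assumptions made for illustration. Information gain is used because it appears in the deck's literature list, not because the slides confirm it as the chosen method.

```python
import math
import re
from collections import Counter


def tweet_features(tweets, vocabulary):
    """Build one vector for a vulnerability from its tweets.

    Layout (an assumption): [word counts..., distinct users, total retweets].
    """
    counts = Counter()
    for t in tweets:
        counts.update(re.findall(r"[a-z0-9]+", t["text"].lower()))
    words = [counts[w] for w in vocabulary]
    n_users = len({t["user"] for t in tweets})
    n_retweets = sum(t["retweets"] for t in tweets)
    return words + [n_users, n_retweets]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction from splitting on whether a feature is non-zero."""
    present = [y for x, y in zip(column, labels) if x > 0]
    absent = [y for x, y in zip(column, labels) if x == 0]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder


# Hypothetical tweets about CVE-2016-3298, invented for this example.
vocabulary = ["exploit", "patch"]
tweets = [
    {"user": "alice", "text": "Exploit released for CVE-2016-3298", "retweets": 12},
    {"user": "bob", "text": "exploit in the wild, patch now", "retweets": 3},
    {"user": "alice", "text": "patch available", "retweets": 0},
]
vector = tweet_features(tweets, vocabulary)

# Score a feature column against exploitability labels (1 = exploited).
gain = information_gain([1, 1, 0, 0], [1, 1, 0, 0])
```

Features with low information gain would be dropped before training. A projection such as PCA is the alternative the literature list points to for dimension reduction.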
![Page 8: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/8.jpg)
Social Graph based Fake User Detection
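[Figure: social graph with a Normal region and a Fake region linked by sparse connections; a few nodes are known normal users or known fake users, and the remaining nodes are unlabeled (?)]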
• Detect and filter fake users using graph analytics
– Key observation: normal users are unlikely to connect to fake users
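The key observation can be illustrated with a simple score-propagation sketch. This is not SybilBelief or the authors' method. It is a plain neighbor-averaging scheme, chosen only to show how a few known labels can spread over a graph whose normal and fake regions are sparsely connected. The graph, user names, and 0.5 threshold are hypothetical.

```python
def propagate_scores(graph, known_normal, known_fake, iterations=20):
    """Estimate a 'fake' score in [0, 1] for every user.

    graph maps each user to the set of users they are connected to.
    Known users are clamped to 0.0 (normal) or 1.0 (fake); every other
    user starts at 0.5 and repeatedly takes the mean of its neighbors.
    """
    scores = {u: 0.5 for u in graph}
    scores.update({u: 0.0 for u in known_normal})
    scores.update({u: 1.0 for u in known_fake})
    for _ in range(iterations):
        updated = dict(scores)
        for user, neighbors in graph.items():
            if user in known_normal or user in known_fake or not neighbors:
                continue
            updated[user] = sum(scores[n] for n in neighbors) / len(neighbors)
        scores = updated
    return scores


# Two dense clusters joined by a single edge (n3, f3): the sparse connection.
graph = {
    "n1": {"n2", "n3"}, "n2": {"n1", "n3"}, "n3": {"n1", "n2", "f3"},
    "f1": {"f2", "f3"}, "f2": {"f1", "f3"}, "f3": {"f1", "f2", "n3"},
}
scores = propagate_scores(graph, known_normal={"n1"}, known_fake={"f1"})

# Keep only users whose score falls on the normal side of the threshold.
kept_users = {u for u, s in scores.items() if s < 0.5}
```

Because few edges cross between the two regions, unlabeled users end up with scores close to those of the known users in their own region. Tweets from users outside `kept_users` would then be filtered out.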
![Page 9: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/9.jpg)
Groundtruth from Symantec
• Historical info about whether a vulnerability is exploited and when
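To show how such historical records might be turned into training labels, the sketch below bins the delay between disclosure and first observed exploit into the day/week/month classes named in the goal. The bin boundaries (1, 7, and 30 days), the extra `later` class, and the example dates are assumptions; the deck does not describe the format of the Symantec data.

```python
from datetime import date


def exploit_time_label(disclosed, first_exploit):
    """Bin the delay between disclosure and the first observed exploit.

    Returns None when no exploit was observed. An exploit seen on or
    before the disclosure date falls into the 'day' bin.
    """
    if first_exploit is None:
        return None
    delay = (first_exploit - disclosed).days
    if delay <= 1:
        return "day"
    if delay <= 7:
        return "week"
    if delay <= 30:
        return "month"
    return "later"


# Hypothetical records: (disclosure date, first exploit date or None).
records = [
    (date(2016, 10, 11), date(2016, 10, 14)),
    (date(2016, 10, 11), None),
]
labels = [(first is not None, exploit_time_label(disclosed, first))
          for disclosed, first in records]
```

Each record yields a pair: a Boolean for the exploitability classifier and a time bin for the exploit-time classifier.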
![Page 10: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/10.jpg)
Deep Learning Classifier for Exploitability and Exploit-Time Prediction
• Deep learning has outperformed conventional classifiers on many machine learning tasks
• Input layer receives feature vector for each vulnerability
• Network weights are adjusted so outputs match corresponding groundtruths
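The weight-adjustment idea can be sketched with a tiny one-hidden-layer network in plain Python. The deck's literature list mentions TensorFlow, so a real implementation would likely use a framework and a deeper model. The network size, learning rate, and toy feature vectors here are assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2):
    """Return the hidden activations and the predicted probability."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def train(samples, labels, n_hidden=4, epochs=500, lr=0.5, seed=0):
    """Fit the network with per-sample gradient descent on cross-entropy."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            hidden, out = forward(x, w1, b1, w2, b2)
            d_out = out - y  # loss gradient at the output pre-activation
            for j, h in enumerate(hidden):
                # Backpropagate through hidden unit j before updating w2[j].
                d_hidden = d_out * w2[j] * h * (1.0 - h)
                w2[j] -= lr * d_out * h
                b1[j] -= lr * d_hidden
                for i, xi in enumerate(x):
                    w1[j][i] -= lr * d_hidden * xi
            b2 -= lr * d_out
    return w1, b1, w2, b2


def mean_loss(samples, labels, params):
    """Average binary cross-entropy of the network on a data set."""
    total = 0.0
    for x, y in zip(samples, labels):
        _, p = forward(x, *params)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


# Toy feature vectors and groundtruth labels (1 = exploited), for illustration.
samples = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
labels = [1, 1, 0, 0]
params = train(samples, labels)
loss = mean_loss(samples, labels, params)
```

Each update nudges the weights so that the output for a feature vector moves toward its groundtruth label. The exploit-time classifier would follow the same pattern with one output per time bin.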
![Page 11: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/11.jpg)
Our Framework – Prediction Phase
[Diagram: prediction-phase pipeline. Boxes: One vulnerability from CVE; Twitter; Fake-data filter; Tweets about the vulnerability; Feature extractor; Classifier for exploitability prediction; Classifier for exploit-time prediction. Outputs: Exploitable? When?]
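The prediction phase can be read as a composition of the trained components. The sketch below wires hypothetical stand-ins together to show the data flow. Every function name, the tweet format, and the decision to skip the exploit-time classifier for non-exploitable vulnerabilities are assumptions, not details taken from the slides.

```python
def predict_vulnerability(cve_text, tweets, fake_users, extract,
                          exploitability_model, exploit_time_model):
    """Run one vulnerability through the prediction-phase steps."""
    # Fake-data filter: drop tweets posted by users flagged as fake.
    genuine = [t for t in tweets if t["user"] not in fake_users]
    features = extract(cve_text, genuine)
    if not exploitability_model(features):
        return {"exploitable": False, "when": None}
    return {"exploitable": True, "when": exploit_time_model(features)}


# Stand-in components; real ones would be the trained extractor and classifiers.
def extract(cve_text, tweets):
    return [len(cve_text.split()), len(tweets)]


def exploitability_model(features):
    return features[1] >= 1  # toy rule: any genuine tweet means exploitable


def exploit_time_model(features):
    return "week"  # toy constant prediction


tweets = [
    {"user": "alice", "text": "exploit released"},
    {"user": "bot42", "text": "exploit exploit exploit"},
]
result = predict_vulnerability(
    "Hypothetical vulnerability description", tweets, {"bot42"},
    extract, exploitability_model, exploit_time_model,
)
```

Swapping the stand-ins for the trained filter, extractor, and classifiers leaves the calling code unchanged, which keeps the learning and prediction phases decoupled.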
![Page 12: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/12.jpg)
Summary: Major Tasks and Schedule
• Task 1 (0-3 months): Collect vulnerabilities from CVE, related tweets from Twitter, and groundtruth from Symantec
• Task 2 (3-6 months): Design and evaluate a method to detect fake users on Twitter
• Task 3 (6-9 months): Design predictive features and train deep learning classifiers
• Task 4 (9-12 months): Evaluate and refine the fake-user filter and the classifiers
![Page 13: Predicting Organization Attacks via Mining Crowdsourcing Data · 2018-09-12 · Predicting attacks Predicting exploitable Vulnerability and Expected Exploit Time. Existing Works •Heuristics](https://reader034.vdocument.in/reader034/viewer/2022042621/5f4f71cb505e10648e6be450/html5/thumbnails/13.jpg)
Selected Literature
Feature Selection
• Guyon, Isabelle, and André Elisseeff. "An introduction to variable and feature selection." Journal of Machine Learning Research 3.Mar (2003): 1157-1182.
• Information gain for feature selection. https://en.wikipedia.org/wiki/Information_gain_in_decision_trees
• Michalski, Ryszard S., Jaime G. Carbonell, and Tom M. Mitchell, eds. Machine learning: An artificial intelligence approach. Springer Science & Business Media, 2013.
• PCA. https://en.wikipedia.org/wiki/Principal_component_analysis
Learning Deep Neural Networks
• Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18.7 (2006): 1527-1554.
• Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
• TensorFlow. https://www.tensorflow.org/
Fake-User Detection
• Gong, Neil Zhenqiang, Mario Frank, and Prateek Mittal. "SybilBelief: A semi-supervised learning approach for structure-based Sybil detection." IEEE Transactions on Information Forensics and Security 9.6 (2014).