predicting organization attacks via mining crowdsourcing data · 2018-09-12 · predicting attacks...
Post on 16-Jul-2020
Predicting Organization Attacks via Mining Crowdsourcing Data
Neil Gong and Ratnesh Kumar
Iowa State University
Attack Prediction: Introduction, Motivation & Goal
• Vulnerability: a software bug
– Listed in public vulnerability databases, e.g., CVE
• Exploit: code that leverages a vulnerability
– Not every vulnerability is exploitable
• Motivation:
– Accurate and timely prediction enables preventive actions before attacks
• Goal:
– Predict whether a vulnerability is exploitable
– Predict the exploit time (day, week, month) of an exploitable vulnerability
[Figure: predicting attacks by predicting exploitable vulnerabilities and their expected exploit time]
Existing Works
• Heuristics-based approaches
– FIRST’s Common Vulnerability Scoring System
– Microsoft’s exploitability index
– Adobe’s priority ratings
• Machine-learning-based approaches
– Leverage either a public vulnerability database OR a social-media data stream, but not BOTH
– Rely on conventional machine learning classifiers, e.g., SVM
• Limitations:
– Inaccurate (many false positives/negatives)
– Vulnerable to fake social-media data
Our Proposed Work
• Leverage both public vulnerability databases AND social-media data
• Detect and filter fake social-media data
• Leverage a deep-learning-based classifier (as opposed to Support Vector Machines) for accuracy
Our Framework – Learning Phase
[Diagram components: vulnerabilities from CVE; vulnerability-related tweets from Twitter; fake-data filter; feature extractor; deep learning engine; groundtruth from multiple sources; classifier for exploitability prediction; classifier for exploit-time prediction]
Features from CVE
• Bag-of-words features from the text of a vulnerability
An example vulnerability in CVE
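A minimal sketch of bag-of-words extraction from vulnerability text. The tokenizer, the function names, and the two descriptions are assumptions made for illustration; they are not real CVE entries and not necessarily the authors' implementation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(descriptions):
    """Map each distinct word in the corpus to a column index (sorted for determinism)."""
    words = sorted({w for d in descriptions for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}

def bag_of_words(text, vocab):
    """Return a word-count vector over vocab; words outside vocab are ignored."""
    counts = Counter(tokenize(text))
    vec = [0] * len(vocab)
    for word, count in counts.items():
        if word in vocab:
            vec[vocab[word]] = count
    return vec

# Made-up descriptions, for illustration only (not real CVE entries).
descriptions = [
    "Buffer overflow allows remote attackers to execute arbitrary code",
    "Information disclosure allows remote attackers to read files",
]
vocab = build_vocabulary(descriptions)
vectors = [bag_of_words(d, vocab) for d in descriptions]
```

Each vulnerability becomes a fixed-length count vector, so descriptions of different lengths can be fed to the same classifier.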
Features from Twitter
• Tweets about a vulnerability with ID CVE-2016-3298
• Features:
– Bag-of-words features from tweets
– Number of users who tweet about the vulnerability
– Number of retweets
– …
• Each vulnerability is represented as a vector
– Followed by feature selection/dimension reduction
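A hedged sketch of how the Twitter features could be assembled into one vector per vulnerability. The tweet record layout (`user`, `text`, `is_retweet`), the vulnerability IDs, and the vocabulary are hypothetical. Dropping constant columns is used here only as a very simple stand-in for feature selection; the slides do not say which selection method is used.

```python
import re
from collections import Counter

def tweet_features(tweets, vocab):
    """Build one vector: [distinct users, retweets, word counts over vocab...]."""
    users = {t["user"] for t in tweets}
    retweets = sum(1 for t in tweets if t["is_retweet"])
    counts = Counter(
        w for t in tweets for w in re.findall(r"[a-z0-9]+", t["text"].lower())
    )
    return [len(users), retweets] + [counts[w] for w in vocab]

def drop_constant_columns(matrix):
    """Keep only columns whose value varies across rows; return (reduced, kept_indices)."""
    n_cols = len(matrix[0])
    kept = [j for j in range(n_cols) if len({row[j] for row in matrix}) > 1]
    return [[row[j] for j in kept] for row in matrix], kept

# Hypothetical tweets and IDs, for illustration only.
vocab = ["exploit", "patch", "poc"]
tweets_by_vuln = {
    "vuln-1": [
        {"user": "alice", "text": "PoC exploit released", "is_retweet": False},
        {"user": "bob", "text": "exploit in the wild", "is_retweet": True},
        {"user": "alice", "text": "patch now", "is_retweet": False},
    ],
    "vuln-2": [
        {"user": "carol", "text": "patch available", "is_retweet": False},
    ],
}
ids = sorted(tweets_by_vuln)
matrix = [tweet_features(tweets_by_vuln[i], vocab) for i in ids]
reduced, kept = drop_constant_columns(matrix)
```

In this toy data the "patch" column has the same count for both vulnerabilities, so it carries no signal and is removed.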
Social Graph based Fake User Detection
[Figure: social graph with a Normal region and a Fake region, containing known normal users, known fake users, and unlabeled users (marked "?"); annotation: sparse connections]
• Detect and filter fake users using graph analytics
– Key observation: normal users are unlikely to connect to fake users
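The key observation can be illustrated with a plain label-propagation sketch: scores spread from known normal and known fake users to their neighbors, and sparse links between the two regions keep the scores apart. This is a generic technique chosen for illustration, not necessarily the authors' detector; the function name, graph, and threshold of 0.5 are assumptions.

```python
def propagate_labels(edges, known_normal, known_fake, iterations=50):
    """Score each user in [0, 1]; a higher score means more likely fake."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    score = {u: 0.5 for u in neighbors}  # unlabeled users start undecided
    for u in known_normal:
        score[u] = 0.0
    for u in known_fake:
        score[u] = 1.0
    seeds = set(known_normal) | set(known_fake)
    for _ in range(iterations):
        updated = dict(score)
        for u, nbrs in neighbors.items():
            if u not in seeds:  # seed labels stay fixed
                updated[u] = sum(score[v] for v in nbrs) / len(nbrs)
        score = updated
    return score

# Toy graph: two dense triangles joined by a single (sparse) edge n3-f3.
edges = [
    ("n1", "n2"), ("n2", "n3"), ("n1", "n3"),
    ("f1", "f2"), ("f2", "f3"), ("f1", "f3"),
    ("n3", "f3"),
]
score = propagate_labels(edges, known_normal={"n1"}, known_fake={"f1"})
fake_users = {u for u, s in score.items() if s > 0.5}
```

Tweets from users in `fake_users` would then be dropped before feature extraction.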
Groundtruth from Symantec
• Historical info about whether a vulnerability is exploited and when
Deep Learning Classifier for Exploitability and Exploit time Prediction
• Deep learning has outperformed conventional classifiers on many machine learning tasks
• Input layer receives feature vector for each vulnerability
• Network weights are adjusted so outputs match corresponding groundtruths
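A minimal sketch of the training idea: a small one-hidden-layer network whose weights are adjusted by gradient descent until its outputs match the groundtruth labels. It is written in plain Python to stay self-contained; the toy data, layer size, learning rate, and epoch count are all assumptions, and a real system would use a deep learning framework and a deeper network.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Return (hidden activations, predicted probability of 'exploitable')."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def mean_loss(X, Y, params):
    """Average cross-entropy between predictions and groundtruth labels."""
    total = 0.0
    for x, t in zip(X, Y):
        _, y = forward(x, *params)
        y = min(max(y, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= t * math.log(y) + (1 - t) * math.log(1 - y)
    return total / len(X)

def train(X, Y, hidden=4, lr=0.1, epochs=2000, seed=0):
    """Stochastic gradient descent; returns (W1, b1, W2, b2)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, Y):
            h, y = forward(x, W1, b1, W2, b2)
            d_out = y - t  # gradient of the loss w.r.t. the output pre-activation
            d_hidden = [d_out * W2[j] * (1 - h[j] ** 2) for j in range(hidden)]
            for j in range(hidden):
                W2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hidden[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_hidden[j] * x[i]
            b2 -= lr * d_out
    return W1, b1, W2, b2

# Toy feature vectors and labels (1 = exploitable), for illustration only.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 0, 1, 1]
params = train(X, Y)
predictions = [round(forward(x, *params)[1]) for x in X]
```

The exploit-time classifier would follow the same pattern with one output per time class (day, week, month) instead of a single probability.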
Our Framework – Prediction Phase
[Diagram components: one vulnerability from CVE; tweets about the vulnerability, from Twitter; fake-data filter; feature extractor; classifier for exploitability prediction; classifier for exploit-time prediction; outputs: Exploitable? When?]
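The prediction phase can be sketched as a short pipeline. Every name here is hypothetical, and the feature extractor and both classifiers are trivial stand-ins for the trained components. Asking for an exploit time only when the vulnerability is predicted exploitable is an assumption of this sketch.

```python
def predict_vulnerability(description, tweets, fake_users,
                          extract_features, exploitability_clf, exploit_time_clf):
    """Return (exploitable, exploit_time); exploit_time is None if not exploitable."""
    # Step 1: fake-data filter, drop tweets posted by detected fake users.
    trusted = [t for t in tweets if t["user"] not in fake_users]
    # Step 2: feature extraction from the CVE text and the remaining tweets.
    features = extract_features(description, trusted)
    # Step 3: exploitability first, then exploit time only if exploitable.
    if not exploitability_clf(features):
        return False, None
    return True, exploit_time_clf(features)

# Stand-in components, for illustration only (not trained models).
def extract_features(description, tweets):
    return [len(description.split()), len(tweets)]

def exploitability_clf(features):
    return features[1] >= 2  # "exploitable" if at least two trusted tweets

def exploit_time_clf(features):
    return "week"  # one of "day", "week", "month"

tweets = [
    {"user": "alice", "text": "exploit released"},
    {"user": "bot1", "text": "exploit exploit exploit"},
    {"user": "bob", "text": "seen in the wild"},
]
result = predict_vulnerability(
    "Buffer overflow in example component", tweets, {"bot1"},
    extract_features, exploitability_clf, exploit_time_clf,
)
```

Passing the components in as arguments keeps the filter, the feature extractor, and the two classifiers independently replaceable.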
Summary: Major Tasks and Schedule
• Task 1 (0-3 months): Collect vulnerabilities from CVE, related tweets from Twitter, and groundtruth from Symantec
• Task 2 (3-6 months): Design and evaluate a method to detect fake users on Twitter
• Task 3 (6-9 months): Design predictive features and train deep learning classifiers
• Task 4 (9-12 months): Evaluate and refine the fake user filter and the classifiers