Evolving Insider Threat Detection
Pallabi Parveen
Dr. Bhavani Thuraisingham (Advisor)
Dept. of Computer Science, University of Texas at Dallas
Lecture 28
August 2013
Outline
Evolving Insider Threat Detection
Unsupervised Learning
Supervised Learning
Evolving Insider Threat Detection (system overview)

[Diagram: system traces (system log) gathered from week i pass through feature extraction & selection into a learning algorithm — supervised (one-class SVM, OCSVM) or unsupervised (graph-based anomaly detection, GBAD) — whose output feeds ensemble-based stream mining; the ensemble of models is updated online and tested on data from week i+1 to flag anomalies.]
Insider Threat Detection Using Unsupervised Learning Based on Graphs
Outline: Unsupervised Learning
Insider Threat
Related Work
Proposed Method
Experiments & Results
Definition of an Insider
An insider is someone who exploits, or has the intention to exploit, their legitimate access to assets for unauthorized purposes
Insider Threat Is a Real Threat

Computer Crime and Security Survey, 2001:
- $377 million in financial losses due to attacks
- 49% reported incidents of unauthorized network access by insiders
Insider Threat (continued)

Insider threat:
- Detection
- Prevention

Detection-based approach:
- Unsupervised learning: Graph-Based Anomaly Detection (GBAD)
- Ensemble-based Stream Mining
Related Work

- "Intrusion Detection Using Sequences of System Calls" — supervised learning, by Hofmeyr et al.
- "Mining for Structural Anomalies in Graph-Based Data Representations (GBAD) for Insider Threat Detection" — unsupervised learning, by Eberle and Holder
- All are static in nature and cannot learn from an evolving data stream
Related Approaches and Comparison with Proposed Solutions

Techniques Proposed By        Supervised/Unsupervised   Concept-drift   Insider Threat   Graph-based
Forrest, Hofmeyr              Supervised                X               √                X
Masud, Fan (Stream Mining)    Supervised                √               N/A              N/A
Liu                           Unsupervised              X               √                X
Holder (GBAD)                 Unsupervised              X               √                √
Our Approach (EIT)            Unsupervised              √               √                √

(The last three columns are the challenges each technique addresses.)
Why Unsupervised Learning?
One approach to detecting insider threats is supervised learning, where models are built from training data.
Approximately 0.03% of the training data is associated with insider threats (the minority class),
while 99.97% is associated with non-insider-threat behavior (the majority class).
Unsupervised learning is an alternative that sidesteps this imbalance.
Why Stream Mining?

Static approaches cannot learn from an evolving data stream.

[Diagram: as data chunks arrive in the stream, the decision boundary shifts (previous vs. current); anomalous and normal instances fall victim to concept drift.]
Proposed Method

Graph-based anomaly detection (GBAD, unsupervised learning) [2]
Ensemble-based Stream Mining
GBAD Approach

Determine the normative pattern S using the SUBDUE minimum description length (MDL) heuristic, which minimizes:

M(S, G) = DL(G|S) + DL(S)
Unsupervised Pattern Discovery

Graph compression and the minimum description length (MDL) principle:
- The best graphical pattern S minimizes the description length of S plus the description length of the graph G compressed with pattern S:

  min_S ( DL(S) + DL(G|S) )

- The description length DL(S) is the minimum number of bits needed to represent S (SUBDUE)
- Compression can be based on inexact matches to the pattern
[Diagram: a graph containing repeated instances of substructures S1 and S2; compressing the graph replaces each instance with a single node.]
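The MDL objective above can be sketched with a toy scorer. This is an illustrative simplification, not SUBDUE itself: the per-edge bit cost, the 20-edge graph, and the candidate counts are hypothetical, echoing the S1/S2 figure.

```python
def mdl_score(pattern_edges, graph_edges, instances, bits_per_edge=8):
    """M(S, G) = DL(S) + DL(G|S), with a toy cost of bits_per_edge per edge.
    Compressing G replaces each pattern instance with one placeholder edge."""
    dl_s = bits_per_edge * pattern_edges
    compressed = graph_edges - instances * pattern_edges + instances
    dl_g_given_s = bits_per_edge * compressed
    return dl_s + dl_g_given_s

# Hypothetical candidates in a 20-edge graph:
# S1 has 3 edges and 5 instances, S2 has 2 edges and 3 instances.
candidates = {"S1": (3, 5), "S2": (2, 3)}
best = min(candidates,
           key=lambda s: mdl_score(candidates[s][0], 20, candidates[s][1]))
# best == "S1": the more compressive pattern minimizes DL(S) + DL(G|S)
```

The pattern that compresses the graph most (frequent and large enough to be worth its own description cost) wins, which is exactly the normative pattern GBAD then checks anomalies against.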
Three Types of Anomalies

Three algorithms handle the different anomaly categories using graph compression and the minimum description length (MDL) principle:
1. GBAD-MDL finds anomalous modifications
2. GBAD-P (Probability) finds anomalous insertions
3. GBAD-MPS (Maximum Partial Substructure) finds anomalous deletions
Example of a Graph with Normative Pattern and Different Types of Anomalies

[Diagram: a normative structure over vertices A, B, C, D repeated across the graph, alongside a modified copy (GBAD-MDL), a copy with an inserted vertex G (GBAD-P), and a copy with a deleted vertex (GBAD-MPS).]
Proposed Method

Graph-based anomaly detection (GBAD, unsupervised learning)
Ensemble-based Stream Mining
Characteristics of Data Streams

- Continuous flow of data
- Examples: network traffic, sensor data, call center records
Data Stream Classification
Ensemble of Classifiers

[Diagram: an unlabeled input x is passed to classifiers C1, C2, C3; their individual outputs (+, +, −) are combined by voting to produce the ensemble output (+).]
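The voting step in the diagram can be sketched as follows; the classifiers here are hypothetical stand-ins that mirror the diagram's votes (two '+', one '−').

```python
from collections import Counter

def ensemble_predict(classifiers, x):
    """Collect each classifier's label for x and return the majority vote."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical classifiers mirroring the diagram:
C1 = lambda x: "+"
C2 = lambda x: "+"
C3 = lambda x: "-"
# ensemble_predict([C1, C2, C3], x) -> "+"
```

Any individual classifier can be wrong on a drifting stream; the majority vote smooths over single-model errors.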
Proposed Ensemble-Based Insider Threat Detection (EIT)

Maintain K GBAD models
- q normative patterns each
Majority voting
Updated ensembles
- Always maintain K models
- Drop the least accurate model
Ensemble-Based Classification of Data Streams (Unsupervised Learning, GBAD)

- Build a model (with q normative patterns) from each data chunk
- Keep the best K such models as the ensemble
- Example: for K = 3
[Diagram: data chunks D1–D3 yield models C1–C3 with normative patterns; the ensemble {C1, C2, C3} predicts on testing chunk D4; new models C4, C5 built from incoming chunks replace the weakest ensemble members as testing proceeds on D5, D6.]
EIT-U Pseudocode

Ensemble(Ensemble A, test graph t, chunk S)
LABEL/TEST THE NEW MODEL
1:  Compute a new model with q normative substructures using GBAD from S
2:  Add the new model to A
3:  For each model M in A
4:    For each class/normative substructure q in M
5:      Results1 ← run GBAD-P with test graph t and q
6:      Results2 ← run GBAD-MDL with test graph t and q
7:      Results3 ← run GBAD-MPS with test graph t and q
8:      Anomalies ← ParseResults(Results1, Results2, Results3)
      End For
    End For
9:  For each anomaly N in Anomalies
10:   If greater than half of the models agree
11:     AgreedAnomalies ← N
12:   Add 1 to the incorrect count of the disagreeing models
13:   Add 1 to the correct count of the agreeing models
    End For
UPDATE THE ENSEMBLE
14: Remove the model with the lowest correct/(correct + incorrect) ratio
End Ensemble
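Steps 9–14 of the pseudocode can be sketched in Python. This is an illustrative reconstruction, not the authors' code; the model names and candidate anomalies are hypothetical, and the GBAD runs are assumed to have already produced per-model votes.

```python
def update_ensemble(models, candidates, votes):
    """Steps 9-14 of EIT-U: majority-vote candidate anomalies, track each
    model's agreement record, then drop the least accurate model.
    models:     dicts with 'name', 'correct', 'incorrect' counters
    candidates: candidate anomalies produced by the GBAD runs
    votes[i]:   one bool per model -- did that model flag candidates[i]?"""
    agreed = []
    for anomaly, model_votes in zip(candidates, votes):
        is_anomaly = sum(model_votes) > len(models) / 2   # step 10
        if is_anomaly:
            agreed.append(anomaly)                        # step 11
        for model, flagged in zip(models, model_votes):
            if flagged == is_anomaly:
                model["correct"] += 1                     # step 13
            else:
                model["incorrect"] += 1                   # step 12
    worst = min(models,
                key=lambda m: m["correct"] / (m["correct"] + m["incorrect"]))
    models.remove(worst)                                  # step 14
    return agreed, models

models = [{"name": n, "correct": 0, "incorrect": 0} for n in ("M1", "M2", "M3")]
agreed, kept = update_ensemble(models, ["a1", "a2"],
                               [[True, True, False], [True, False, False]])
# agreed == ["a1"]; M1 (lowest accuracy ratio) is dropped
```

Because the "correct" label is defined by the majority itself, no ground truth is needed: models that persistently disagree with their peers are assumed to be stale and are rotated out.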
Experiments

1998 MIT Lincoln Laboratory dataset
- 500,000+ vertices
- K = 1, 3, 5, 7, 9 models
- q = 5 normative substructures per model/chunk
- 9 weeks of data; each chunk covers 1 week
A Sample System Call Record from the MIT Lincoln Dataset
Token Sub-Graph

Token fields: <Process ID>, <Call>, <Arguments>, <User Audit ID>, <Date>, <Return Value>, <Path>, <Terminal>
Total Ensemble Accuracy

# of Models    True Positives   False Positives   False Negatives
Normal GBAD    9                920               0
K=3            9                188               0
K=5            9                180               0
K=7            9                179               0
K=9            9                150               0
Performance
Performance (contd.)

- 0 false negatives
- Significant decrease in false positives as the number of models increases
- False positives decrease slowly after K = 3
Performance (contd.)

Distribution of False Positives

Summary of Datasets A & B

Entry          Dataset A   Dataset B
User           Donaldh     William
# of vertices  269         1283
# of edges     556         469
Week           2-8         4-7
Day            Friday      Thursday
Performance (contd.)

[Charts: the effect of q on TP rates, FP rates, and runtime for fixed K = 6 on dataset A; true positives vs. number of normative substructures for fixed K = 6 on dataset A.]
Performance (contd.)

[Charts: the effect of K on TP rates and on runtime for fixed q = 4 on dataset A.]
Evolving Insider Threat Detection Using Supervised Learning
Outline: Supervised Learning

Related Work
Proposed Method
Experiments & Results
Related Approaches and Comparison with Proposed Solutions

Techniques Proposed By        Supervised/Unsupervised   Concept-drift   Insider Threat   Graph-based
Liu                           Unsupervised              X               √                X
Holder (GBAD)                 Unsupervised              X               √                √
Masud, Fan (Stream Mining)    Supervised                √               N/A              N/A
Forrest, Hofmeyr              Supervised                X               √                X
Our Approach (EIT-U)          Unsupervised              √               √                √
Our Approach (EIT-S)          Supervised                √               √                X
Why One-Class SVM?

- Insider threat data is the minority class
- Traditional support vector machines (SVMs) trained on such an imbalanced dataset are likely to perform poorly on test data, especially on the minority class
- One-class SVMs (OCSVM) address the rare-class issue by building a model from only normal data (i.e., non-threat data)
- During the testing phase, test data is classified as normal or anomalous based on geometric deviation from the model
Proposed Method

One-class SVM (OCSVM), supervised learning
Ensemble-based Stream Mining
One-Class SVM (OCSVM)

- Maps training data into a high-dimensional feature space (via a kernel)
- Then iteratively finds the maximal-margin hyperplane that best separates the training data from the origin, which corresponds to a classification rule f(x)
- For testing: if f(x) < 0, we label x as an anomaly; otherwise, as normal data
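The f(x) < 0 decision rule can be illustrated with a stand-in for the kernelized OCSVM. The sketch below is not a real one-class SVM: it scores points by distance from the centroid of the normal training data, so that f(x) = radius − distance plays the role of the decision function. All names and data here are illustrative, not the authors' implementation.

```python
import math

def fit_one_class(normal_points):
    """Crude one-class model: centroid of the normal training data plus the
    largest training distance as the decision radius."""
    dim = len(normal_points[0])
    n = len(normal_points)
    centroid = tuple(sum(p[i] for p in normal_points) / n for i in range(dim))
    radius = max(math.dist(p, centroid) for p in normal_points)
    return centroid, radius

def decision_function(x, centroid, radius):
    """f(x) >= 0 inside the learned normal region; f(x) < 0 flags an anomaly."""
    return radius - math.dist(x, centroid)

centroid, radius = fit_one_class([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
# decision_function((0.5, 0.5), centroid, radius) >= 0  -> normal
# decision_function((5.0, 5.0), centroid, radius) < 0   -> anomaly
```

The real OCSVM replaces the centroid-and-radius region with a maximal-margin hyperplane in kernel feature space, but the testing logic is the same sign test on f(x).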
Proposed Ensemble-Based Insider Threat Detection (EIT)

Maintain K OCSVM (one-class SVM) models
Majority voting
Updated ensemble
- Always maintain K models
- Drop the least accurate model
Ensemble-Based Classification of Data Streams (Supervised Learning)

- Divide the data stream into equal-sized chunks
- Train a classifier from each data chunk
- Keep the best K OCSVM classifiers as the ensemble
- Example: for K = 3
[Diagram: labeled chunks D1–D3 train classifiers C1–C3; the ensemble {C1, C2, C3} predicts on unlabeled chunk D4; newly trained classifiers C4, C5 replace the weakest ensemble members as the stream advances to D5, D6.]

This addresses infinite length and concept-drift.
EIT-S Pseudocode (Testing)

Algorithm 1: Testing
Input: A ← Build-initial-ensemble(); Du ← latest chunk of unlabeled instances
Output: prediction/label of Du
1: Fu ← Extract&Select-Features(Du)   // feature set for Du
2: for each xj ∈ Fu do
3:   Results ← NULL
4:   for each model M in A
5:     Results ← Results ∪ Prediction(xj, M)
     end for
6:   Anomalies ← MajorityVoting(Results)
   end for
EIT-S Pseudocode (Updating)

Algorithm 2: Updating the classifier ensemble
Input: Dn: the most recently labeled data chunk; A: the current ensemble of the best K classifiers
Output: an updated ensemble A
1: for each model M ∈ A do
2:   Test M on Dn and compute its expected error
3: end for
4: Mn ← newly trained one-class SVM classifier (OCSVM) from data Dn
5: Test Mn on Dn and compute its expected error
6: A ← best K classifiers from Mn ∪ A based on expected error
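Algorithm 2 can be sketched as follows. This is illustrative only: the "classifiers" are stand-in callables rather than trained OCSVMs, and the chunk and names are hypothetical.

```python
def keep_best_k(ensemble, new_model, labeled_chunk, K):
    """Algorithm 2 sketch: rank the old models plus the newly trained one by
    their error on the latest labeled chunk, and keep the best K."""
    def expected_error(model):
        wrong = sum(1 for x, y in labeled_chunk if model(x) != y)
        return wrong / len(labeled_chunk)
    candidates = ensemble + [new_model]           # Mn ∪ A
    return sorted(candidates, key=expected_error)[:K]

# Stand-in "classifiers" and a tiny labeled chunk:
chunk = [(0, "normal"), (10, "anomaly")]
m_new  = lambda x: "anomaly" if x > 5 else "normal"  # error 0.0 on chunk
m_old1 = lambda x: "normal"                          # error 0.5
m_old2 = lambda x: "anomaly"                         # error 0.5
best = keep_best_k([m_old1, m_old2], m_new, chunk, K=2)
# best[0] is m_new; only K models survive
```

Unlike EIT-U, the error here is measured against true labels on the newest chunk, so models fitted to an outdated concept are replaced as soon as the stream drifts.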
Feature Set Extracted

Features: time, userID, machine IP, command, argument, path, return value
Example encoded instance: 1 1:29669 6:1 8:1 21:1 32:1 36:0
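The encoded instance above follows the sparse "label index:value" style (as used by libsvm-format tools). A hypothetical helper, not the authors' extraction code, that produces such a line from a feature dict:

```python
def encode_instance(label, features):
    """Render a sparse feature dict as 'label idx:val idx:val ...' with
    feature indices in ascending order."""
    parts = [f"{i}:{v}" for i, v in sorted(features.items())]
    return " ".join([str(label)] + parts)

# Reproduces the sample record from the slide:
line = encode_instance(1, {1: 29669, 6: 1, 8: 1, 21: 1, 32: 1, 36: 0})
# line == "1 1:29669 6:1 8:1 21:1 32:1 36:0"
```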
Performance
Updating vs. Non-Updating Stream Approach

Performance (contd.)

                     Updating Stream   Non-updating Stream
False Positives      13774             24426
True Negatives       44362             33710
False Negatives      1                 1
True Positives       9                 9
Accuracy             0.76              0.58
False Positive Rate  0.24              0.42
False Negative Rate  0.1               0.1
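The derived rows of the table follow directly from the raw confusion-matrix counts; a quick check using the counts from the table:

```python
def rates(tp, fp, tn, fn):
    """Derive accuracy, false positive rate, and false negative rate from
    raw confusion-matrix counts, rounded as in the table."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    return round(accuracy, 2), round(fpr, 2), round(fnr, 2)

# Counts from the table above:
updating = rates(tp=9, fp=13774, tn=44362, fn=1)      # (0.76, 0.24, 0.1)
non_updating = rates(tp=9, fp=24426, tn=33710, fn=1)  # (0.58, 0.42, 0.1)
```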
                     Supervised Learning   Unsupervised Learning
False Positives      55                    95
True Negatives       122                   82
False Negatives      0                     5
True Positives       12                    7
Accuracy             0.71                  0.56
False Positive Rate  0.31                  0.54
False Negative Rate  0                     0.42
Performance (contd.)

Supervised (EIT-S) vs. Unsupervised (EIT-U) Learning
Summary of Dataset A

Entry         Dataset A
User          Donaldh
# of records  189
Week          2-7 (Friday only)
Conclusion & Future Work

Conclusion: evolving insider threat detection using stream mining with both unsupervised and supervised learning

Future work:
- Misuse detection on mobile devices
- Cloud computing to improve processing time
Publications

Conference papers:
- Pallabi Parveen, Jonathan Evans, Bhavani Thuraisingham, Kevin W. Hamlen, Latifur Khan, "Insider Threat Detection Using Stream Mining and Graph Mining," in Proc. of the Third IEEE International Conference on Information Privacy, Security, Risk and Trust (PASSAT 2011), October 2011, MIT, Boston, USA (full paper acceptance rate: 13%).
- Pallabi Parveen, Zackary R. Weger, Bhavani Thuraisingham, Kevin Hamlen, Latifur Khan, "Supervised Learning for Insider Threat Detection Using Stream Mining," to appear in 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2011), Nov. 7-9, 2011, Boca Raton, Florida, USA (acceptance rate: 30%).
- Pallabi Parveen, Bhavani M. Thuraisingham, "Face Recognition Using Multiple Classifiers," ICTAI 2006: 179-186.

Journal:
- Jeffrey Partyka, Pallabi Parveen, Latifur Khan, Bhavani M. Thuraisingham, Shashi Shekhar, "Enhanced geographically typed semantic schema matching," J. Web Sem. 9(1): 52-70 (2011).

Others:
- Neda Alipanah, Pallabi Parveen, Sheetal Menezes, Latifur Khan, Steven Seida, Bhavani M. Thuraisingham, "Ontology-driven query expansion methods to facilitate federated queries," SOCA 2010: 1-8.
- Neda Alipanah, Piyush Srivastava, Pallabi Parveen, Bhavani M. Thuraisingham, "Ranking Ontologies Using Verified Entities to Facilitate Federated Queries," Web Intelligence 2010: 332-337.
References

1. W. Eberle and L. Holder, "Anomaly Detection in Data Represented as Graphs," Intelligent Data Analysis, Volume 11, Number 6, 2007. http://ailab.wsu.edu/subdue
2. Ling Chen, Shan Zhang, Li Tu, "An Algorithm for Mining Frequent Items on Data Stream Using Fading Factor," COMPSAC (2) 2009: 172-177.
3. S. A. Hofmeyr, S. Forrest, and A. Somayaji, "Intrusion Detection Using Sequences of System Calls," Journal of Computer Security, vol. 6, pp. 151-180, 1998.
4. M. Masud, J. Gao, L. Khan, J. Han, B. Thuraisingham, "A Practical Approach to Classify Evolving Data Streams: Training with Limited Amount of Labeled Data," Int. Conf. on Data Mining, Pisa, Italy, December 2010.
Thank You