Pallabi Parveen, Nate McDaniel, Varun S. Hariharan, Bhavani Thuraisingham and Latifur Khan
Department of Computer Science, The University of Texas at Dallas
Insider Threat
LZW & Quantized Dictionary
Concept Drift
Experiments & Results
An insider is someone who exploits, or has the intention to exploit, his/her legitimate access to assets for unauthorised purposes.
For example, over time, legitimate users may enter commands that read or write private data, or install malicious software
Computer Crime and Security Survey 2001
$377 million financial losses due to attacks
49% reported incidents of unauthorized network access by insiders
WikiLeaks Breach Highlights Insider Security Threat
Even the toughest security systems sometimes have a soft center that can be exploited by someone who has passed rigorous screening
http://www.scientificamerican.com/article.cfm?id=wikileaks-insider-threat
Reduce false alarm rate without sacrificing threat detection rate
Threat detection is challenging because insiders mask and adapt their behavior to resemble legitimate system use
Normal users issue repetitive sequences of commands, system calls, etc.
A sudden deviation from normal behavior raises an alarm indicating an insider threat
To find an insider threat, we need to collect these repeated sequences of commands in an unsupervised fashion
First challenge: variability in sequence length
Overcome: generate an LZW dictionary containing combinations of potential patterns in the gathered data using the Lempel-Ziv-Welch (LZW) algorithm
Second challenge: huge size of the dictionary
Overcome: compress the dictionary
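A minimal Python sketch of the LZW step, assuming each session has already been turned into a string of symbols. `lzw_phrases` is a hypothetical helper that returns only the multi-symbol phrases LZW adds to its dictionary (the candidate patterns); the single symbols only seed the dictionary. Phrases grow as patterns repeat, which is how LZW copes with variable sequence length. On a repeated "lift" stream it produces the same twelve phrases as the lift example in these slides.

```python
def lzw_phrases(stream: str) -> list[str]:
    """Return the multi-symbol phrases LZW adds to its dictionary, in order."""
    dictionary = set(stream)  # LZW starts from the single symbols seen in the stream
    phrases: list[str] = []
    current = ""
    for symbol in stream:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate  # known pattern: keep extending it
        else:
            dictionary.add(candidate)  # unseen pattern: record it
            phrases.append(candidate)
            current = symbol  # restart from the current symbol
    return phrases


print(lzw_phrases("lift" * 7))
# ['li', 'if', 'ft', 'tl', 'lif', 'ftl', 'lift', 'tli', 'ift', 'tlif', 'ftli', 'iftl']
```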
Using an ensemble of models increases the accuracy of anomaly-based threat detection
Each new data chunk creates a new model
Problem: the ensemble holds K models, but there are now K+1
Solution: remove the least accurate model
Majority voting by all models is used to determine which model is performing worst
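A minimal Python sketch of this ensemble update, assuming each model simply labels a test item as anomalous or normal. Because the stream is unlabeled, the majority vote of the K+1 models stands in for ground truth, and the model that disagrees with it most often is dropped. `update_ensemble`, `majority_vote` and the toy `flags` models are hypothetical names for illustration.

```python
from typing import Callable

# Hypothetical model interface: True means the model flags the item as anomalous.
Model = Callable[[str], bool]


def majority_vote(models: list[Model], item: str) -> bool:
    """Ensemble label: anomalous only if more than half of the models say so."""
    votes = sum(1 for model in models if model(item))
    return 2 * votes > len(models)  # a tie counts as normal


def update_ensemble(models: list[Model], new_model: Model,
                    test_items: list[str], k: int) -> list[Model]:
    """Add the model built from the newest chunk; if that gives K+1 models,
    drop the one that disagrees most with the majority vote."""
    candidates = models + [new_model]
    if len(candidates) <= k:
        return candidates
    # No ground truth in the unsupervised setting: majority votes act as labels.
    labels = [majority_vote(candidates, item) for item in test_items]
    errors = [sum(model(item) != label for item, label in zip(test_items, labels))
              for model in candidates]
    worst = errors.index(max(errors))  # first model with the most disagreements
    return candidates[:worst] + candidates[worst + 1:]


def flags(*items: str) -> Model:
    """Toy model that flags a fixed set of items (illustration only)."""
    return lambda item: item in items


m1, m2, m3 = flags("rm -rf"), flags("rm -rf"), flags("rm -rf")
m4 = flags("ls", "cd", "vim")  # newest model, out of step with the others
ensemble = update_ensemble([m1, m2, m3], m4, ["rm -rf", "ls", "cd", "vim"], k=3)
print(len(ensemble), m4 in ensemble)  # 3 False
```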
[Figure: overall architecture. Online learning: gather data from chunk i of the system log, index the system calls/commands with Unicode, and perform unsupervised sequence learning by generating an LZW dictionary (D) containing all possible patterns using the Lempel-Ziv-Welch algorithm, then compressing it into a quantized dictionary (QD). Testing on data from week i+1 (chunk i+1): is each system call/command an anomaly? Incremental stream mining: update the previous QD and update the models.]
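The architecture starts by indexing system calls/commands with Unicode. One plausible reading, sketched below in Python as an assumption: give each distinct command its own character so that a session becomes a plain string LZW can scan symbol by symbol. `index_commands` and the choice of the Unicode Private Use Area (U+E000 onward, 6,400 code points) are hypothetical, not taken from the slides.

```python
def index_commands(sessions: list[list[str]],
                   first_code_point: int = 0xE000) -> tuple[dict[str, str], list[str]]:
    """Map each distinct command/system call to one Unicode character so a
    session becomes a string that LZW can scan symbol by symbol."""
    table: dict[str, str] = {}
    encoded: list[str] = []
    for session in sessions:
        symbols = []
        for command in session:
            if command not in table:
                # next unused character, starting in the Private Use Area
                table[command] = chr(first_code_point + len(table))
            symbols.append(table[command])
        encoded.append("".join(symbols))
    return table, encoded


table, encoded = index_commands([["ls", "cd", "ls"], ["cd", "vim"]])
print(len(table), [len(s) for s in encoded])  # 3 [3, 2]
```

The same command always maps to the same character, so repeated command sequences become repeated substrings.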
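[Figure: LZW and quantized dictionary example. Unlabeled data stream: liftliftlifliftliftliftliftliftliftliftliftliftliftlift. LZW produces the LZW dictionary {li, lif, lift, if, ift, iftl, ft, ftl, ftli, tl, tli, tlif}; lossy compression reduces it to the quantized dictionary {lift}.]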
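The slides do not spell out the lossy compression rule. The Python sketch below assumes one simple rule: scan the LZW phrases longest first and keep a phrase only if it is more than α edits away from every phrase already kept. On the twelve phrases of the "lift" example with α = 2 it keeps lift but also ftli (a rotation four edits away from lift), so it only approximates the single-pattern dictionary shown. `quantize` is a hypothetical name.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertion, deletion, substitution each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def quantize(dictionary: list[str], alpha: int) -> list[str]:
    """Hypothetical lossy compression: scan phrases longest first and keep a
    phrase only if it is more than alpha edits from every phrase kept so far."""
    kept: list[str] = []
    for phrase in sorted(dictionary, key=len, reverse=True):  # stable for ties
        if all(levenshtein(phrase, rep) > alpha for rep in kept):
            kept.append(phrase)
    return kept


# LZW dictionary from the "lift" example, in the order the slide lists it
lzw_dictionary = ["li", "lif", "lift", "if", "ift", "iftl",
                  "ft", "ftl", "ftli", "tl", "tli", "tlif"]
print(quantize(lzw_dictionary, alpha=2))  # ['lift', 'ftli']
```

By construction, every dropped phrase is within α edits of a kept one, which is what makes the compression lossy but still usable for matching.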
[Figure: incremental update of the quantized dictionary. Two rows of sessions (Session 1, Session 2, …, Session n) are each run through LZW; the resulting LZW dictionaries are compressed, producing the old quantized dictionary (OQD) and then the new quantized dictionary (NQD).]
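A sketch of the incremental update in Python, assuming the new quantized dictionary is obtained by pooling the OQD with the LZW phrases of the new sessions and compressing the pool again. The compression step here is a hypothetical longest-first, edit-distance rule, not the authors' confirmed method; `update_qd` and the other helper names are likewise hypothetical.

```python
def lzw_phrases(stream: str) -> list[str]:
    """Multi-symbol phrases that LZW adds to its dictionary, in order."""
    seen, phrases, current = set(stream), [], ""
    for symbol in stream:
        if current + symbol in seen:
            current += symbol  # keep extending a known pattern
        else:
            seen.add(current + symbol)
            phrases.append(current + symbol)
            current = symbol
    return phrases


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def quantize(phrases: list[str], alpha: int) -> list[str]:
    """Hypothetical lossy step: keep a phrase only if it is more than alpha
    edits away from every longer (or earlier, on ties) phrase already kept."""
    kept: list[str] = []
    for phrase in sorted(phrases, key=len, reverse=True):
        if all(levenshtein(phrase, rep) > alpha for rep in kept):
            kept.append(phrase)
    return kept


def update_qd(old_qd: list[str], new_sessions: list[str], alpha: int) -> list[str]:
    """NQD = compress(OQD + LZW dictionaries of the new sessions)."""
    pool = list(old_qd)  # old patterns go first so they win ties on length
    for session in new_sessions:
        pool += [p for p in lzw_phrases(session) if p not in pool]
    return quantize(pool, alpha)


nqd = update_qd(["lift"], ["cdlscdlscdls", "liftliftlift"], alpha=2)
print(nqd)
```

Re-compressing the pool, rather than only appending new patterns, keeps the dictionary from growing without bound as sessions arrive.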
Given a test data stream S and a quantized dictionary QD = {qd1, qd2, …}, an anomaly is a phrase/pattern in the stream that is more than α edit distance away from every pattern in QD
Steps in identifying non-matching phrases:
Compute the edit distance matrix L between each phrase in the dictionary and the data stream S
If the edit distance is within α, delete the matching part from the stream S
The patterns remaining in the stream S are considered anomalies
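A brute-force Python sketch of these steps, assuming α is an integer edit budget. Instead of building the edit distance matrix L, it compares every window of S whose length is within α of a pattern's length against that pattern, marks windows within α edits as matched (rather than deleting them one at a time), and reports the unmatched runs. `find_anomalies` is a hypothetical name, and the approach is slow; it only makes the definition concrete.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertion, deletion, substitution each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def find_anomalies(stream: str, qd: list[str], alpha: int) -> list[str]:
    """Mark every window of `stream` within alpha edits of some QD pattern as
    matched; the unmatched runs that remain are reported as anomalies."""
    matched = [False] * len(stream)
    for pattern in qd:
        # only windows whose length differs by at most alpha can be within alpha edits
        shortest, longest = max(1, len(pattern) - alpha), len(pattern) + alpha
        for start in range(len(stream)):
            for length in range(shortest, longest + 1):
                end = start + length
                if end > len(stream):
                    break
                if levenshtein(stream[start:end], pattern) <= alpha:
                    matched[start:end] = [True] * length
    anomalies, run = [], ""
    for symbol, hit in zip(stream, matched):
        if hit:
            if run:
                anomalies.append(run)
            run = ""
        else:
            run += symbol
    if run:
        anomalies.append(run)
    return anomalies


print(find_anomalies("liftlift" + "xyzxyz" + "liftlift", ["lift"], alpha=1))  # ['yzxy']
```

In the example, the injected xyzxyz is reported only as yzxy: the boundary symbols are absorbed because liftx and zlift are each one edit from lift. A larger α tolerates more variation at the cost of sensitivity.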
User command patterns shift over time, e.g., a programmer slowly evolves into an advanced programmer
Changes in users' habits should not be identified as anomalies
Attribute natural changes to concept drift
Concept drift can be added artificially, and anomalies are still detected
drift = [.7071, 1.1180, 1.5811, 1.5811, 1.5811]
Min/Max distributions = [.42929/.57071, .08820/.31180, 0/.25811, 0/.25811, 0/.25811]
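A quick arithmetic check on these numbers. They are consistent with one interpretation, inferred here and not stated on the slide: five items with base probabilities 0.5, 0.2, 0.1, 0.1, 0.1 (summing to 1), each allowed to move by ±drift/10 and clipped at 0. The base probabilities and the /10 scaling are assumptions.

```python
# Values copied from the slide
drift = [0.7071, 1.1180, 1.5811, 1.5811, 1.5811]
slide_bounds = [(0.42929, 0.57071), (0.08820, 0.31180),
                (0.0, 0.25811), (0.0, 0.25811), (0.0, 0.25811)]

# Inferred, not stated on the slide: base probabilities of five items
base = [0.5, 0.2, 0.1, 0.1, 0.1]

# Assumed rule: each probability may move by +/- drift/10, clipped at 0
bounds = [(max(0.0, p - d / 10), p + d / 10) for p, d in zip(base, drift)]

for (lo, hi), (s_lo, s_hi) in zip(bounds, slide_bounds):
    print(f"{lo:.5f}/{hi:.5f}", abs(lo - s_lo) < 1e-9 and abs(hi - s_hi) < 1e-9)
```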
Modified Naïve Bayes that uses an incremental approach (NB-INC)*
Unsupervised ensemble approach (USSL-GG) that incrementally tests for anomalies and performs best with an ensemble size of 3
(*) R. A. Maxion, “Masquerade detection using enriched command lines,” in Proc. IEEE International Conference on Dependable Systems & Networks (DSN), 2003, pp. 5–14.
Ensemble based stream mining effectively detects insider threats while coping with evolving concept drift
Our approach combines the advantages of stream mining, compression and ensembles:
Compression enables unsupervised learning
Stream mining offers adaptive learning
Ensembles increase accuracy under concept drift
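Approach | Un/Supervised | Drift | Insider Threat | Sequence
Ju | S | N | Y | Y
Maxion | S | N | Y | N
Liu | U | N | Y | Y
Wang | S | N | Y | N
Szymanski | S | N | Y | Y
Masud | S | Y | N | N
Parveen | U | Y | Y | N
USSL-GG | U | Y | Y | Y
(S = supervised, U = unsupervised; Y = yes, N = no)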
Update existing models based on user feedback
Update and refine models on ground truth when it is available