Evolving Insider Threat Detection Pallabi Parveen Dr. Bhavani Thuraisingham (Advisor) Dept of Computer Science University of Texas at Dallas Funded by AFOSR


Post on 25-Dec-2015


TRANSCRIPT

Page 1

Evolving Insider Threat Detection

Pallabi Parveen

Dr. Bhavani Thuraisingham (Advisor)

Dept of Computer Science, University of Texas at Dallas

Funded by AFOSR

Page 2

Outline

Evolving Insider Threat Detection
Unsupervised Learning
Supervised Learning

Page 3

Evolving Insider Threat Detection

[Figure: system architecture, with the following labeled components]

System log → system traces for week i and week i+1
Online learning: gather data from week i → feature extraction & selection → learning algorithm → update models
Testing on data from week i+1 → feature extraction & selection → ensemble of models → anomaly?
Learning algorithm:
◦ Supervised: one-class SVM (OCSVM)
◦ Unsupervised: graph-based anomaly detection (GBAD)
Ensemble-based stream mining: ensemble of models

Page 4

Insider Threat Detection using Unsupervised Learning based on Graph

Page 5

Outline: Unsupervised Learning

Insider Threat
Related Work
Proposed Method
Experiments & Results

Page 6

Definition of an Insider

An insider is someone who exploits, or has the intention to exploit, their legitimate access to assets for unauthorised purposes.

Page 7

Insider Threat is a real threat

Computer Crime and Security Survey 2001

$377 million financial losses due to attacks

49% reported incidents of unauthorized network access by insiders

Page 8

Insider Threat (continued)

Insider threat
◦ Detection
◦ Prevention

Detection-based approach:
◦ Unsupervised learning, graph-based anomaly detection
◦ Ensemble-based stream mining

Page 9

Related work

"Intrusion Detection Using Sequences of System Calls," supervised learning by Hofmeyr

"Mining for Structural Anomalies in Graph-Based Data Representations (GBAD) for Insider Threat Detection," unsupervised learning by Staniford-Chen and Lawrence Holder

All are static in nature and cannot learn from an evolving data stream.

Page 10

Related Approaches and comparison with proposed solutions

The columns Concept-drift, Insider Threat, and Graph-based are the challenges.

Techniques Proposed By        Supervised/Unsupervised   Concept-drift   Insider Threat   Graph-based
Forrest, Hofmeyr              Supervised                X               √                X
Masud, Fan (Stream Mining)    Supervised                √               N/A              N/A
Liu                           Unsupervised              X               √                X
Holder (GBAD)                 Unsupervised              X               √                √
Our Approach (EIT)            Unsupervised              √               √                √

Page 11

Why Unsupervised Learning?

One approach to detecting insider threats is supervised learning, where models are built from labeled training data.

Approximately 0.03% of the training data is associated with insider threats (minority class),

while 99.97% of the training data is associated with non-threat activity (majority class).

Unsupervised learning is an alternative that avoids training on such heavily imbalanced labeled data.
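To make the imbalance concrete, here is a small worked example in Python. It is only an illustration: the function name `trivial_baseline` and the sample size of 10,000 instances are assumptions, not part of the slides. It shows that a classifier which always predicts "normal" reaches 99.97% accuracy while detecting no threats at all.

```python
def trivial_baseline(total_instances, threat_fraction):
    """Score a classifier that labels every instance as normal.

    Hypothetical helper for illustration; the 0.03% / 99.97% split comes
    from the slide, the sample size is an assumption.
    """
    threats = round(total_instances * threat_fraction)
    normal = total_instances - threats
    # Every normal instance is classified correctly, every threat is missed.
    accuracy = normal / total_instances
    recall = 0.0  # no threat is ever flagged
    return threats, accuracy, recall


threats, accuracy, recall = trivial_baseline(10_000, 0.0003)
print(threats, accuracy, recall)
```

High accuracy on such data says little about the minority class, which is why the slide treats the imbalance as a motivation for other approaches.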

Page 12

Why Stream Mining

All are static in nature and cannot learn from an evolving data stream.

[Figure: a data stream divided into data chunks, showing the previous decision boundary, the current decision boundary, anomaly data, normal data, and instances that are victims of concept drift]

Page 13

Proposed Method

Graph-based anomaly detection (GBAD, unsupervised learning) [2]
+
Ensemble-based stream mining

Page 14

GBAD Approach

Determine normative pattern S using SUBDUE minimum description length (MDL) heuristic that minimizes:

M(S,G) = DL(G|S) + DL(S)

Page 15

Unsupervised Pattern Discovery

Graph compression and the minimum description length (MDL) principle

The best graphical pattern S minimizes the sum of the description length of S and the description length of the graph G compressed with pattern S:

$\min_S \left( DL(S) + DL(G \mid S) \right)$

where the description length DL(S) is the minimum number of bits needed to represent S (SUBDUE).

Compression can be based on inexact matches to the pattern.

[Figure: example graph with repeated instances of patterns S1 and S2]
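The selection rule DL(S) + DL(G|S) can be sketched with a toy size measure. This is not SUBDUE's actual encoding, which counts bits: here the description length is approximated by the number of vertices plus edges, and each instance of a pattern is assumed to collapse into one placeholder vertex. The names `Candidate`, `compressed_size`, `mdl_score`, and `best_pattern` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    size: int       # vertices + edges in the pattern, a crude proxy for DL(S)
    instances: int  # non-overlapping occurrences of the pattern in G


def compressed_size(graph_size, cand):
    # Assumption: each instance is replaced by a single placeholder vertex,
    # so it saves (size - 1) units. This stands in for DL(G|S).
    return graph_size - cand.instances * (cand.size - 1)


def mdl_score(graph_size, cand):
    # Toy version of DL(S) + DL(G|S).
    return cand.size + compressed_size(graph_size, cand)


def best_pattern(graph_size, candidates):
    # The normative pattern is the candidate with the smallest score.
    return min(candidates, key=lambda c: mdl_score(graph_size, c))


candidates = [Candidate("S1", size=7, instances=5),
              Candidate("S2", size=5, instances=3)]
print(best_pattern(100, candidates).name)
```

With these made-up numbers S1 scores 7 + 70 = 77 and S2 scores 5 + 88 = 93, so S1 would be chosen as the normative pattern.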

Page 16

Three types of anomalies

Three algorithms, one for each anomaly category, all using graph compression and the minimum description length (MDL) principle:

1. GBAD-MDL finds anomalous modifications
2. GBAD-P (Probability) finds anomalous insertions
3. GBAD-MPS (Maximum Partial Substructure) finds anomalous deletions

Page 17

Example of graph with normative pattern and different types of anomalies

[Figure: a graph containing repeated instances of the normative structure (vertices A, B, C, D), with anomalous instances labeled GBAD-MDL (modification), GBAD-P (insertion), and GBAD-MPS (deletion)]

Page 18

Proposed Method

Graph-based anomaly detection (GBAD, unsupervised learning)
+
Ensemble-based stream mining

Page 19

Characteristics of Data Stream

◦ Continuous flow of data
◦ Examples:
  Network traffic
  Sensor data
  Call center records

Page 20

Data Stream Classification

Single-model incremental classification

Ensemble-model-based classification

The ensemble-based approach is more effective than the incremental approach.

Page 21

Ensemble of Classifiers

[Figure: an input (x, ?) is passed to classifiers C1, C2, and C3; their individual outputs (+, +, -) are combined by voting into the ensemble output (+)]
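The voting step in this figure can be sketched in a few lines of Python. The function name `majority_vote` is an assumption for illustration; the example input mirrors the individual outputs shown on the slide.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the label predicted by the largest number of classifiers."""
    label, _count = Counter(outputs).most_common(1)[0]
    return label


# Individual outputs of C1, C2, C3 as in the figure.
print(majority_vote(["+", "+", "-"]))
```

Using an odd number of classifiers avoids ties when there are only two labels.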

Page 22

Proposed Ensemble based Insider Threat Detection (EIT)

Maintain K GBAD models
◦ q normative patterns

Majority voting

Updated ensembles
◦ Always maintain K models
◦ Drop least accurate model

Page 23

Ensemble based Classification of Data Streams (unsupervised learning, GBAD)

◦ Build a model (with q normative patterns) from each data chunk
◦ Keep the best K such models as the ensemble
◦ Example: for K = 3

[Figure: data chunks D1 to D6 and the models with normative patterns C1 to C5 built from them; the ensemble {C1, C2, C3} makes the prediction for testing chunk D4, and the ensemble is updated as models C4 and C5 are built from later chunks]

Page 24

EIT-U pseudocode

Ensemble (Ensemble A, test Graph t, Chunk S)

LABEL/TEST THE NEW MODEL
Compute new model with q normative substructures using GBAD from S
Add new model to A
For each model M in A
  For each class/normative substructure q in M
    Results1 ← Run GBAD-P with test graph t & q
    Results2 ← Run GBAD-MDL with test graph t & q
    Results3 ← Run GBAD-MPS with test graph t & q
    Anomalies ← Parse Results (Results1, Results2, Results3)
  End For
End For
For each anomaly N in Anomalies
  If greater than half of the models agree
    Agreed Anomalies ← N
  Add 1 to incorrect values of the disagreeing models
  Add 1 to correct values of the agreeing models
End For

UPDATE THE ENSEMBLE
Remove model with lowest (correct/(correct + incorrect)) ratio
End Ensemble
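The pseudocode above can be sketched in Python under some stated assumptions. GBAD itself is not implemented here: each model is represented by a hypothetical `detect` callable that returns the set of anomalies it reports for a test graph. The scoring rule is one reading of the pseudocode, namely that a model is "agreeing" when its vote matches the majority decision for a candidate anomaly. The names `Model` and `eit_u_step` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Set, Tuple


@dataclass
class Model:
    name: str
    detect: Callable[[object], Set[Hashable]]  # stand-in for the GBAD runs
    correct: int = 0
    incorrect: int = 0

    def ratio(self) -> float:
        total = self.correct + self.incorrect
        return self.correct / total if total else 0.0


def eit_u_step(ensemble: List[Model], new_model: Model,
               test_graph: object) -> Tuple[Set[Hashable], List[Model]]:
    # Add the model built from the latest chunk, then collect every report.
    models = ensemble + [new_model]
    reports = {m.name: m.detect(test_graph) for m in models}
    candidates = set().union(*reports.values())

    agreed = set()
    for anomaly in candidates:
        votes = sum(1 for m in models if anomaly in reports[m.name])
        is_majority = votes > len(models) / 2
        if is_majority:
            agreed.add(anomaly)
        # Assumed scoring: matching the majority decision counts as correct.
        for m in models:
            if (anomaly in reports[m.name]) == is_majority:
                m.correct += 1
            else:
                m.incorrect += 1

    # Drop the model with the lowest correct / (correct + incorrect) ratio.
    worst = min(models, key=Model.ratio)
    return agreed, [m for m in models if m is not worst]
```

A possible call, with made-up anomaly identifiers:

```python
a = Model("A", lambda g: {"a1", "x"})
b = Model("B", lambda g: {"a1"})
c = Model("C", lambda g: {"a1", "y"})
agreed, kept = eit_u_step([a, b], c, test_graph=None)
print(agreed, [m.name for m in kept])
```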

Page 25

Experiments

1998 MIT Lincoln Laboratory dataset
500,000+ vertices
K = 1, 3, 5, 7, 9 models
q = 5 normative substructures per model/chunk
9 weeks
◦ Each chunk covers 1 week
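Splitting a stream into one-week chunks might look like the sketch below. It assumes timestamps in the format seen in the sample audit record ("Fri Jul 31 07:46:33 1998") and a caller-supplied start date; the function name `week_chunks` and the start date used in the example are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def week_chunks(timestamps, start):
    """Group timestamp strings into consecutive 7-day chunks from `start`."""
    chunks = defaultdict(list)
    for ts in timestamps:
        moment = datetime.strptime(ts, TIMESTAMP_FORMAT)
        week_index = (moment - start).days // 7
        chunks[week_index].append(ts)
    # Return the chunks in chronological order.
    return [chunks[w] for w in sorted(chunks)]


start = datetime(1998, 7, 27)  # assumed first day of the stream
stream = ["Fri Jul 31 07:46:33 1998",
          "Mon Aug 03 09:00:00 1998",
          "Tue Aug 04 10:00:00 1998"]
print([len(chunk) for chunk in week_chunks(stream, start)])
```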

Page 26

A sample system call record from the MIT Lincoln dataset

header,150,2,execve(2),,Fri Jul 31 07:46:33 1998, + 652468777 msec
path,/usr/lib/fs/ufs/quota
attribute,104555,root,bin,8388614,187986,0
exec_args,1,/usr/sbin/quota
subject,2110,root,rjm,2110,rjm,280,272,0-0-172.16.112.50
return,success,0
trailer,150

Page 27

Token Sub-graph

[Figure: a central Token vertex connected to the vertices <Process ID>, <Call>, <Arguments>, <User Audit ID>, <Date>, <Return Value>, <Path>, and <Terminal>]
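As a rough sketch, the sample record from the previous slide can be turned into the edges of such a token sub-graph. The positional mapping of the `subject` fields (audit ID first, process ID sixth, terminal last) and the use of the status field as the return value are assumptions made for illustration, and `token_subgraph` is a hypothetical name.

```python
RECORD = """header,150,2,execve(2),,Fri Jul 31 07:46:33 1998, + 652468777 msec
path,/usr/lib/fs/ufs/quota
attribute,104555,root,bin,8388614,187986,0
exec_args,1,/usr/sbin/quota
subject,2110,root,rjm,2110,rjm,280,272,0-0-172.16.112.50
return,success,0
trailer,150"""


def token_subgraph(record):
    """Return (source, edge label, value) triples radiating from 'Token'."""
    rows = {}
    for line in record.splitlines():
        parts = [p.strip() for p in line.split(",")]
        rows[parts[0]] = parts[1:]

    subject = rows["subject"]
    fields = {
        "Call": rows["header"][2],
        "Date": rows["header"][4],
        "Path": rows["path"][0],
        "Arguments": " ".join(rows["exec_args"][1:]),
        "User Audit ID": subject[0],   # assumed position
        "Process ID": subject[5],      # assumed position
        "Terminal": subject[7],        # assumed position
        "Return Value": rows["return"][0],
    }
    return [("Token", label, value) for label, value in fields.items()]


for edge in token_subgraph(RECORD):
    print(edge)
```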

Page 28

Performance

Total Ensemble Accuracy

# of Models    True Positives   False Positives   False Negatives
Normal GBAD    9                920               0
K=3            9                188               0
K=5            9                180               0
K=7            9                179               0
K=9            9                150               0

Page 29

Performance Contd..

0 false negatives
Significant decrease in false positives
As the number of models increases
◦ False positives decrease slowly after K=3

Page 30

Performance Contd..

Distribution of False Positives

Page 31

Performance Contd..

Summary of Dataset A & B

Entry           Description (Dataset A)   Description (Dataset B)
User            Donaldh                   William
# of vertices   269                       1283
# of edges      556                       469
Week            2-8                       4-7
Day             Friday                    Thursday

Page 32

Performance Contd..

[Figure panels]
The effect of q on TP rates for fixed K = 6 on dataset A
The effect of q on FP rates for fixed K = 6 on dataset A
The effect of q on runtime for fixed K = 6 on dataset A

Page 33

Performance Contd..

[Figure panels]
True Positive vs # normative substructure for fixed K=6 on dataset A
The effect of K on TP rates for fixed q = 4 on dataset A
The effect of K on runtime for fixed q = 4 on dataset A

Page 34

Evolving Insider Threat Detection using Supervised Learning

Page 35

Outline: Supervised Learning

Related Work
Proposed Method
Experiments & Results

Page 36

Related Approaches and comparison with proposed solutions

The columns Concept-drift, Insider Threat, and Graph-based are the challenges.

Techniques Proposed By        Supervised/Unsupervised   Concept-drift   Insider Threat   Graph-based
Liu                           Unsupervised              X               √                X
Holder (GBAD)                 Unsupervised              X               √                √
Masud, Fan (Stream Mining)    Supervised                √               N/A              N/A
Forrest, Hofmeyr              Supervised                X               √                X
Our Approach (EIT-U)          Unsupervised              √               √                √
Our Approach (EIT-S)          Supervised                √               √                X

Page 37

Why one-class SVM

Insider threat data is the minority class.

Traditional support vector machines (SVM) trained on such an imbalanced dataset are likely to perform poorly on test data, especially on the minority class.

One-class SVMs (OCSVM) address the rare-class issue by building a model that considers only normal data (i.e., non-threat data).

During the testing phase, test data is classified as normal or anomalous based on geometric deviations from the model.

Page 38

Proposed Method

One-class SVM (OCSVM), supervised learning
+
Ensemble-based stream mining

Page 39

One-class SVM (OCSVM)

Maps training data into a high-dimensional feature space (via a kernel).

Then iteratively finds the maximal-margin hyperplane that best separates the training data from the origin. This corresponds to the classification rule:

$f(x) = \langle w, x \rangle + b$

where w is the normal vector and b is a bias term.

For testing, if f(x) < 0 we label x as an anomaly, otherwise as normal data.
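The decision rule on this slide can be illustrated with a plain dot product. This sketch assumes w and b have already been learned and works directly in input space, so it corresponds to a linear kernel only; the training step is not shown, and the values of `w` and `b` are made up.

```python
def decision_value(w, b, x):
    """Compute f(x) = <w, x> + b for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    # Rule from the slide: f(x) < 0 means anomaly, otherwise normal.
    return "anomaly" if decision_value(w, b, x) < 0 else "normal"


w, b = [1.0, 1.0], -1.0  # hypothetical learned parameters
print(classify(w, b, [1.0, 1.0]))
print(classify(w, b, [0.2, 0.3]))
```

With a non-linear kernel the same rule applies, but the inner product is taken in the kernel-induced feature space rather than on the raw inputs.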

Page 40

Proposed Ensemble based Insider Threat Detection (EIT)

Maintain K OCSVM (one-class SVM) models

Majority voting

Updated ensemble
◦ Always maintain K models
◦ Drop least accurate model

Page 41

Ensemble based Classification of Data Streams (supervised learning)

Divide the data stream into equal-sized chunks
◦ Train a classifier from each data chunk
◦ Keep the best K OCSVM classifiers as the ensemble
◦ Example: for K = 3

[Figure: data chunks D1 to D6 and the classifiers C1 to C5 trained from them; D1 to D5 are labeled chunks and D6 is the unlabeled chunk; the ensemble {C1, C2, C3} makes the prediction for D4, and later ensembles include C4 and C5]

Addresses infinite length and concept-drift

Page 42

EIT-S pseudocode (Testing)

Algorithm 1 Testing

Input: A ← Build-initial-ensemble()
       Du ← latest chunk of unlabeled instances
Output: Prediction/Label of Du

Fu ← Extract&Select-Features(Du) // Feature set for Du
for each xj ∈ Fu do
  Results ← NULL
  for each model M in A
    Results ← Results ∪ Prediction(xj, M)
  end for
  Anomalies ← Majority Voting(Results)
end for
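Algorithm 1 can be sketched as follows, assuming feature extraction has already produced one feature vector per instance and that each model is a callable returning True for "anomaly". Both conventions, and the name `label_chunk`, are assumptions for illustration; the threshold functions below are stand-ins for trained OCSVM models.

```python
def label_chunk(ensemble, feature_vectors):
    """Label each instance as anomalous (True) by majority vote."""
    labels = []
    for x in feature_vectors:
        # Collect one prediction per model in the ensemble.
        results = [bool(model(x)) for model in ensemble]
        # Anomaly only if more than half of the models say so.
        labels.append(sum(results) > len(ensemble) / 2)
    return labels


ensemble = [lambda x: x[0] > 5, lambda x: x[0] > 3, lambda x: x[0] > 8]
print(label_chunk(ensemble, [[1], [4], [6], [9]]))
```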

Page 43

EIT-S pseudocode

Algorithm 2 Updating the classifier ensemble

Input: Dn: the most recently labeled data chunks
       A: the current ensemble of best K classifiers
Output: an updated ensemble A

for each model M ∈ A do
  Test M on Dn and compute its expected error
end for
Mn ← Newly trained one-class SVM classifier (OCSVM) from data Dn
Test Mn on Dn and compute its expected error
A ← best K classifiers from Mn ∪ A based on expected error
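A minimal sketch of Algorithm 2, assuming models are callables that return a label and that the expected error is estimated as the misclassification rate on the labeled chunk. Training the new OCSVM is outside the sketch, so `new_model` is passed in already trained; `make_model` and `update_ensemble` are hypothetical names, and the threshold models stand in for real classifiers.

```python
def make_model(threshold):
    """Toy stand-in for a trained classifier: flags values above a threshold."""
    def model(x):
        return "anomaly" if x > threshold else "normal"
    model.threshold = threshold  # kept only so the example can show which models survive
    return model


def update_ensemble(ensemble, new_model, labeled_chunk, k):
    """Keep the k models with the lowest error on the latest labeled chunk."""
    def expected_error(model):
        wrong = sum(1 for x, y in labeled_chunk if model(x) != y)
        return wrong / len(labeled_chunk)

    candidates = ensemble + [new_model]
    # sorted() is stable, so older models win ties against newer ones.
    return sorted(candidates, key=expected_error)[:k]


chunk = [(1, "normal"), (2, "normal"), (8, "anomaly"), (9, "anomaly")]
ensemble = [make_model(0), make_model(5), make_model(10)]
updated = update_ensemble(ensemble, make_model(1.5), chunk, k=3)
print([m.threshold for m in updated])
```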

Page 44

Feature Set extracted

Time, userID, machine IP, command, argument, path, return

1 1:29669 6:1 8:1 21:1 32:1 36:0
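The example vector on this slide looks like a sparse "label index:value" line of the kind used by LIBSVM-style tools; that reading is an assumption, as is the function name `parse_sparse_line`. Under that assumption, a line can be parsed like this:

```python
def parse_sparse_line(line):
    """Parse 'label idx:val idx:val ...' into (label, {idx: val})."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return int(label), features


label, features = parse_sparse_line("1 1:29669 6:1 8:1 21:1 32:1 36:0")
print(label, features)
```

Indices that do not appear in the line are implicitly zero in this representation.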

Page 45

Performance

Page 46

Performance Contd..

Updating vs Non-updating stream approach

                      Updating Stream   Non-updating Stream
False Positives       13774             24426
True Negatives        44362             33710
False Negatives       1                 1
True Positives        9                 9
Accuracy              0.76              0.58
False Positive Rate   0.24              0.42
False Negative Rate   0.1               0.1
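The rates in this table follow from the four counts using the standard confusion-matrix definitions. The helper below, with the hypothetical name `rates`, recomputes them for both columns.

```python
def rates(tp, fp, tn, fn):
    """Accuracy, false positive rate, and false negative rate from counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


updating = rates(tp=9, fp=13774, tn=44362, fn=1)
non_updating = rates(tp=9, fp=24426, tn=33710, fn=1)
print({name: round(value, 2) for name, value in updating.items()})
print({name: round(value, 2) for name, value in non_updating.items()})
```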

Page 47

Performance Contd..

Supervised (EIT-S) vs. Unsupervised (EIT-U) Learning

                      Supervised Learning   Unsupervised Learning
False Positives       55                    95
True Negatives        122                   82
False Negatives       0                     5
True Positives        12                    7
Accuracy              0.71                  0.56
False Positive Rate   0.31                  0.54
False Negative Rate   0                     0.42

Summary of Dataset A

Entry          Description (Dataset A)
User           Donaldh
# of records   189
Week           2-7 (Friday only)

Page 48

Conclusion & Future Work

Conclusion:
Evolving insider threat detection using stream mining
Unsupervised learning and supervised learning

Future Work:
Misuse detection in mobile devices
Cloud computing for improving processing time

Page 49

Publication

Conference Papers:

Pallabi Parveen, Jonathan Evans, Bhavani Thuraisingham, Kevin W. Hamlen, Latifur Khan, “Insider Threat Detection Using Stream Mining and Graph Mining,” in Proc. of the Third IEEE International Conference on Information Privacy, Security, Risk and Trust (PASSAT 2011), October 2011, MIT, Boston, USA (full paper acceptance rate: 13%).

Pallabi Parveen, Zackary R Weger, Bhavani Thuraisingham, Kevin Hamlen and Latifur Khan, “Supervised Learning for Insider Threat Detection Using Stream Mining,” to appear in 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2011), Nov. 7-9, 2011, Boca Raton, Florida, USA (acceptance rate: 30%).

Pallabi Parveen, Bhavani M. Thuraisingham: Face Recognition Using Multiple Classifiers. ICTAI 2006, 179-186.

Journal:

Jeffrey Partyka, Pallabi Parveen, Latifur Khan, Bhavani M. Thuraisingham, Shashi Shekhar: Enhanced geographically typed semantic schema matching. J. Web Sem. 9(1): 52-70 (2011).

Others:

Neda Alipanah, Pallabi Parveen, Sheetal Menezes, Latifur Khan, Steven Seida, Bhavani M. Thuraisingham: Ontology-driven query expansion methods to facilitate federated queries. SOCA 2010, 1-8.

Neda Alipanah, Piyush Srivastava, Pallabi Parveen, Bhavani M. Thuraisingham: Ranking Ontologies Using Verified Entities to Facilitate Federated Queries. Web Intelligence 2010: 332-337.

Page 50

References

1. W. Eberle and L. Holder, Anomaly Detection in Data Represented as Graphs, Intelligent Data Analysis, Volume 11, Number 6, 2007. http://ailab.wsu.edu/subdue

2. W. Ling Chen, Shan Zhang, Li Tu: An Algorithm for Mining Frequent Items on Data Stream Using Fading Factor. COMPSAC(2) 2009: 172-177

3. S. A. Hofmeyr, S. Forrest, and A. Somayaji, “Intrusion Detection Using Sequences of System Calls,” Journal of Computer Security, vol. 6, pp. 151-180, 1998.

4. M. Masud, J. Gao, L. Khan, J. Han, B. Thuraisingham, “A Practical Approach to Classify Evolving Data Streams: Training with Limited Amount of Labeled Data,” Int. Conf. on Data Mining, Pisa, Italy, December 2010.

Page 51

Thank You