Post on 18-Feb-2017
A TRANSDUCTIVE SCHEME BASED INFERENCE TECHNIQUES FOR NETWORK FORENSIC
ANALYSIS
BY: AKSHAYA ARUNAN
M1 NE [IT]
GECBH
22-Jul-16 A TRANSDUCTIVE SCHEME BASED INFERENCE TECHNIQUES FOR NFA 1
OUTLINE
Objective
Introduction
Literature Survey
Proposed System
Conclusion
Reference
OBJECTIVE
To develop a Network Intrusion Forensics System based on a “transductive
scheme” that can
efficiently detect and analyze computer crime
extract digital evidence
INTRODUCTION
Rapid development of network connectivity
Complexity and growth
Increase in the number of crimes
Connected systems are potential targets for malicious attacks
These attacks can affect:
physical or digital assets
funds
consumer confidence
national security
loss of life
Network Forensics
Goal: To discover the source of security breaches or other information assurance
problems [1].
Evidence is captured from networks
Interpretation is substantially based on knowledge of network attacks
Allows us to make forensic determinations based on the observed traffic [2]
LITERATURE SURVEY
Tcpdump [4],[5]
Wireshark[5]
Artificial Neural Network[1]
Support Vector Machine[5],[6]
tcpdump
A free, open-source packet analyzer that runs from the command line.
Some of its functions:
Prints the contents of network packets
Displays TCP/IP and other packets being transmitted or received
Can read packets from a network interface card
Can write packets to standard output or a file
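When tcpdump writes packets to a file, it uses the libpcap capture file format. As a minimal sketch of that layout (assuming the classic little-endian format with microsecond timestamps; the helper names build_pcap and read_pcap are illustrative and not part of tcpdump), the following builds a small capture in memory and reads it back:

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap format, microsecond timestamps


def build_pcap(packets, linktype=1, snaplen=65535):
    """Serialize (ts_sec, ts_usec, payload) tuples into pcap bytes."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, linktype.
    out = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)
    for ts_sec, ts_usec, payload in packets:
        # Record header: timestamp, captured length, original length.
        out += struct.pack("<IIII", ts_sec, ts_usec, len(payload), len(payload))
        out += payload
    return out


def read_pcap(data):
    """Return (linktype, [(ts_sec, ts_usec, payload), ...]) from pcap bytes."""
    fields = struct.unpack_from("<IHHiIII", data, 0)
    magic, linktype = fields[0], fields[6]
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian classic pcap file")
    offset, records = 24, []  # the global header is 24 bytes long
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from("<IIII", data, offset)
        offset += 16
        records.append((ts_sec, ts_usec, data[offset:offset + incl_len]))
        offset += incl_len
    return linktype, records
```

Link type 1 denotes Ethernet frames. A forensic tool would normally decode the payload bytes further; this sketch stops at the record boundaries.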
Wireshark
Wireshark is a free and open source packet analyzer.
Wireshark is similar to tcpdump, but has a graphical front-end, plus some
integrated sorting and filtering options.
It is used for
network troubleshooting
analysis
software and communications protocol development
education
Artificial Neural Network [1]
An ANN is an interconnected group of nodes, akin to the vast network of
neurons in a brain.
They can be used for:
inferring a function from observations
data processing
Example: robotics
[Figure: neural network with INPUT, HIDDEN and OUTPUT layers]
In the figure, each node represents an artificial neuron and an arrow represents a
connection from the output of one neuron to the input of another.
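The layered structure in the figure can be sketched as a forward pass. This is only an illustration: the sigmoid activation, the layer sizes and the function names are assumptions, since the slides do not specify them.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)
```

Each entry of a weight row corresponds to one arrow in the figure, from a neuron's output to the next neuron's input. Training (choosing the weights) is not shown.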
Support Vector Machine [5], [6]
Constructs a hyperplane or a set of hyperplanes in a high or infinite dimensional
space, which can be used for classification, regression, or other tasks.
Supervised learning models
Analyze data and recognize patterns
Hyperplane: a subspace whose dimension is one less than that of its ambient space (for example, a line in a 2-D plane)
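As a small illustration of classifying with a hyperplane, the sketch below labels a point by the side of the hyperplane w·x + b = 0 on which it falls. The weights w and offset b are hand-picked assumptions; an actual SVM would learn them from labelled training data.

```python
def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 on or above the hyperplane, -1 below it."""
    return 1 if decision_value(w, b, x) >= 0 else -1
```

With w = [1, -1] and b = 0 the hyperplane is the line x1 = x2 in the plane, so the point (2, 1) gets label +1 and the point (1, 2) gets label -1.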
Disadvantages
ANN and SVM:
They were designed to find features for network forensics
These methods are effective in reducing processing time
But they are insufficient for forensic analysis
tcpdump and Wireshark:
These tools are designed to help debug network problems, not specifically for forensic analysis
PROPOSED SYSTEM
First, we propose an efficient TCM-KNN [3] based inference technique
It is much more effective than single or multiple traffic thresholds
Second, to boost the real-time network forensic performance of TCM-KNN, we apply a
simulated annealing (SA) algorithm [10] for instance selection
Reduces the computational cost
More suitable for real network environments
Transductive Confidence Machines for K-Nearest Neighbors
Commonly used machine learning and data mining method
Effective in fraud detection, pattern recognition and outlier detection
The confidence measure used in TCM is based upon universal tests for
randomness or their approximation
Transductive scheme based network forensics
We develop a network intrusion forensics system based on a transductive scheme
(NIFSTC) that can efficiently
detect and analyze network crime, and
extract digital evidence
NIFSTC consists of the following components:
Network Traffic Capturer
Instance Selection and Feature Extractor
TCM-KNN Based Network Forensic Analyzer
Evidence Analyzer
NIFSTC system architecture
Traffic capturer
The first step of the NIFSTC system
Captures network traffic
Prepares for traffic analysis
Provides the base information for the other components of the forensics system
The traditional packet capture library, libpcap [4],
provides implementation-independent access to the underlying packet capture facility
Problems with libpcap:
Under heavy network traffic, captured data is transferred from the kernel to user
processes through system calls and memory copies.
In a high-throughput network, the total number of CPU cycles this consumes is not
negligible.
System overhead: too many memory copy operations consume a large
amount of CPU and memory resources.
To improve the packet capture performance of NIFSTC, it is
necessary to
reduce the intermediate steps during packet transmission,
bypass the OS kernel, and
eliminate the kernel’s memory copies.
NIFSTC uses an efficient user-level packet capture mechanism based on a semi-polling driven
technique [7,8].
With the semi-polling driven mechanism,
1) interrupt frequency is lowered
2) processing performance for short messages is significantly improved
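To see why mixing interrupts with polling lowers the interrupt frequency, consider the toy model below. It is a generic interrupt-plus-polling simulation, not the mechanism of [7,8], whose details the slides do not give: it assumes a packet raises an interrupt only when the receiver is idle, and that the receiver then keeps polling until its queue is empty. The function name and parameters are hypothetical.

```python
def count_interrupts(arrival_times, service_time):
    """Count interrupts when the receiver polls until its queue is empty."""
    interrupts = 0
    busy_until = float("-inf")  # time at which the current polling run ends
    for t in sorted(arrival_times):
        if t >= busy_until:
            # Receiver is idle: the packet raises an interrupt and starts polling.
            interrupts += 1
            busy_until = t + service_time
        else:
            # Receiver is still polling: the packet is picked up without an interrupt.
            busy_until += service_time
    return interrupts
```

For a burst arriving at times 0, 1, 2 followed by a late packet at time 10, with a service time of 2 per packet, this model raises 2 interrupts, where a purely interrupt-driven receiver would raise 4.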
TCM-KNN based network forensic analyzer
TCM-KNN is an algorithm that effectively combines TCM [9] with the KNN algorithm
In the KNN algorithm, we denote the sorted sequence (in ascending order) of
the distances of point $i$ from the other points with the same classification $y$
as $D_i^y$
In this paper, we use Euclidean distance to calculate the distances between
points
We assign to every point a measure called the individual strangeness measure
This measure defines the strangeness of the point in relation to the rest of the
points
In our case the strangeness measure for a point $i$ belonging to a normal class is
defined as:
$\alpha_i = \sum_{j=1}^{k} D_{ij}^{y}$ (1)
The same measure is computed for a point suspected to be an anomaly
$D_{ij}^{y}$ stands for the $j$th shortest distance in this sequence
$k$ is the number of neighbors used
The strangeness values from Equation (1) are used to compute the p-value as follows:
$p(\alpha_{new}) = \frac{\#\{i : \alpha_i \geq \alpha_{new}\}}{n+1}$ (2)
$\#$ denotes the cardinality of the set
$\alpha_{new}$ is the strangeness value for the test point
The event that $\alpha_{new}$ is among the $j$ largest strangeness values occurs with probability at most $j/(n+1)$
The p-value (from non-universal tests, Proedrou et al.) is a measure of how well the data
supports a null hypothesis; a smaller p-value gives greater evidence against it
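Equations (1) and (2) can be sketched in a few lines. The sketch makes several assumptions that the slides leave open: all training points belong to the normal class, n is the number of training points, the count in (2) runs over the training points only (some formulations also count the test point itself), and the training strangeness values are not recomputed after the test point is added. The function names are illustrative.

```python
import math


def strangeness(point, others, k):
    """Eq. (1): sum of the k shortest Euclidean distances from point to others."""
    dists = sorted(math.dist(point, o) for o in others)
    return sum(dists[:k])


def p_value(train, new_point, k):
    """Eq. (2): share of strangeness values at least as large as the test point's."""
    n = len(train)
    # Strangeness of each training point relative to the remaining training points.
    alphas = [strangeness(p, train[:i] + train[i + 1:], k) for i, p in enumerate(train)]
    alpha_new = strangeness(new_point, train, k)
    count = sum(1 for a in alphas if a >= alpha_new)
    return count / (n + 1)
```

With the four corners of the unit square as training data and k = 2, the center point (0.5, 0.5) gets a p-value of 0.8, while the distant point (10, 10) gets 0.0 and would be flagged as anomalous under any positive threshold.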
Feature extractor
Extracts features from the network traffic captured by the Traffic Capturer
component.
A group of features is a kind of data structure characterizing network traffic.
The data structure for network event analysis is the connection log.
Some of the secondary attributes are
1) TCP flags
2) connection duration
3) volume of data passed in each direction
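A connection log entry holding the three attributes listed above could look like the following sketch. The class name, the field names and the packet tuple layout are assumptions made for illustration; the slides do not define the actual record format.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionRecord:
    """One connection-log entry (illustrative field names)."""
    tcp_flags: set = field(default_factory=set)
    duration: float = 0.0  # seconds between first and last packet
    bytes_out: int = 0     # originator -> responder
    bytes_in: int = 0      # responder -> originator


def summarize(packets):
    """Build a record from (timestamp, direction, flags, length) tuples."""
    rec = ConnectionRecord()
    times = []
    for ts, direction, flags, length in packets:
        times.append(ts)
        rec.tcp_flags.update(flags)
        if direction == "out":
            rec.bytes_out += length
        else:
            rec.bytes_in += length
    if times:
        rec.duration = max(times) - min(times)
    return rec
```

Records of this kind, converted to numeric vectors, would be the points whose distances the analyzer compares.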
Simulated annealing based instance selection
A local search technique simulating the physical process of “annealing” [10].
Deals with highly non-linear problems.
Begins with a random solution and searches the neighborhood of the current solution at each step of
the process.
Moves are controlled by a probability function.
The acceptance of a downhill move depends on
the reduction in the value of the objective function
how long the search has been running
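The search loop can be sketched as follows. The slides do not name the probability function, so this sketch assumes the commonly used Metropolis rule with a geometrically decreasing temperature; the parameter names and default values (t0, cooling, steps, seed) are likewise assumptions.

```python
import math
import random


def accept(delta, temperature, rng=random):
    """Metropolis rule for maximization; delta = new_score - current_score."""
    if delta >= 0:
        return True  # uphill or equal moves are always accepted
    # Downhill moves are accepted with a probability that shrinks as the
    # loss grows and as the temperature falls over the course of the search.
    return rng.random() < math.exp(delta / temperature)


def anneal(initial, neighbor, score, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Return the best solution seen during a simulated annealing run."""
    rng = random.Random(seed)
    current, best = initial, initial
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        if accept(score(candidate) - score(current), t, rng):
            current = candidate
            if score(current) > score(best):
                best = current
        t *= cooling  # cool down so downhill moves become rarer
    return best
```

Here neighbor(current, rng) proposes a nearby solution and score is the objective to maximize; both are supplied by the caller.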
Selects the most contributing instances and omits useless ones.
To apply SA, two important problems should be addressed:
Specification of the representation of the solutions
Definition of the fitness function
1) Representation:
Training dataset: TR, a set of instances.
The search space for instance selection on TR consists of all subsets of TR.
Chromosomes represent subsets of TR using a binary representation.
A chromosome consists of genes with two possible states: 0 and 1
If 1, then its associated instance is included in the subset of TR represented by the chromosome.
If 0, then the instance is excluded.
Result: The selected chromosome gives the reduced training dataset for TCM-KNN.
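The binary representation maps directly onto a list of 0/1 genes. In this sketch the function names are illustrative, and the single-gene flip is one possible neighborhood move for SA, assumed here rather than taken from the slides.

```python
def select_instances(chromosome, training_set):
    """Return the subset of training_set whose genes are set to 1."""
    if len(chromosome) != len(training_set):
        raise ValueError("chromosome needs one gene per training instance")
    return [inst for gene, inst in zip(chromosome, training_set) if gene == 1]


def flip_gene(chromosome, index):
    """Neighbor move: toggle one instance in or out of the subset."""
    flipped = list(chromosome)
    flipped[index] = 1 - flipped[index]
    return flipped
```

For example, the chromosome [1, 0, 1, 0] over the instances a, b, c, d selects a and c; flipping gene 1 adds b to the subset.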
2) Fitness function:
Let S be a subset of instances of TR to evaluate, coded by a chromosome x.
Three measures have to be considered:
TP (true positive rate)
FP (false positive rate)
Percentage of training dataset reduction
Thus, the fitness function F(x) combines three values:
the detect_rate and the associated fal_rate obtained with S
the reduce_rate of the instances of S with regard to TR
$F(x) = C \cdot (\mathit{detect\_rate} - \mathit{fal\_rate}) + (1 - C) \cdot \mathit{reduce\_rate}$ (3)
$\mathit{reduce\_rate} = \frac{|TR| - |S|}{|TR|} \times 100$ (4)
|TR| - the number of instances in the original training dataset
|S| - the number of instances in the reduced training dataset obtained using SA
C - an adjustment constant set empirically
The objective of SA is to maximize the fitness function defined in (3), that is, to
maximize the detection rate
minimize the number of instances selected as well as the FP rate
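Equations (3) and (4) translate into two short functions. The sketch assumes that detect_rate and fal_rate are given as percentages, so that all three terms share one scale, and uses C = 0.5 purely as a placeholder, since the slides say only that C is set by experience.

```python
def reduce_rate(n_original, n_selected):
    """Eq. (4): percentage of training instances removed."""
    return (n_original - n_selected) / n_original * 100


def fitness(detect_rate, fal_rate, n_original, n_selected, c=0.5):
    """Eq. (3): weighted trade-off between accuracy and dataset reduction."""
    accuracy_term = detect_rate - fal_rate
    return c * accuracy_term + (1 - c) * reduce_rate(n_original, n_selected)
```

Keeping 250 of 1000 instances gives a reduce_rate of 75.0; with a detection rate of 98 and a false alarm rate of 2, the fitness at C = 0.5 is 0.5 * 96 + 0.5 * 75 = 85.5.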
Evidence analyzer
Can connect distant and incomplete abnormal events
A set of evidence analyzing utilities can examine different aspects of correlated
events in an efficient way
These utilities are integrated into the NIFSTC system
The evidence analyzer has two work modes:
1) count mode
2) weighted analysis mode
The evidence analyzer produces an undirected evidence graph
Nodes - values of the attributes
Node size - the weight of the node
Edges - a relationship between two attribute values
An evidence graph is shown in the figure.
Evidence Graph
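An undirected evidence graph of this kind can be sketched with two counters. The slides do not define the two work modes precisely, so the sketch assumes that count mode adds 1 per event and that weighted analysis mode adds a caller-supplied weight per event; the function name and the event layout are also illustrative.

```python
from collections import Counter
from itertools import combinations


def build_evidence_graph(events, weights=None):
    """Return (node_weight, edges) for a list of attribute-name -> value dicts.

    weights=None is treated as count mode; otherwise weights[i] is the
    weight of events[i] (assumed meaning of weighted analysis mode).
    """
    node_weight = Counter()
    edges = Counter()
    for idx, event in enumerate(events):
        w = 1 if weights is None else weights[idx]
        values = sorted(set(event.values()))
        for v in values:
            node_weight[v] += w  # node size grows with each supporting event
        for a, b in combinations(values, 2):
            edges[(a, b)] += w   # pairs are sorted, so each edge is undirected
    return node_weight, edges
```

Attribute values that occur together in one abnormal event are linked by an edge, so hosts involved in many events end up as large, well-connected nodes.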
CONCLUSION
TCM-KNN is a modern, precise algorithm for detecting network
crimes and analyzing forensic data.
The evidence analyzer reports the number of pieces of evidence and their corresponding
weighted values.
REFERENCES
1) S. Mukkamala, A. Sung, - “Identifying significant features for network forensic analysis using
artificial intelligent techniques” - Int’l Journal of Digital Evidence [2003]
2) M.I. Cohen, - “PyFlag: An advanced network forensic framework” - Digital Investigation
(Elsevier Journal) [2008]
3) Y. Li, L. Guo, - “An active learning based TCM-KNN algorithm for supervised network
intrusion detection” - Computers & Security (Elsevier Journal) [2007]
4) Libpcap - http://www.tcpdump.org/release/libcap-0.7.2.tar.gz [2002]
5) Wikipedia - www.wikipedia.com
6) E. Eskin, A. Arnold, M. Prerau, L. Portnoy, S. Stolfo, - “A geometric framework for
unsupervised anomaly detection: detecting intrusions in unlabeled data” - D. Barbara and S.
Jajodia (editors), Applications of Data Mining in Computer Security, Kluwer [2002]
7) ZH Tian, BX Fang, XC Yun, - “User-Level message passing mechanism based on semi-polling
driven in RTLinux” - Journal of Software [2004]
8) ZH Tian, MZ Hu, B Li, - “Semi-Polling Based Interrupt Mitigation for High Performance
Packet Processing” - High Technology Letters [2005]
9) A. Gammerman, V. Vovk, - “Prediction algorithms and confidence measures based on
algorithmic randomness theory” - Theoretical Computer Science [2002]
10) Aarts, E. and van Laarhoven, - “Simulated annealing: A pedestrian review of the theory and
some applications”, in J. Kittler and P.A. Devijver (Eds.) - Pattern Recognition and
Applications, Springer-Verlag, Berlin [1987]