Graph Techniques for Malware Detection
Mark Stamp
Graph Techniques

Pre-Intro
A lot of malware-related research uses graph techniques
Here, we consider 3 research papers
o All use graphs for malware detection
o And they use very different approaches
There are good project topics here
o So pay attention…

Intro
Many graphs can be defined on software
o We consider only a few examples
Can use such graphs to compare code
o I.e., scores can be defined for graphs
Graph can serve as a code signature
o Might even identify a metamorphic family
A lot of work has been done in this area
o But still plenty of possible research

Code Graphs
We consider the following types of software-based graphs
o Control flow graphs
o Function call graphs
o Opcode graphs
But first, discuss graphs in general
o Then graph techniques for malware
o Then consider 3 papers in detail

Graphs
Graph consists of a set of vertices (or nodes) and a set of edges
o An edge connects a pair of vertices
Edges can be directed or undirected
o Directed edges go in one direction
o Undirected edges, both directions
Edges sometimes include weights
o Weights are often probabilities

Graphs
Graph specified as G = (V,E)
o Where V is set of vertices
o And E is set of edges, i.e., pairs of vertices
o Undirected graph, E has unordered pairs
o Directed graph, E has ordered pairs
Many applications use graph theory
Many general results depend on matrices derived from graphs

Examples of Graphs
Undirected graph
o Vertices are circles (may be labeled or not)
o Edges are lines
Weighted directed graph
o Edges are arrows
o Edge labels are “weights”
o Weights are probabilities in this example
[Figure: an undirected graph and a weighted directed graph with edge weights 0.2, 0.3, 0.5, and 1.0]

Graphs and Matrices
Adjacency matrix A = {a_ij}
o Where a_ij = 1 if there is an edge from node i to node j
o Otherwise, a_ij = 0
o For undirected graph, A is symmetric
Incidence matrix B = {b_ij}
o Where b_ij = 1 if vertex i is incident to edge j
o Otherwise, b_ij = 0
Lots more matrices related to graphs

Graphs and Matrices
Since we have matrix representations…
We can apply linear algebra to graphs
Example 1
o Let A be adjacency matrix of graph G
o Note that A is a square matrix
o Consider the nth power of A, that is, A^n
o Element (i,j) of A^n is the number of paths of length n from vertex i to j in G (strictly, walks, since vertices and edges may repeat)
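
Example 1 can be checked by hand on a small graph. The sketch below is only an illustration: the 3-vertex directed graph and the helper names mat_mul and mat_pow are made up for this purpose, and plain Python lists are used so that no library is assumed.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, n):
    """Return the nth power of square matrix a (n >= 0) by repeated multiplication."""
    size = len(a)
    result = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    for _ in range(n):
        result = mat_mul(result, a)
    return result


# Hypothetical directed graph with edges 0->1, 0->2, 1->2, 2->0
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

A2 = mat_pow(A, 2)
# A2[0][2] counts the length-2 walks from vertex 0 to vertex 2 (here only 0->1->2)
print(A2)
```

Entry (0,0) of A2 is also 1, corresponding to the single walk 0->2->0.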

Graphs and Matrices
Example 2
o Graph G is connected if any vertex can be reached from any other vertex
o If G is connected, with n vertices, the rank of its incidence matrix is n – 1 (this holds for the oriented incidence matrix, or for the 0/1 incidence matrix with arithmetic mod 2)
Other interesting results involve eigenvalues, eigenvectors, etc.

How to Compare Graphs?
Graphs are isomorphic if we can relabel vertices of one to obtain the other
o Implies “structure” is the same
o Exact complexity of graph isomorphism is unknown (no polynomial-time algorithm is known, and it is not known to be NP-complete)
We’ll need to score graphs G and H
o That is, measure similarity of G and H
o Using score that is easy to compute
o We’ll see examples later…

Control Flow Graph
Nodes might be “basic blocks”
o Enter at top, only exit is at bottom
Control flow useful for optimization
o E.g., remove (obvious) dead code

Control Flow Graphs
Consider dead code
o We can make dead code harder to detect via control flow analysis (How?)
o Why? To obfuscate malware wrt control flow analysis
o Dead code is a useful obfuscation
o Result? Control flow analysis can be defeated by reasonably advanced malware
Control flow graphs are not invincible

Function Call Graph
Higher-level “control flow”
o That is, not focused on basic blocks
o For example, based on function calls
Previously applied to metamorphic malware detection
o Good results and seems fairly robust
o Often, overly complex scoring (IMHO)
o Project: Break it and/or improve on it
o Discuss this more in upcoming slides

Opcode Graph
Several possible opcode graphs
o We consider case where nodes are opcodes, edges are possible transitions
o And edge-weights are probabilities
o This graph based on digram statistics
How to compare 2 such graphs?
o This is the interesting question
o We’ll use a simple, effective method
This problem considered next…

Opcode Graph Similarity and Metamorphic Detection
Neha Runwal
Richard M. Low
Mark Stamp

Intro
A previous paper considered opcode graph analysis for malware detection
o Approach was successful
o But technique seems overly complex given that graph structure is very simple
o Applied to fairly ordinary (polymorphic) malware, not metamorphic families
We wanted to test simplified score
o And consider metamorphic malware

Software Similarity
Metamorphic detection can be based on measuring software similarity
Lots of related previous research
o HMM, chi-squared, simple substitution distance, n-gram, etc., etc.
We use HMM approach as benchmark
o Recall HMMs and related malware detection research

Previous Related Work
Construct a “Markov graph”
o Based on digram (consecutive opcode pair) frequencies
Then use SVM for classification
o Requires selecting kernel function
o They combined 2 standard kernels
o Claim that it compares both local and global graph structure (not clear)
Compare results to n-gram analysis

Previous Work
Collect data dynamically using “Ether”
o I.e., extract opcodes on executed path
o Polymorphism (encryption) is no defense
Construct “Markov chain graph”
o An extremely simple graph
SVM kernel function
o Combine Gaussian and spectral kernels
o Requires eigenvector computations
o Consequently, efficiency is a weakness

Opcode Graph
Opcode graph is a digraph
o In effect, edge weight is probability that opcode A is followed by opcode B
o It’s a simple, easy to construct graph
Question: How to compute scores?
o That is, how to measure graph similarity?
First, we consider an example that illustrates opcode graph construction

Example Code
Opcode sequence

Consecutive Pairs
Counts
o CALL then ADD occurs twice
o JMP then CALL once

Edge Weights
Relative frequency
Normalized per row
o Why per row?

Example as Opcode Graph
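
The construction on the last few slides (count consecutive pairs, then normalize each row) can be sketched as follows. The opcode sequence here is hypothetical, since the code listing from the slides did not survive in this transcript; it was chosen only so that CALL then ADD occurs twice and JMP then CALL once. The function name opcode_graph is likewise an assumption.

```python
from collections import Counter, defaultdict


def opcode_graph(opcodes):
    """Build {src: {dst: probability}} from consecutive opcode pairs (digrams)."""
    pair_counts = Counter(zip(opcodes, opcodes[1:]))

    # Total number of outgoing transitions per source opcode (the row sums)
    row_totals = defaultdict(int)
    for (src, _dst), count in pair_counts.items():
        row_totals[src] += count

    # Normalize per row so that each row is a probability distribution
    graph = defaultdict(dict)
    for (src, dst), count in pair_counts.items():
        graph[src][dst] = count / row_totals[src]
    return dict(graph)


# Hypothetical opcode sequence, not the one from the slides
seq = ["CALL", "ADD", "JMP", "CALL", "ADD", "SUB", "CALL", "JMP"]
graph = opcode_graph(seq)
for src, row in sorted(graph.items()):
    print(src, row)
```

Normalizing per row means each weight answers the question "given the current opcode, how likely is each next opcode", so every row with at least one outgoing edge sums to 1.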

Previous Work (Again)
Recall that the previous work used SVM to classify opcode graphs
o And efficiency was not impressive
Here, we want a faster method
o Ignore time required for disassembly
o Previous work used dynamic analysis
We also consider metamorphic code
o Previous work used a trojan/backdoor

Previous Work
Results from previous paper…
o Note low FP, high FN for AV products
o What’s up with that?

Comparing Opcode Graphs
We use a much simpler approach
Instead of SVM/graph kernel…
o We compare opcode graphs directly
o Also, we consider metamorphic malware
How to directly compare graphs?
o Opcode graph is extremely simple graph
o So, a direct comparison is possible
o SVM is “heavy artillery” for such graphs

Opcode Graph Score
Let A and B be opcode graphs
Map opcodes to 1,2,…,N
o Let A = {a_ij} be edge-weight matrix for A
o Then a_ij is probability next opcode is j, given that current opcode is i
o Let B = {b_ij} be edge-weight matrix of B
Both A and B are N x N matrices
o Corresponding vertices easy to match up
o We can simply match up the opcodes

Opcode Graph Score
Define $\mathrm{score}(A,B) = \frac{1}{N^2}\left(\sum_{i=1}^{N}\sum_{j=1}^{N} |a_{ij} - b_{ij}|\right)^2$
If A = B, then score(A,B) = 0
If $a_{ij} = 1$ and $b_{ik} = 1$ for $j \ne k$, then $\sum_{j} |a_{ij} - b_{ij}| = 2$, which is the maximum row sum
Implies $\mathrm{score}(A,B) \le (2N)^2/N^2 = 4$
Hence, $0 \le \mathrm{score}(A,B) \le 4$
o The smaller the score, the more similar
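
A minimal sketch of this score, assuming both edge-weight matrices are given as N x N lists of lists with the opcodes already mapped to the same indices. The function name and the two tiny 2 x 2 matrices are illustrative only.

```python
def opcode_graph_score(a, b):
    """score(A,B) = (sum over all i,j of |a_ij - b_ij|)^2 / N^2 for N x N matrices."""
    n = len(a)
    total = sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))
    return total ** 2 / n ** 2


# Hypothetical 2-opcode example: every row of A puts all its weight where B has none
A = [[1.0, 0.0],
     [0.0, 1.0]]
B = [[0.0, 1.0],
     [1.0, 0.0]]

print(opcode_graph_score(A, A))  # identical graphs
print(opcode_graph_score(A, B))  # each row differs by the maximum row sum of 2
```

The second call hits the upper bound: each of the N = 2 rows contributes 2, so the sum is 2N = 4 and the score is 16/4 = 4.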

Opcode Graph Score
Other score metrics considered…
o Euclidean distance, for example
o None gave better results
Other graph comparisons considered…
o See “note” for this slide
o Again, none gave better results
Score we use is easy to compute
o But is it effective?

Data Files
Metamorphic malware: 200 NGVCK
Benign: 41 cygwin utility files
These files used in other studies
o In particular, HMM analysis, where accuracy was essentially 100%
o Would be nice to have harder test data…
Can compare results to previous work
o Metamorphic detection research, that is

Results
Important cases are “Metamorphic vs Metamorphic” and “Normal vs Normal”

Discussion
Might argue that uncommon opcodes are “weighted” too heavily
o Since all nodes count the same
o E.g., MOV and FLDCW treated the same
o So a few uncommon opcodes in malware might make it stand out from benign
Tested removing uncommon opcodes
o Next slide

Remove Uncommon Opcodes
Metamorphic vs metamorphic
o Before and after removal
o Scores don’t change much

Remove Uncommon Opcodes
After uncommon opcode removal
o Metamorphic vs metamorphic…
o And normal vs metamorphic
o Still obtain good separation

Increased Morphing
Dead code inserted from benign files
o “Block morphing” used

Increased Morphing Rate
At 30% block morphing
o Misclassifications occur

Comparison to HMM
Using same block morphing, scored files using HMM detector
o At 30% block morphing, results comparable to previous slide
We conclude that our opcode graph score is comparable to HMM score
Analysis not detailed enough to say which is actually better
o But we can say any difference is slight

Random Morphing
Benign vs morphed malware
Scores worse at higher morphing???

Conclusion
Simple opcode graph score tested
o Good results, comparable to HMM
We showed how to defeat the score
How do results compare to complex opcode graph/SVM score?
o Unfortunately, no direct comparison…
Opcode graphs based on opcode pairs
o In that sense, similar to HMM…

Metamorphic Detection Using Function Call Graph Analysis
Prasad Deshpande
Mark Stamp

Intro
Function call graphs previously studied (a lot) for malware detection
Here, applied to metamorphic malware
Scoring technique used here follows previous work closely
First, brief background material
o Then explain graph/scoring in detail
o Finally, we give results

Background
Metamorphic techniques
o Register swap
o Transposition
o Dead code insertion
o Instruction substitution
o Formal grammar mutation
o Host code mutation
o Code integration

Background
HMM-based detection
o Again, HMM detection serves as a benchmark against which we compare
We’ve already seen details of HMMs
o So, we’ll assume that’s known

Function Call Graph
Disassemble the program
o Local functions look like sub_xxxxxx
o External functions appear too, by name, e.g., GetVersion
Each function is a node in the graph
Directed edge from caller to callee
o Edges point to functions that are called
o Edges found using breadth-first search

Function Call Graph
Example of part of function call graph

Call Graph Similarity
How to compare function call graphs?
Local functions
o Names will not match
o Graph structure can still be compared
External functions
o Names should match
o But not much graph structure available
How to combine local/external results?

External Functions
Given function call graphs G1 and G2
Extract external functions from each
Compare 2 sets of function names
o All matching names are saved for scoring
o Matched names become vertices in graph
We use resulting vertices for scoring
o Scoring details later…

Local Functions
Methods to compare local functions
1. Based on external functions called
2. Based on opcode sequence similarity
3. Based on “matched neighbors”
This approach follows previous work
Each of the 3 measures is reasonable
o But overall, it seems very ad hoc
o No reason to believe this is optimal

Local Function Match (1)
Local functions “match” if they have 2 or more external functions in common
Not a very precise criterion
o Does not depend on function size, or number of functions called, or …
If this external function criterion is met, local functions are considered a match
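
As a rough illustration of this first criterion, the check reduces to a set intersection. The function name, the min_common parameter, and the sets of called API names below are hypothetical; only the "2 or more in common" rule comes from the slide.

```python
def match_by_external_calls(calls_x, calls_y, min_common=2):
    """Local functions match if they call at least min_common of the same external functions."""
    return len(set(calls_x) & set(calls_y)) >= min_common


# Hypothetical external functions called by three local functions
sub_401000 = {"GetVersion", "ExitProcess", "CreateFileA"}
sub_402000 = {"GetVersion", "ExitProcess"}
sub_403000 = {"GetVersion", "WriteFile"}

print(match_by_external_calls(sub_401000, sub_402000))  # two names in common
print(match_by_external_calls(sub_401000, sub_403000))  # only one name in common
```

The sketch makes the imprecision noted on the slide easy to see: a huge function and a tiny one match as soon as they share two calls.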

Local Function Match (2)
Opcode sequence similarity
o Applied to any local functions that don’t match based on external function calls
o Opcodes grouped into 15 categories
o A 15-bit binary “color” variable assigned
o Bit is “1” if any opcode of that type appears
o Also, save counts of each type
o If “colors” match exactly, then compute cosine similarity of count vectors

Opcode Sequence Similarity
Categories
Why use categories?
o Why not consider all opcodes?

Opcode Similarity Example
“Colors”

Opcode Similarity Example
“Colors” and counts

Opcode Similarity Score
If color vectors of local functions X and Y are the same
o And X = (x1,x2,…,x15) is the count vector of X
o And Y = (y1,y2,…,y15) is the count vector of Y
Then opcode similarity score is
$\cos(X,Y) = \frac{\sum_{i=1}^{15} x_i y_i}{\sqrt{\sum_{i=1}^{15} x_i^2}\,\sqrt{\sum_{i=1}^{15} y_i^2}}$
This is known as cosine similarity
o Set threshold to decide match/no match
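
The color-then-cosine test can be sketched as below. The 15 categories used in the paper are not listed in this transcript, so the 3-category table here is purely hypothetical, as are the function names and the 0.9 threshold.

```python
import math

# Hypothetical category table (the slides use 15 categories, not reproduced here)
CATEGORIES = {
    "MOV": 0, "PUSH": 0, "POP": 0,   # data movement
    "ADD": 1, "SUB": 1, "XOR": 1,    # arithmetic/logic
    "JMP": 2, "CALL": 2, "RET": 2,   # control flow
}
NUM_CATEGORIES = 3


def color_and_counts(opcodes):
    """Return (color bitmask, per-category counts) for a function's opcodes."""
    counts = [0] * NUM_CATEGORIES
    for op in opcodes:
        if op in CATEGORIES:
            counts[CATEGORIES[op]] += 1
    color = 0
    for idx, count in enumerate(counts):
        if count > 0:
            color |= 1 << idx  # bit is 1 if any opcode of this category appears
    return color, counts


def cosine_similarity(x, y):
    """Cosine of the angle between count vectors x and y (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def opcode_match(ops_x, ops_y, threshold=0.9):
    """Match only if colors are identical and cosine similarity reaches the threshold."""
    color_x, counts_x = color_and_counts(ops_x)
    color_y, counts_y = color_and_counts(ops_y)
    if color_x != color_y:
        return False
    return cosine_similarity(counts_x, counts_y) >= threshold


print(opcode_match(["MOV", "ADD", "ADD"], ["MOV", "MOV", "ADD", "ADD", "ADD", "ADD"]))
print(opcode_match(["MOV", "ADD"], ["MOV", "JMP"]))
```

The color comparison is a cheap filter: the cosine is only computed for pairs that use exactly the same set of categories.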

Local Function Match (3)
If 2 functions match, more likely that neighboring functions match
o Neighbors wrt corresponding graphs
If vertex A matches vertex B (by any of comparisons considered), then…
o …compare neighbors of A with those of B
o Use relaxed version of opcode similarity
o That is, colors need not match exactly, and match threshold is also relaxed

Similarity Score
Find all common vertices of G1 and G2
o Using all 3 methods just discussed
Suppose nodes A,B in graph G1 match C,D in G2, respectively
o If there is an edge from A to B in G1 and an edge from C to D in G2 …
o …then we have found a common edge
Compute total number of common edges
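
Counting common edges is straightforward once a vertex matching is available. In this sketch the matching is assumed to be a one-to-one dictionary from G1 vertices to G2 vertices, however it was obtained; the function name and the small graphs are hypothetical.

```python
def count_common_edges(edges_g1, edges_g2, matching):
    """Count edges (a, b) of G1 whose matched endpoints (c, d) form an edge of G2."""
    edges_g2 = set(edges_g2)
    common = 0
    for src, dst in set(edges_g1):
        if src in matching and dst in matching:
            if (matching[src], matching[dst]) in edges_g2:
                common += 1
    return common


# Hypothetical call graphs: A,B in G1 match C,D in G2
edges_g1 = [("A", "B"), ("B", "E")]
edges_g2 = [("C", "D"), ("D", "F")]
matching = {"A": "C", "B": "D"}

print(count_common_edges(edges_g1, edges_g2, matching))
```

Edge B->E is not counted because E has no matched vertex in G2, even though D has an outgoing edge.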

Similarity Score
Similarity of G1 and G2 given by
[Formula shown as an image on the original slide; not preserved in this transcript]
o Where E(G) is the edge set of G
o And |X| is number of elements in set X
Score seems pretty complex
o But, actually not too bad to compute

MWOR Scores
Padding ratio
o Higher the ratio, more dead code
o E.g., 2.0 has twice as much dead code as actual virus code

MWOR ROC Curves
Corresponding ROC curves
Results look good
o How good?
o AUC compared to other techniques?

AUC Comparison
Compare to HMM & simple substitution
o HMM is our usual benchmark
o Call graph offers some improvement
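
For readers who want to reproduce this kind of comparison, the area under the ROC curve can be computed directly from two lists of scores: it equals the probability that a randomly chosen malware sample scores higher than a randomly chosen benign sample, with ties counted as one half. The sketch below assumes higher scores mean "more malware-like" (for a distance-style score the roles are reversed), and the score values are made up.

```python
def auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly (ties count 1/2)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical similarity scores, not taken from the paper
malware_scores = [0.9, 0.3]
benign_scores = [0.5, 0.1]

print(auc(malware_scores, benign_scores))
```

Three of the four (malware, benign) pairs are ordered correctly here, so the result is 0.75; perfect separation gives 1.0 and random scoring gives about 0.5.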

How to Break This Score?
Since score is based on functions…
Obfuscate function structure?
o Inlining, outlining, insert dead/do-nothing functions, other?
We made limited attempts to break it
o But not much success
o Maybe it’s robust/hard to break, or maybe we did not try hard enough

Conclusion
Call graph analysis for malware detection has been widely studied
We obtained good results on challenging metamorphic malware
But strengths/weaknesses not clear
o Suggests analysis is not yet sufficient
o Robustness of score is not clear either

Future Work
Break it!
o This would be a good project
o Want to know how robust the score is
o Need to find weak points before we can think about improvements
Also, score seems very ad hoc
o No evidence that it is (near) optimal
o Should try to modify/optimize score

Common Malware Behavior Through Graph Clustering
Younghee Park
Douglas S. Reeves
Mark Stamp

Intro
Goal is to construct “common behavioral graph” and use for malware detection
o A graph that represents behavior common to all elements in training set
o Constructed via “graph clustering”
o Sounds plausible for metamorphic code
Graph is based on “kernel objects”
o Where kernel means OS kernel
o “Kernel object” based on system calls

Overview
Training sample executed in sandbox
Observed system call info collected
o So, only execution path is analyzed
o Some important code might be missed…
Kernel Object Behavioral Graph (KOBG) constructed
o Relationship between “kernel objects”
o Kernel objects derived from system calls

Overview
Given KOBGs from training files
Construct Weighted Common Behavioral Graph (WCBG)
o Supergraph of KOBGs
Extract “HotPath” subgraph of WCBG
o Where HotPath common to all KOBGs
To score, extract a KOBG for program
o Score based on WCBG and HotPath

KOBG
Kernel object is memory block
o Windows kernel objects: processes, threads, files, events, sockets, etc.
o Only accessible by kernel
o Application cannot directly modify
In KOBG, vertices are kernel objects
o Edges labeled with handle types/values
o “Handle” relates to system resource
o Edge weights discussed later…

Kernel Objects
For example, system call NtCreateFile creates new file
Kernel object created by system call
o Handle returned and can be used as argument in other system calls
Thus we can determine dependencies
o And these are the edges in KOBG
In KOBG, all edge-weights set to 1

KOBG Example
Here, digits are handle values
o And letters are kernel object names

WCBG
The WCBG based on graph “clustering”
Let G = {G1,G2,…,Gn} be weighted directed graphs
Then H = WMinCS(G) is Weighted Minimum Common Supergraph of G
o Supergraph implies each Gi is isomorphic to a subgraph of H
o Minimum, so H has nothing “extra” in it
o What about edge weights? Next slide…

WCBG
Recall H = WMinCS(G)
o Where G = {G1,G2,…,Gn}
Weight of edge e in H is k/n
o Where k is number of Gi whose isomorphism maps an edge of Gi to e in H
o And n is number of graphs Gi in the set G
o I.e., the fraction of the Gi that contain edge e
Not really as fancy as it sounds…

WCBG
Let H = WMinCS(G), where set G consists of KOBGs
Let WCBG_Θ(G) be subgraph of H with all edges of weight greater than Θ
o Including all vertices those edges connect
Why threshold based on Θ?
o Goal is to make score more resilient against attacks that insert system calls
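
The k/n weighting and the Θ threshold can be illustrated with a deliberately simplified sketch. It assumes the vertices of all training graphs have already been aligned to common labels, so the common supergraph is just the union of the edge sets; finding that alignment is the hard part and is not attempted here. The function names and the four small graphs are hypothetical.

```python
from collections import Counter


def weighted_union(edge_sets):
    """Weight each edge by k/n, the fraction of the n graphs that contain it."""
    n = len(edge_sets)
    counts = Counter()
    for edges in edge_sets:
        counts.update(set(edges))  # each graph contributes at most once per edge
    return {edge: k / n for edge, k in counts.items()}


def prune(weights, theta):
    """Keep only edges with weight strictly greater than theta."""
    return {edge: w for edge, w in weights.items() if w > theta}


# Hypothetical, pre-aligned training graphs given as edge sets
training = [
    {("a", "b"), ("b", "c")},
    {("a", "b")},
    {("a", "b"), ("c", "d")},
    {("a", "b"), ("b", "c")},
]

weights = weighted_union(training)
wcbg = prune(weights, theta=0.5)
print(weights)
print(wcbg)
```

With Θ = 0.5 and the strict "greater than" rule, edge (b, c) with weight exactly 0.5 is dropped, so only (a, b) survives; an attacker who injects an extra call into a few samples only creates low-weight edges that are pruned.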

Practicalities
Computing WMinCS(G) is NP-complete
Efficient approx. algorithms exist
o This research uses McGregor algorithm
o Simple idea: construct WMinCS for 1st pair of graphs, add 3rd, then 4th, and so on
o Fast, but may not be optimal (Why?)
To get WCBG_Θ(G) from WMinCS, prune edges with weights below threshold Θ

HotPath
Once WCBG_Θ(G) is known
Define HotPath as maximal path where every edge has weight 1
o Implies that HotPath occurs in all instances in training set
o It could happen that HotPath is empty (not considered in paper…)

Scoring
Given WCBG = WCBG_Θ(G) where G is set of KOBGs
And H = KOBG for file to score
score(H) = Σ W(E_i,e_j) / min(|WCBG|,|H|)
o Where W(E_i,e_j) is the weight of edge E_i in WCBG and matching edge e_j in H
o And |X| is number of edges in graph X
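
A sketch of this score under two simplifying assumptions that are not from the slides: edges of H are matched to WCBG edges by identical labels, and W(E_i,e_j) is taken to be the WCBG weight of the matched edge (KOBG edge weights are all 1). The function name and the example graphs are hypothetical.

```python
def kobg_score(wcbg_weights, h_edges):
    """Sum of WCBG weights over edges also present in H, divided by min(|WCBG|, |H|)."""
    h_edges = set(h_edges)
    if not wcbg_weights or not h_edges:
        return 0.0
    matched = sum(w for edge, w in wcbg_weights.items() if edge in h_edges)
    return matched / min(len(wcbg_weights), len(h_edges))


# Hypothetical WCBG (edge -> weight) and KOBG of the file being scored
wcbg_weights = {("a", "b"): 1.0, ("b", "c"): 0.75, ("c", "d"): 0.6}
h_edges = {("a", "b"), ("b", "c"), ("x", "y"), ("y", "z")}

print(kobg_score(wcbg_weights, h_edges))
```

Two of the three WCBG edges appear in H, so the numerator is 1.75 and the denominator is min(3, 4) = 3.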

Classification
Given file to score, let H be its KOBG
Compute score(H) as on previous slide
Classify H as malware wrt WCBG if …
o We have score(H) exceeding the threshold and H contains HotPath
Very heavy dependence on HotPath
This is a possible weak point
o Can attacker take advantage?

Data Sets
Following data sets considered
Allaple polymorphic. What are others?

Detection Rates and False Positive Rates
Results here are for Θ = 0.5
Detection rates for training set?
o What’s up with that???

HotPath Analysis
Number of edges grouped by weight
o Note that Θ = 0.5 in WCBGs

Robustness Experiments
Effect of system call injection

Questions
What is contribution of HotPath and score in overall results?
o Could test each separately
o Then compare results
Experiments involving parameter Θ?
o Why Θ = 0.5? Any evidence this is best?
o Ideally, plot AUC (from ROC) vs Θ
Scoring the training sets? Very odd

Conclusions
Interesting, graph-based technique
Results good, but not spectacular
o Statistical analysis not entirely clear
o Maybe results are better than they seem…
The HotPath is somewhat confounding
o Essentially, HotPath is a signature
o Would be interesting to see results just using HotPath (i.e., no score/threshold)

Graph Techniques
Graphs are a very general tool
o Lots of general/theoretical results that can be applied to malware problems
o Lots of graphs related to software
Lots of graph techniques have been applied to malware detection problem
o We considered 3 specific examples
o We barely scratched the surface here
Many good research problems remain

References: Graph Theory
R.J. Wilson, Introduction to Graph Theory, Pearson

References: Opcode Graphs
N. Runwal, R.M. Low, and M. Stamp, Opcode graph similarity and metamorphic detection, Journal in Computer Virology, 8(1-2):37-52, 2012
B. Anderson, et al., Graph-based malware detection using dynamic analysis, Journal in Computer Virology, 7(4):247-258, 2011

References: Call Graphs
P. Deshpande and M. Stamp, Metamorphic detection using function call graph analysis, submitted
D. Bilar, On callgraphs and generative mechanisms, Journal in Computer Virology, 3(4):285-297, 2007
X. Ming, et al., A similarity metric method of obfuscated malware using function-call graph, Journal of Computer Virology and Hacking Techniques, 9(1):35-47, 2013
S. Shang, et al., Detecting malware variants via function-call graph similarity, 5th International Conference on Malicious and Unwanted Software (MALWARE), pp. 113-120, 2010

References: Graph Clustering
Y. Park, D.S. Reeves, and M. Stamp, Deriving common malware behavior through graph clustering, Computers & Security, 39(B):419-430, 2013