hunting for metamorphic engines
Post on 03-Jan-2016
Hunting for Metamorphic Engines
Wing Wong
Mark Stamp
In This Paper, We…
Analyze metamorphic malware
  o Hacker-produced metamorphic code
Measure similarity of software
  o Based on n-gram analysis
Compute scores
  o Based on n-grams and
  o Based on HMMs
This paper is a baseline for future work
Motivation
Many virus construction kits available
  o Many can produce metamorphic code
So anybody can create a “new” version of existing malware
  o Virtually no technical expertise required
How “effective” is the resulting metamorphic code?
Can we detect metamorphic malware?
Background
Encrypted, polymorphic, metamorphic
  o Metamorphic == body polymorphic
Metamorphic vs cloned software
  o Clone is the norm, but metamorphic could offer advantages to the good guy too…
From the theory, we know malware detection is NP-complete
  o And metamorphic is at least as hard
  o But what about the practical situation?
Metamorphism
Metamorphic code changes its “shape”
Well-known examples
  o W95/Regswap
  o W32/Ghost
  o W95/Zperm
  o MetaPHOR
Metamorphism
General techniques available
  o Insertion
  o Substitution
  o Transposition
  o Deletion
Some easier to implement than others
Some more effective against certain detection strategies
Virus Construction Kits
In this paper, we consider
  o PS-MPC (Phalcon/Skism Mass Produced Code generator)
  o G2 (Second Generation virus generator)
  o MPCGEN (Mass Produced Code GENerator)
  o NGVCK (Next Generation Virus Construction Kit)
  o VCL32 (Virus Creation Lab for Win32)
Virus Construction Kits
Did not consider MetaPHOR
  o Difficult to work with, finicky
All of these claim to be metamorphic
Are they really?
  o How can we measure “metamorphism”?
If they are highly metamorphic, can we still detect them?
Brief Review of Malware Detection
First generation
  o Signature scanning, wildcards OK
Second generation
  o Approximate signature scanning; e.g., ignore NOP instructions
Code emulation
Heuristic analysis
  o Static or dynamic, false positives…
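To make the first two generations concrete, here is a minimal Python sketch of signature scanning with wildcards that can optionally skip NOP instructions. It is an illustration only: real scanners match byte patterns, while this sketch matches opcode mnemonics, and the names `scan`, `WILDCARD`, and `ignore` are assumptions, not taken from any scanner.

```python
WILDCARD = None  # hypothetical marker: matches any single opcode


def scan(opcodes, signature, ignore=("nop",)):
    """Return the original indices where `signature` matches.

    Opcodes listed in `ignore` are dropped before matching, which is the
    "approximate" part; pass ignore=() for plain first-generation scanning.
    """
    if not signature:
        return []
    # Keep each surviving opcode together with its original position.
    kept = [(idx, op) for idx, op in enumerate(opcodes) if op not in ignore]
    hits = []
    for start in range(len(kept) - len(signature) + 1):
        window = kept[start:start + len(signature)]
        if all(sig is WILDCARD or sig == op
               for sig, (_, op) in zip(signature, window)):
            hits.append(window[0][0])
    return hits


code = ["push", "nop", "mov", "nop", "nop", "xor", "call", "ret"]
sig = ["mov", WILDCARD, "call"]
print(scan(code, sig))             # approximate scan, NOPs skipped
print(scan(code, sig, ignore=()))  # exact scan, defeated by inserted NOPs
```

In this toy input the inserted NOPs break the exact scan, while the approximate scan still finds the pattern.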
Machine Learning
Consider the following
  o Data Mining, Neural Networks, HMMs
Data Mining
  o Malware-related previous work
  o Generic approach
Neural Networks
  o Previous work based on byte trigrams
  o Developed and used at IBM
Hidden Markov Models
Train HMM on metamorphic family
Then we can score any file to see how “close” it is to the family
What to use to train such an HMM?
  o Raw bytes in exe?
  o Disassembled code?
  o Opcode sequence?
Software Similarity
How to quantify metamorphism?
In general, how to measure similarity of software?
Given program 1 and program 2...
We develop a score
  o Score of 0 means “no similarity”
  o Score of 1 means “virtually identical”
N-gram Similarity
Given executable files X and Y
Extract opcode sequences from each
  o Suppose X has n opcodes
  o Suppose Y has m opcodes
How to compare the sequences?
Many possible ways; here we use n-gram analysis
  o That is, we compare subsequences
N-gram Similarity
Extracted opcode sequences
  o X = (x_0, x_1, …, x_{n-1}) and Y = (y_0, y_1, …, y_{m-1})
Compare subsequences of length k
  o Then x_i, x_{i+1}, …, x_{i+k-1} matches y_j, y_{j+1}, …, y_{j+k-1} if they are the same in any order
  o For each such match, plot the point (i, j)
  o Remove any line segments shorter than p points
Then score = (fraction of x axis covered + fraction of y axis covered) / 2
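The scoring steps on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function name, the defaults k=3 and p=5, the restriction to runs parallel to the main diagonal, and the choice to count every opcode inside a surviving window as "covered" are all illustrative choices.

```python
from collections import defaultdict


def ngram_similarity(x, y, k=3, p=5):
    """Average fraction of x and y covered by surviving match segments."""
    if not x or not y:
        return 0.0
    # Index every length-k window of y by its contents, ignoring order.
    windows_y = defaultdict(list)
    for j in range(len(y) - k + 1):
        windows_y[tuple(sorted(y[j:j + k]))].append(j)
    # A match at (i, j) lies on diagonal i - j; group start indices by diagonal.
    diagonals = defaultdict(list)
    for i in range(len(x) - k + 1):
        for j in windows_y.get(tuple(sorted(x[i:i + k])), ()):
            diagonals[i - j].append(i)
    covered_x, covered_y = set(), set()
    for d, starts in diagonals.items():
        run = [starts[0]]
        # The trailing None flushes the final run.
        for i in starts[1:] + [None]:
            if i is not None and i == run[-1] + 1:
                run.append(i)
                continue
            if len(run) >= p:  # drop segments shorter than p points
                for s in run:
                    covered_x.update(range(s, s + k))
                    covered_y.update(range(s - d, s - d + k))
            run = [i]
    return (len(covered_x) / len(x) + len(covered_y) / len(y)) / 2


ops = ["mov", "add", "sub", "push", "pop", "call", "ret", "jmp", "cmp", "xor"]
print(ngram_similarity(ops, ops))           # identical sequences
print(ngram_similarity(ops, ["nop"] * 10))  # nothing in common
```

Sorting each window before comparing implements "the same in any order". Grouping matches by the difference i - j is one simple way to find segments; a different segment definition would change the scores.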
N-gram Similarity Example
N-gram Similarity
Score is between 0 and 1
If program X is identical to program Y
  o Main diagonal is a solid line
  o And score = 1
Minimum score is 0
The smaller the score, the less similar the programs are
Typical N-gram Similarity
Normal (Cygwin utility) files
Typical N-gram Similarity
NGVCK
Typical N-gram Similarity
G2
N-gram Similarity
Compare members of a “family” with each other
N-gram Similarity
In graphical form…
N-gram Similarity Conclusion?
G2 viruses are more similar to each other than expected
  o So, they are not very metamorphic
  o Ditto for most of the other generators
But NGVCK viruses are more different from each other than expected
  o So, they are highly metamorphic
Implication wrt signature detection?
NGVCK Similarity
Compare NGVCK to other families…
NGVCK Similarity Conclusion?
NGVCK viruses are very different from each other
  o Implies highly metamorphic…
  o …so, signature detection will fail
But NGVCK viruses are even more different from normal files
  o Then what about detection?
Aside: Similar Similarity Measures to Consider?
Given opcode sequences
  o Edit distance
  o Other sequence comparison techniques
  o Statistical measures
Considering raw bytes
  o Statistical measures
  o Entropy and other “structural” measures
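Two of the measures named on this slide are easy to sketch in Python: Levenshtein edit distance over opcode sequences and Shannon entropy of raw bytes. The function names and inputs are illustrative assumptions; the slides do not say how, or whether, the authors computed these.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two opcode sequences."""
    prev = list(range(len(b) + 1))
    for i, op_a in enumerate(a, 1):
        curr = [i]
        for j, op_b in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                    # delete op_a
                            curr[j - 1] + 1,                # insert op_b
                            prev[j - 1] + (op_a != op_b)))  # substitute
        prev = curr
    return prev[-1]


def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


print(edit_distance(["mov", "add", "ret"], ["mov", "nop", "add", "ret"]))
print(byte_entropy(bytes(range(256))))
```

A single inserted NOP costs one edit, so edit distance is sensitive to insertion-style metamorphism. Entropy only summarizes the byte distribution and ignores ordering.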
Hidden Markov Models
Generic view of HMM
HMM Notation
HMM for Metamorphic Detection
Train HMM
  o Extract opcodes from family executables
  o Append opcode sequences
  o Train a model, i.e., determine matrices
Use trained HMM to score files
  o Given a file, extract opcode sequence
  o Score sequence against the model
  o Compare to predetermined threshold
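The scoring half of this procedure can be sketched with the scaled forward algorithm, which returns log P(sequence | model) for an already trained HMM. Training (Baum-Welch) is omitted. Everything concrete below is a made-up assumption for illustration: the two-state model, the three-opcode vocabulary, and the threshold are not the authors' values. The score is divided by sequence length so that files of different sizes can share one threshold.

```python
import math


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model).

    pi[i]   initial probability of state i
    A[i][j] probability of moving from state i to state j
    B[i][s] probability of emitting symbol s in state i
    """
    if not obs:
        raise ValueError("empty observation sequence")
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob


def score_per_opcode(obs, pi, A, B):
    """Length-normalized log likelihood of an encoded opcode sequence."""
    return log_likelihood(obs, pi, A, B) / len(obs)


# Hypothetical toy model and threshold, for illustration only.
VOCAB = {"mov": 0, "push": 1, "call": 2}
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
THRESHOLD = -1.5

opcodes = ["mov", "push", "mov", "call"]
score = score_per_opcode([VOCAB[op] for op in opcodes], PI, A, B)
print(score, "family" if score >= THRESHOLD else "not family")
```

An opcode missing from `VOCAB` raises `KeyError` in this sketch; a real pipeline would need a policy for unseen opcodes.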
HMM Scoring: Fine Points
Score computed as log likelihood of the scored sequence
  o Normalize as “log likelihood per opcode” (LLPO)
  o Why LLPO?
How to quantify effectiveness?
  o ROC curves are very useful
  o Specifically, area under ROC curve (AUC)
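As a rough illustration of the AUC, the sketch below computes it directly from two lists of scores, using the fact that the area under the ROC curve equals the probability that a randomly chosen family file scores higher than a randomly chosen non-family file (ties count as one half). The function name and the sample scores are made up for illustration and are not results from the paper.

```python
def roc_auc(family_scores, other_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not family_scores or not other_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for f in family_scores:
        for o in other_scores:
            if f > o:
                wins += 1.0
            elif f == o:
                wins += 0.5
    return wins / (len(family_scores) * len(other_scores))


# Made-up per-opcode scores: family files score higher (closer to 0).
family = [-2.1, -2.3, -2.0]
normal = [-5.0, -4.2, -6.1]
print(roc_auc(family, normal))
```

An AUC of 1.0 means some threshold separates the two sets perfectly; 0.5 is no better than guessing. The pairwise loop is quadratic, which is fine for small test sets.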
Results
HMM scoring for NGVCK family
HMM Scoring: Bottom Line
Signature detection works for metamorphic families, except NGVCK
For NGVCK, we can use HMM
  o Classification is 100% accurate when compared to normal (benign) files
  o Some misclassifications of other malware (is that good or bad?)
Should include ROC curves, AUC, …
HMM States: 3 State Model
N-gram Score
Can also score files using N-grams
Randomly select NGVCK file
  o Extract its opcode sequence
Given a file we want to score
  o Extract its opcode sequence
  o N-gram similarity to NGVCK sequence
  o Higher similarity, classify as NGVCK
  o Lower similarity, classify as “not NGVCK”
N-gram Score Results?
For NGVCK, obtain ideal separation
  o There exists a threshold for which…
  o …we can separate NGVCK from normal
Surprisingly strong results
  o For such a simple similarity score
Why does this work?
  o We come back to this at the end…
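"Ideal separation" has a simple operational meaning: every family score lies above every normal score, so any threshold in the gap classifies both sets perfectly. A minimal Python sketch follows; the function name and the sample scores are made up for illustration and are not the paper's data.

```python
def separating_threshold(family_scores, normal_scores):
    """Return a threshold that perfectly separates the sets, or None."""
    lowest_family = min(family_scores)
    highest_normal = max(normal_scores)
    if lowest_family <= highest_normal:
        return None  # the score ranges overlap: no ideal separation
    # Any value in the gap works; the midpoint leaves equal margin.
    return (lowest_family + highest_normal) / 2


# Made-up similarity scores against a reference family file.
family = [0.21, 0.35, 0.18]
normal = [0.02, 0.05, 0.09]
print(separating_threshold(family, normal))
```

When the function returns None, the ranges overlap and some misclassification is unavoidable at any threshold.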
Compare to Commercial AV
Tested the following scanners on our virus sets
  o eTrust, avast!, AVG
These scanners detected most of the viruses from weak families
  o That is, G2, VCL32, etc.
But none of the NGVCK viruses were detected by any of the 3 scanners
Conclusion
HMM effective at detecting the highly metamorphic NGVCK malware family
N-gram similarity also effective
NGVCK not detected by commercial AV
So, this detection improves the state of the art
Practical considerations?
Lessons Learned?
Why can we detect NGVCK family?
In spite of high metamorphism, code is statistically different from normal
“Improved” metamorphic malware?
Metamorphism must be sufficient to evade signature detection
But, metamorphic family must be statistically similar to normal
Future Work
Build a better metamorphic generator
  o Some progress here, but still detectable using other detection methods
  o Still need better generators…
Develop and test other detection strategies
  o Lots of work done here too
  o But lots more to do
References
W. Wong and M. Stamp, Hunting for metamorphic engines, Journal in Computer Virology 2(3):211-229, 2006
M. Stamp, A revealing introduction to hidden Markov models