Simple Substitution Distance and Metamorphic Detection
Gayathri Shanmugam, Richard M. Low, Mark Stamp

The Idea
• Metamorphic malware “mutates” with each infection
• Measuring software similarity is a possible means of detection
• But, how to measure similarity?
  o Much relevant previous work
• Here, a novel distance measure is considered

Simple Substitution Distance
• We treat each metamorphic copy as if it is an “encrypted” version of “base” virus
  o Where the “cipher” is a simple substitution
• Why simple substitution?
  o Easy to work with, fast algorithm to solve
• Why might this work?
  o Simple substitution “cryptanalysis” tends to yield results that match family statistics
  o Accounts for modifications to files similar to some common metamorphic techniques

Motivation
• Given a simple substitution ciphertext where plaintext is English…
  o If we cryptanalyze using English language statistics, we expect a good score
  o If we cryptanalyze using, say, French language statistics, we expect a not-so-good score
• We can obtain opcode statistics for a metamorphic family
  o Using simple substitution cryptanalysis, a virus of same family should score well…
  o …but, a benign exe should not score as well
  o Assuming statistics of these families differ

Metamorphic Techniques
• Many possible morphing strategies
• Here, briefly consider
  o Register swapping
  o Garbage code insertion
  o Equivalent substitution
  o Transposition
  o Formal grammar mutation
• At a high level --- substitution, transposition, insertion, and deletion

Register Swap
• Register swapping
  o E.g., replace EBX register with EAX, provided EAX not in use
• Very simple and used in some of first metamorphic malware
• Not very effective
  o Why not?

Garbage Insertion
• Garbage code insertion
• Two cases:
  o Dead code --- inserted, but not executed
    • We can simply JMP over dead code
  o Do-nothing instructions --- executed, but have no effect on program
    • Like NOP or ADD EAX,0
• Relatively easy to implement
• Effective at breaking signature detection

Code Substitution
• Equivalent instruction substitution
  o For example, can replace SUB EAX,EAX with XOR EAX,EAX
• Does not need to be 1 for 1 substitution
  o That is, can include insertion/deletion
• Unlimited number of substitutions
• Very effective
• Somewhat difficult to implement

Transposition
• Transposition
  o Reorder instructions that have no dependency
• For example, the sequence on the left can be reordered as on the right
    MOV R1,R2        ADD R3,R4
    ADD R3,R4        MOV R1,R2
• Can be highly effective
• But, can be difficult to implement
  o Sometimes applied only to subroutines

Formal Grammar Mutation
• Formal grammar mutation
• View morphing engine as non-deterministic automata
  o Allow transitions between any symbols
  o Apply formal grammar rules
• Obtain many variants, high variation
• Really just a formalization of other approaches, not a separate technique

Previous Work
• Easy to prove that “good” metamorphic code is immune to signature detection
  o Why?
• But, many successes detecting hacker-produced metamorphic malware…
  o HMM/PHMM/machine learning
  o Graph-based techniques
  o Statistics (chi-squared, naïve Bayes)
  o Structural entropy
  o Linear algebraic techniques

This Research
• Measure similarity using “simple substitution distance”
• We “decrypt” suspect file using statistics from a metamorphic family
  o If decryption is good, we classify it as a member of the same metamorphic family
  o If decryption is poor, we classify it as NOT a member of the given metamorphic family

Simple Substitution Cipher
• Simple substitution is one of the oldest and simplest means of encryption
• A fixed key used to substitute letters
  o For example, Caesar’s cipher, substitute letter 3 positions ahead in alphabet
  o In general, any permutation can be key
• Simple substitution cryptanalysis?
  o Statistical analysis of ciphertext
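To make the cipher concrete, here is a minimal Python sketch of simple substitution, with the Caesar shift as one particular key. The helper names (caesar_key, encrypt, decrypt) are hypothetical and not from the slides, and the sketch assumes uppercase English text.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_key(shift=3):
    """Return the permutation that moves each letter `shift` positions ahead."""
    return ALPHABET[shift:] + ALPHABET[:shift]

def encrypt(plaintext, key):
    """Replace each letter with the key letter at the same alphabet position."""
    return plaintext.translate(str.maketrans(ALPHABET, key))

def decrypt(ciphertext, key):
    """Invert the substitution by mapping key letters back to the alphabet."""
    return ciphertext.translate(str.maketrans(key, ALPHABET))
```

Any other permutation of the 26 letters can be passed as the key in the same way; the Caesar shift is only the simplest case.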

Simple Substitution Cryptanalysis
• Suppose you observe the ciphertext

PBFPVYFBQXZTYFPBFEQJHDXXQVAPTPQJKTOYQWIPBVWLXTOXBTFXQWAXBVCXQWAXFQJVWLEQNTOZQGGQLFXQWAKVWLXQWAEBIPBFXFQVXGTVJVWLBTPQWAEBFPBFHCVLXBQUFEVWLXGDPEQVPQGVPPBFTIXPFHXZHVFAGFOTHFEFBQUFTDHZBQPOTHXTYFTODXQHFTDPTOGHFQPBQWAQJJTODXQHFOQPWTBDHHIXQVAPBFZQHCFWPFHPBFIPBQWKFABVYYDZBOTHPBQPQJTQOTOGHFQAPBFEQJHDXXQVAVXEBQPEFZBVFOJIWFFACFCCFHQWAUVWFLQHGFXVAFXQHFUFHILTTAVWAFFAWTEVOITDHFHFQAITIXPFHXAFQHEFZQWGFLVWPTOFFA

• Analyze frequency counts…
• Likely that ciphertext “F” represents “E”
  o And so on, at least for common letters
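The frequency analysis step can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and ENGLISH_ORDER is an assumed ranking of English letters by frequency (published tables differ slightly in the less common letters).

```python
from collections import Counter

# Assumed English letter ranking, most frequent first; treat as illustrative.
ENGLISH_ORDER = "ETAOINSRHDLUCMFYWGPBVKXQJZ"

def frequency_ranking(ciphertext):
    """Return ciphertext letters sorted from most to least frequent."""
    counts = Counter(c for c in ciphertext if c.isalpha())
    # Break ties alphabetically so the result is deterministic.
    return sorted(counts, key=lambda c: (-counts[c], c))

def initial_guess(ciphertext, expected_order=ENGLISH_ORDER):
    """Map the i-th most frequent ciphertext letter to the i-th expected letter."""
    return dict(zip(frequency_ranking(ciphertext), expected_order))
```

Calling initial_guess on the ciphertext above gives a first putative key; the guess is usually reliable only for the most common letters, which is why a refinement step follows.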

Simple Substitution Cryptanalysis
• Can even automate attack
  1. Make initial guess for key using frequency counts
  2. Compute oldScore
  3. Modify key by swapping adjacent elements
  4. Compute newScore
  5. If newScore > oldScore then oldScore = newScore
  6. Else unswap elements
  7. Goto 3
• How to compute score?
  o Number of dictionary words in putative plaintext?
  o Much better to use English digraph statistics
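The numbered steps can be sketched in Python roughly as below. Two points are assumptions rather than part of the slide: the loop stops after a full pass with no improvement (the slide's "Goto 3" has no stopping rule), and the score function is supplied by the caller, with higher meaning better. All names are hypothetical.

```python
def decrypt(ciphertext, key, alphabet):
    """key[i] is the ciphertext symbol that decrypts to alphabet[i]."""
    table = dict(zip(key, alphabet))
    return "".join(table[c] for c in ciphertext)

def hill_climb(ciphertext, key, alphabet, score, max_passes=10):
    """Greedy search: keep an adjacent swap only if the score improves."""
    key = list(key)
    best = score(decrypt(ciphertext, key, alphabet))
    for _ in range(max_passes):
        improved = False
        for i in range(len(key) - 1):
            key[i], key[i + 1] = key[i + 1], key[i]
            # Every trial key costs a full decryption of the ciphertext.
            new = score(decrypt(ciphertext, key, alphabet))
            if new > best:
                best, improved = new, True
            else:
                key[i], key[i + 1] = key[i + 1], key[i]  # unswap
        if not improved:
            break
    return "".join(key), best
```

The comment inside the inner loop marks the cost that grows with message length: the whole ciphertext is decrypted again for every trial swap.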

Jakobsen’s Algorithm
• Method on previous slide can be slow
  o Why?
• Jakobsen’s algorithm uses similar idea, but fast and efficient
  o Ciphertext is only decrypted once
  o So algorithm is (essentially) independent of length of message
  o Then, only matrix manipulations required

Jakobsen’s Algorithm: Swapping
• Assume plaintext is English, 26 letters
• Let K = k1,k2,k3,…,k26 be putative key
  o And let “|” represent “swap”
• Then we swap elements as follows
  [swap schedule figure not preserved]
• Also, we restart this swapping schedule from the beginning whenever score improves
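The swap schedule figure is missing from this transcript. A minimal Python sketch of the schedule as it is usually described for Jakobsen's method is given here, on the assumption that it matches the lost figure: first all adjacent pairs, then all pairs at distance 2, and so on up to the single pair at distance 25. The function name is hypothetical and indices are 0-based.

```python
def swap_schedule(n=26):
    """Yield index pairs (i, j): adjacent pairs first, then distance 2, and so on."""
    for distance in range(1, n):
        for i in range(n - distance):
            yield i, i + distance
```

For n = 26 one full pass visits every unordered pair of key positions exactly once, which is where the count of 26 choose 2 = 325 comes from.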

Jakobsen’s Algorithm: Swapping
• Minimum swaps is 26 choose 2, or 325
• Maximum is unbounded
• Each swap requires a score computation
• Average number of swaps? Experimentally
  o Ciphertext of length 500, average 1050 swaps
  o Ciphertext of length 8000, avg just 630 swaps
• So, work depends on length of ciphertext
  o More ciphertext, better scores, fewer swaps

Jakobsen’s Algorithm: Scoring
• Let D = {dij} be digraph distribution corresponding to putative key K
• Let E = {eij} be digraph distribution of English language
• These matrices are 26 x 26
• Compute score as
  [formula not preserved]
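The scoring formula itself is missing from this transcript. The Python sketch below assumes the score is the sum of absolute differences between corresponding entries of D and E, which is the form commonly attributed to Jakobsen's method; under that assumption a lower value means a better match. The function names are hypothetical, and both matrices are normalized to relative frequencies so that they are on the same scale.

```python
def digraph_distribution(text, alphabet):
    """Relative frequency of each ordered pair of adjacent symbols in text."""
    index = {c: i for i, c in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0.0] * n for _ in range(n)]
    pairs = list(zip(text, text[1:]))
    for a, b in pairs:
        counts[index[a]][index[b]] += 1
    total = max(len(pairs), 1)  # avoid division by zero on very short text
    return [[c / total for c in row] for row in counts]

def score(D, E):
    """Assumed score: sum over i, j of |d_ij - e_ij| (lower is better)."""
    return sum(abs(d - e) for row_d, row_e in zip(D, E) for d, e in zip(row_d, row_e))
```

With this convention the hill-climbing test "newScore > oldScore" from the earlier slide would become "newScore < oldScore".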

Jakobsen’s Algorithm
• So far, nothing fancy here
  o Could see all of this in a CS 265 assignment
• Jakobsen’s trick: Determine new D matrix from old D without decrypting
• How to do so?
  o It turns out that swapping elements of K swaps corresponding rows and columns of D
• See example on next slides…
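The row-and-column claim can be checked directly with a short Python sketch. It uses the 10-letter alphabet and ciphertext from the swapping example; the key string below is an assumption, chosen to be consistent with the putative plaintext shown in that example, and all function names are hypothetical.

```python
def decrypt(ciphertext, key, alphabet):
    """key[i] is the ciphertext symbol that decrypts to alphabet[i]."""
    table = dict(zip(key, alphabet))
    return "".join(table[c] for c in ciphertext)

def digraph_matrix(text, alphabet):
    """Count how often alphabet[i] is immediately followed by alphabet[j]."""
    index = {c: i for i, c in enumerate(alphabet)}
    D = [[0] * len(alphabet) for _ in alphabet]
    for a, b in zip(text, text[1:]):
        D[index[a]][index[b]] += 1
    return D

def swap_rows_and_columns(D, a, b):
    """Return a copy of D with rows a, b exchanged and columns a, b exchanged."""
    D = [row[:] for row in D]
    D[a], D[b] = D[b], D[a]
    for row in D:
        row[a], row[b] = row[b], row[a]
    return D

ALPHABET = "ETAOINSRHD"
CIPHERTEXT = "TNDEODRHISOADDRTEDOAHENSINEOARDTTDTINDDRNEDNTTTDDISRETEEEEEAA"
old_key = "DETNARIOSH"  # assumed frequency-ranked guess: D->E, E->T, T->A, ...
new_key = "EDTNARIOSH"  # first two key elements swapped

old_D = digraph_matrix(decrypt(CIPHERTEXT, old_key, ALPHABET), ALPHABET)
# Slow path: decrypt again with the new key and recount digraphs.
new_D = digraph_matrix(decrypt(CIPHERTEXT, new_key, ALPHABET), ALPHABET)
# Fast path: no decryption, only a row swap and a column swap on the old matrix.
fast_D = swap_rows_and_columns(old_D, 0, 1)
```

Comparing new_D with fast_D shows the two paths agree, and the fast path never touches the ciphertext, so its cost depends only on the alphabet size.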

Swapping Example
• To simplify, suppose 10 letter alphabet
    E, T, A, O, I, N, S, R, H, D
• Suppose you are given the ciphertext
    TNDEODRHISOADDRTEDOAHENSINEOARDTTDTINDDRNEDNTTTDDISRETEEEEEAA
• Frequency counts given by
  [table not preserved]

Swapping Example
• We choose the putative key K given here
  [key table not preserved]
• The corresponding putative plaintext is
    AOETRENDSHRIEENATERIDTOHSOTRINEAAEASOEENOTEOAAAEESHNATTTTTII
• Corresponding digraph distribution D is
  [matrix not preserved]

Swapping Example
• Suppose we swap first 2 elements of K
• Then decrypt using new K
• And compute digraph matrix for new K
  [figures not preserved: previous key K, new key K]

Swapping Example

Old D matrix vs new D matrix

What do you notice?

So what’s the point here?

This is good!

Jakobsen’s Algorithm
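The figure for this slide is not preserved. As a rough stand-in, the following Python sketch combines the pieces described on the earlier slides: a distance-based schedule of swaps, a score computed from D and E, the row-and-column trick in place of decryption, and a restart whenever the score improves. It is a sketch under assumptions, not the authors' listing: the score is taken to be the sum of absolute differences (lower is better), and all names are hypothetical.

```python
def swap_rows_and_columns(D, a, b):
    """Return a copy of D with rows a, b exchanged and columns a, b exchanged."""
    D = [row[:] for row in D]
    D[a], D[b] = D[b], D[a]
    for row in D:
        row[a], row[b] = row[b], row[a]
    return D

def distance(D, E):
    """Assumed score: sum of absolute differences between D and E."""
    return sum(abs(d - e) for rd, re in zip(D, E) for d, e in zip(rd, re))

def jakobsen(D, E, key):
    """Refine key by swapping; D is updated by matrix operations only."""
    key = list(key)
    n = len(key)
    best = distance(D, E)
    gap, i = 1, 0
    while gap < n:
        j = i + gap
        candidate = swap_rows_and_columns(D, i, j)
        new = distance(candidate, E)
        if new < best:
            # Keep the swap and restart the schedule from the beginning.
            D, best = candidate, new
            key[i], key[j] = key[j], key[i]
            gap, i = 1, 0
            continue
        i += 1
        if i + gap >= n:
            gap, i = gap + 1, 0
    return key, best
```

The loop ends once a complete pass through the schedule yields no improvement; the final value of best is the distance reported for the file being scored.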

Proposed Similarity Score
• Extract opcode sequences from collection of viruses
  o All viruses from same metamorphic family
• Determine n most common opcodes
  o Symbol n+1 used for all “other” opcodes
• Use resulting digraph statistics to form matrix E = {eij}
  o Note that matrix is (n+1) x (n+1)

Scoring a File
• Given an executable we want to score
• Extract its opcode sequence
• Use opcode digraph stats to get D = {dij}
  o This matrix also (n+1) x (n+1)
• Initial “key” K chosen to match monograph stats of virus family
  o Most frequent opcode in exe maps to most frequent opcode in virus family, etc.
• Score based on distance between D and E
  o “Decrypt” D and score how closely it matches E
  o Jakobsen’s algorithm used for “decryption”

Example
• Suppose only 5 common opcodes in family viruses (in descending frequency)
• Extract following sequence from an exe
• Initial “key” is
• And “decrypt” is
  [opcode tables and sequences for this example not preserved]

Example
• Given “decrypt”
• Form D matrix
• After swap…
  o And so on…

Scoring Algorithm

Quantifying Success
• Consider these 2 scatterplots of scores
• Which is better (and why)?

ROC Curves
• Plot true positive rate vs false positive rate
  o As “threshold” varies
• Curve nearer 45-degree line is bad
• Curve nearer upper-left is good

ROC Curves
• Use ROC curves to quantify success
• Area under the ROC curve (AUC)
  o Probability that randomly chosen positive instance scores higher than a randomly chosen negative instance
• AUC of 1.0 implies ideal detection
• AUC of 0.5 means classification is no better than flipping a coin
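The probabilistic reading of AUC can be computed directly from two lists of scores, as in this minimal Python sketch. The function name is hypothetical, ties are counted as half a win (a common convention, assumed here), and higher scores are assumed to indicate the positive class; with a distance-like score where lower means closer to the family, the scores would be negated first.

```python
def auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # assumed convention: a tie counts as half
    return wins / (len(positive_scores) * len(negative_scores))
```

This pairwise count is quadratic in the number of files, which is fine for test sets of the size described in these slides.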

Parameter Selection
• Tested the following parameters
  o Opcode matrix size
  o Scoring function
  o Normalization
  o Swapping strategy
• None significant, except matrix size
  o So we only give results for matrix size here

Opcode Matrix Size
• Obtained following results
  [results table not preserved]
• So, ironically, we use 26 x 26 matrix

Test Data
• Tested the following metamorphic families
  o G2 --- known to be weak
  o NGVCK --- highly metamorphic
  o MWOR --- highly metamorphic and stealthy
• MWOR “padding ratios” of 0.5 to 4.0
• For G2 and NGVCK
  o 50 files tested, cygwin utilities for benign files
• For each MWOR padding ratio
  o 100 files tested, Linux utilities for benign files
• 5-fold cross validation in each experiment

NGVCK and G2 Graphs

MWOR Score Graphs

MWOR ROC Curves

MWOR AUC Statistics

Efficiency

Conclusions
+ Simple substitution score, good results for challenging metamorphic viruses
+ Scoring is fast and efficient
+ Applicable to other types of malware
- Requires opcodes

References
• G. Shanmugam, R.M. Low, and M. Stamp, Simple substitution distance and metamorphic detection, Journal of Computer Virology and Hacking Techniques, 9(3):159-170, 2013
