A Model for Self-Modifying Code

Bertrand Anckaert, Matias Madou and Koen De Bosschere

8th Information Hiding Conference, July 11th 2006


2

Self-Modifying Code

o Problem for Reverse-Engineering
o Used for Hiding Program Internals
  • Software Protection
    o Copyright Protection Mechanisms
    o Secret Algorithms
    o …
  • Malicious intent of viruses
o Program Optimization

3

Scope

[Figure: two bit strings representing program code that differ in a few bits; one label reads "known"]

Focus: malicious host paradigm

Not: malicious code paradigm

4

Goal

o Internal Representation

o Construction and Deconstruction

o Accurate and Conservative

o Analyses and Transformations

5

Overview

o Introduction
o Running Example
o Internal Representation
o Construction and Deconstruction
o Analyses and Transformations
o Applications

Accurate and Conservative

6

Example: ISA

Assembly        Binary          Semantics
movb value to   0xc6 value to   set the byte at address "to" to "value"
inc reg         0x40 reg        increment register reg
dec reg         0x48 reg        decrement register reg
push reg        0xff reg        push register reg
jmp to          0x0c to         jump to address "to" (absolute)
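
The encoding table above can be turned into a small decoder. The following is a minimal sketch, not part of the original slides: the names `OPCODES`, `REGS`, `decode` and `disassemble` are hypothetical, and the register numbers (01 = %ebx, 02 = %ecx, 03 = %edx) are an assumption read off the running example rather than something the ISA slide states.

```python
# Hypothetical decoder for the toy ISA; all names are illustrative.
OPCODES = {
    0xC6: ("movb", 3),  # c6 value to
    0x40: ("inc", 2),   # 40 reg
    0x48: ("dec", 2),   # 48 reg
    0xFF: ("push", 2),  # ff reg
    0x0C: ("jmp", 2),   # 0c to
}
# Assumption: register numbering inferred from the running example.
REGS = {0x01: "%ebx", 0x02: "%ecx", 0x03: "%edx"}


def decode(mem, addr):
    """Return (assembly text, length in bytes) for the instruction at addr."""
    mnemonic, length = OPCODES[mem[addr]]
    if mnemonic == "movb":
        return f"movb {mem[addr + 1]:#x} {mem[addr + 2]:#x}", length
    if mnemonic == "jmp":
        return f"jmp {mem[addr + 1]:#x}", length
    return f"{mnemonic} {REGS[mem[addr + 1]]}", length


def disassemble(mem):
    """Linear sweep from address 0; returns a list of (address, text)."""
    addr, out = 0, []
    while addr < len(mem):
        text, length = decode(mem, addr)
        out.append((addr, text))
        addr += length
    return out


# The 14 bytes of the running example.
PROGRAM = bytes.fromhex("c60c08 4001 c60c05 4003 ff02 4801")

for addr, text in disassemble(PROGRAM):
    print(f"{addr:#x}: {text}")
```

A linear sweep like this only describes one snapshot of the code; it says nothing about what the bytes decode to after the program has modified itself.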

7

Example: Introduction

Address   Binary     Assembly
0x0       c6 0c 08   movb 0xc 0x8
0x3       40 01      inc %ebx
0x5       c6 0c 05   movb 0xc 0x5
0x8       40 03      inc %edx
0xa       ff 02      push %ecx
0xc       48 01      dec %ebx

8

Example: Trace

Code at start:

  movb 0xc 0x8
  inc %ebx
  movb 0xc 0x5
  inc %edx
  push %ecx
  dec %ebx

Code after step 1 (inc %edx has become jmp 0x3):

  movb 0xc 0x8
  inc %ebx
  movb 0xc 0x5
  jmp 0x3
  push %ecx
  dec %ebx

Code after step 3 (movb 0xc 0x5 has become jmp 0xc):

  movb 0xc 0x8
  inc %ebx
  jmp 0xc
  jmp 0x3
  push %ecx
  dec %ebx

Trace:

1) movb 0xc 0x8
2) inc %ebx
3) movb 0xc 0x5
4) jmp 0x3
5) inc %ebx
6) jmp 0xc
7) dec %ebx

= inc %ebx
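
The trace can be reproduced by emulating the toy ISA on the 14 bytes of the running example. This is a minimal sketch under assumptions: `run` and `REG_NAMES` are hypothetical names, execution is assumed to start at address 0x0 and stop when the program counter leaves the code, and registers are assumed to start at zero.

```python
# Hypothetical emulator for the toy ISA; not taken from the original slides.
REG_NAMES = {0x01: "%ebx", 0x02: "%ecx", 0x03: "%edx"}  # assumed numbering
PROGRAM = bytes.fromhex("c60c08 4001 c60c05 4003 ff02 4801")


def run(program, max_steps=100):
    """Execute the program and return (trace, registers)."""
    mem = bytearray(program)            # code is writable: self-modifying
    regs = {reg: 0 for reg in REG_NAMES}
    stack, trace, pc = [], [], 0
    while pc < len(mem) and len(trace) < max_steps:
        op = mem[pc]
        if op == 0xC6:                  # movb value to
            value, to = mem[pc + 1], mem[pc + 2]
            trace.append(f"movb {value:#x} {to:#x}")
            mem[to] = value             # may overwrite an instruction byte
            pc += 3
        elif op == 0x0C:                # jmp to (absolute)
            target = mem[pc + 1]
            trace.append(f"jmp {target:#x}")
            pc = target
        elif op in (0x40, 0x48, 0xFF):  # inc / dec / push reg
            reg = mem[pc + 1]
            name = {0x40: "inc", 0x48: "dec", 0xFF: "push"}[op]
            trace.append(f"{name} {REG_NAMES[reg]}")
            if op == 0x40:
                regs[reg] += 1
            elif op == 0x48:
                regs[reg] -= 1
            else:
                stack.append(regs[reg])
            pc += 2
        else:
            raise ValueError(f"unknown opcode {op:#x} at {pc:#x}")
    return trace, regs


trace, regs = run(PROGRAM)
for step, text in enumerate(trace, start=1):
    print(f"{step}) {text}")
```

Under these assumptions the emulator executes seven instructions and leaves %ebx incremented once, which matches the "= inc %ebx" summary on the slide.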

9

Overview

o Scope
o Running Example
o Internal Representation
  • Superposition of CFGs
  • Codebytes
  • Codebyte Conditional Edges
  • Consumption of Codebyte Values
o Construction and Deconstruction
o Analyses and Transformations
o Applications

10

CFG for Traditional Code

o One of the most important internal representations for traditional code
  • Well-understood how to:
    o construct and deconstruct
    o accurate and conservative
    o analysis and transformations
  • representation of a superset of all possible executions

11

Traditional CFG Construction for SMC

[Figure: three traditional CFGs, one per snapshot of the code, with nodes annotated by the trace steps that execute them]

CFG 1: movb 0xc 0x8; inc %ebx; movb 0xc 0x5; inc %edx; push %ecx; dec %ebx
CFG 2: movb 0xc 0x8 | inc %ebx; movb 0xc 0x5; jmp 0x3 | push %ecx; dec %ebx
CFG 3: movb 0xc 0x8 | inc %ebx; jmp 0xc | jmp 0x3 | push %ecx | dec %ebx

Trace:

1) movb 0xc 0x8
2) inc %ebx
3) movb 0xc 0x5
4) jmp 0x3
5) inc %ebx
6) jmp 0xc
7) dec %ebx

Annotations on the figure: not a superset; not accurate; not conservative; Unreachable Code Elimination

12

Example: Superposition of CFGs

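[Figure: a single graph containing every instruction of the three CFGs; trace steps that execute each node are given in brackets]

movb 0xc 0x8   [1]
inc %ebx       [2,5]
movb 0xc 0x5   [3]
jmp 0x3        [4]
jmp 0xc        [6]
dec %ebx       [7]
inc %edx
push %ecx

Trace:

1) movb 0xc 0x8
2) inc %ebx
3) movb 0xc 0x5
4) jmp 0x3
5) inc %ebx
6) jmp 0xc
7) dec %ebx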

13

Contains CFG 1

CFG 1, a single basic block:

  movb 0xc 0x8; inc %ebx; movb 0xc 0x5; inc %edx; push %ecx; dec %ebx

[Figure: CFG 1 shown next to the superposition of CFGs, which contains all of its instructions]

14

Contains CFG 2

CFG 2, basic blocks:

  movb 0xc 0x8
  inc %ebx; movb 0xc 0x5; jmp 0x3
  push %ecx; dec %ebx

[Figure: CFG 2 shown next to the superposition of CFGs, which contains all of its instructions]

15

Contains CFG 3

CFG 3, basic blocks:

  movb 0xc 0x8
  inc %ebx; jmp 0xc
  jmp 0x3
  push %ecx
  dec %ebx

[Figure: CFG 3 shown next to the superposition of CFGs, which contains all of its instructions]

16

Superposition of CFGs

o Represents a superset of all possible executions
o But:
  • how do we linearize a graph with multiple outgoing/incoming fall-through paths?
  • how do we analyze what states the program can be in at a given program point?
  • …

Extensions

17

Overview

o Scope
o Running Example
o Internal Representation
  • Superposition of CFGs
  • CodeBytes
  • CodeByte Conditional Edges
  • Consumption of CodeByte Values
o Construction and Deconstruction
o Analyses and Transformations
o Applications

18

CodeByte

[Figure: the codebyte at address 0x5]

o identifier (address): 0x5
o states: c6, 0c
o initial state: c6
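
A codebyte as described on this slide (an address, a set of states, and an initial state) maps naturally onto a small record type. The sketch below is one possible shape, assuming Python dataclasses; the class and method names are hypothetical and are not taken from the authors' tool.

```python
from dataclasses import dataclass, field


@dataclass
class CodeByte:
    """Hypothetical model of a codebyte: identifier plus possible states."""
    address: int                               # identifier (address)
    initial: int                               # initial state
    states: set = field(default_factory=set)   # every value it may hold

    def __post_init__(self):
        # The initial value is always one of the possible states.
        self.states.add(self.initial)

    def add_state(self, value):
        """Record that some instruction may write `value` to this byte."""
        self.states.add(value)

    @property
    def is_constant(self):
        """True when the byte can only ever hold its initial value."""
        return len(self.states) == 1


# The codebyte from the slide: address 0x5, initially c6, may become 0c.
cb = CodeByte(address=0x5, initial=0xC6)
cb.add_state(0x0C)
print(sorted(hex(s) for s in cb.states), cb.is_constant)
```

In traditional, non-self-modifying code every codebyte would report `is_constant` as true, which is why the extension can be dropped there.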

19

Extension 1: CodeBytes

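[Figure: the superposition of CFGs, with each instruction linked to the codebytes that encode it]

Instructions: movb 0xc 0x8, inc %ebx, movb 0xc 0x5, jmp 0xc, inc %edx, jmp 0x3, push %ecx, dec %ebx

Codebyte   States
0x0        c6
0x1        0c
0x2        08
0x3        40
0x4        01
0x5        c6, 0c
0x6        0c
0x7        05
0x8        40, 0c
0x9        03
0xa        ff
0xb        02
0xc        48
0xd        01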

20

Extension 2: CodeByte Conditional Edges

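[Figure: the superposition of CFGs with its codebytes; edges into instructions that share a codebyte are labeled with the state that codebyte must be in]

Instructions: movb 0xc 0x8, inc %ebx, movb 0xc 0x5, jmp 0xc, inc %edx, jmp 0x3, push %ecx, dec %ebx

Codebytes with more than one state: 0x5 (c6, 0c) and 0x8 (40, 0c)

Edge conditions:

*(0x5)==c6   edge into movb 0xc 0x5
*(0x5)==0c   edge into jmp 0xc
*(0x8)==40   edge into inc %edx
*(0x8)==0c   edge into jmp 0x3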
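
A conditional edge can be read as "this successor is taken only while the named codebyte holds the named value". The sketch below illustrates that reading; the `Edge` type, the `successors` helper and the choice of source nodes for each edge are assumptions made for illustration, derived from the layout of the running example rather than stated on the slide.

```python
from typing import NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    """Hypothetical edge record: cond is (codebyte address, required value)."""
    src: str
    dst: str
    cond: Optional[Tuple[int, int]]


# Assumed edge endpoints; only the four conditions come from the slide.
EDGES = [
    Edge("inc %ebx", "movb 0xc 0x5", (0x5, 0xC6)),
    Edge("inc %ebx", "jmp 0xc", (0x5, 0x0C)),
    Edge("movb 0xc 0x5", "inc %edx", (0x8, 0x40)),
    Edge("movb 0xc 0x5", "jmp 0x3", (0x8, 0x0C)),
]


def successors(node, memory):
    """Successors of `node` whose codebyte condition holds in `memory`."""
    return [
        edge.dst
        for edge in EDGES
        if edge.src == node
        and (edge.cond is None or memory[edge.cond[0]] == edge.cond[1])
    ]


memory = bytearray.fromhex("c60c08 4001 c60c05 4003 ff02 4801")
print(successors("inc %ebx", memory))   # codebyte 0x5 still holds c6
memory[0x5] = 0x0C                      # effect of movb 0xc 0x5
print(successors("inc %ebx", memory))   # codebyte 0x5 now holds 0c
```

Edges without a condition (`cond` is `None`) behave like ordinary CFG edges, so a graph with no conditional edges degenerates to a traditional CFG.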

21

Extension 3: Consumption of CodeBytes

o A codebyte is read when it is interpreted as (part of) an instruction by the CPU

o Important for data analyses, such as liveness analysis
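
To illustrate why consumption matters for liveness, the sketch below runs a backward pass over a single recorded trace and reports writes to codebytes whose value is never executed afterwards. It is a simplification under stated assumptions: the event format and the names `TRACE_EVENTS` and `dead_writes` are hypothetical, and a real analysis would work on the graph rather than on one trace.

```python
# Hypothetical event log for the running example's trace.
# ("exec", addrs): the CPU consumed these codebytes as one instruction.
# ("write", addr): an instruction stored a value into this codebyte.
TRACE_EVENTS = [
    ("exec", [0x0, 0x1, 0x2]), ("write", 0x8),   # 1) movb 0xc 0x8
    ("exec", [0x3, 0x4]),                        # 2) inc %ebx
    ("exec", [0x5, 0x6, 0x7]), ("write", 0x5),   # 3) movb 0xc 0x5
    ("exec", [0x8, 0x9]),                        # 4) jmp 0x3
    ("exec", [0x3, 0x4]),                        # 5) inc %ebx
    ("exec", [0x5, 0x6]),                        # 6) jmp 0xc
    ("exec", [0xC, 0xD]),                        # 7) dec %ebx
]


def dead_writes(events):
    """Indices of writes whose stored value is never consumed later."""
    live = set()   # codebytes consumed before their next overwrite
    dead = []
    for index in range(len(events) - 1, -1, -1):
        kind, arg = events[index]
        if kind == "write":
            if arg not in live:
                dead.append(index)
            live.discard(arg)     # a write kills the earlier value
        else:
            live.update(arg)      # execution consumes (reads) codebytes
    return sorted(dead)


print(dead_writes(TRACE_EVENTS))
```

Both writes in the running example are live because the modified bytes at 0x8 and 0x5 are later executed; a write to a byte that is never executed again would be reported as dead.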

22

Traditional Code vs. Self-Modifying Code

o Traditional Code
  • No Overlap
  • Not Self-Inspecting
  • Not Self-Modifying
o Special case of self-modifying code. Extensions can be omitted because:
  • Can be easily linearized as instructions do not overlap
  • Target locations of control transfers can be in only one state
  • Result of data analyses on code is trivial as the code is constant

23

Overview

o Scope
o Running Example
o Internal Representation
o Construction and Deconstruction
o Analyses and Transformations
o Applications

24

Construction

o Requires that we know:
  • Targets of control flow
  • Which instructions write what where
o Not a problem in the malicious host paradigm
o In the malicious code paradigm (Future Work):
  • Observing dynamic execution
  • Static extension
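
As a rough illustration of the "observing dynamic execution" idea, the sketch below emulates the running example and records every instruction that was executed, the transitions between them, and every value each codebyte held. It is an assumption-laden sketch, not the authors' construction algorithm: `observe` and `LENGTH` are hypothetical names, and the toy ISA encoding is taken from the running example.

```python
# Instruction lengths of the toy ISA, keyed by opcode (from the ISA slide).
LENGTH = {0xC6: 3, 0x40: 2, 0x48: 2, 0xFF: 2, 0x0C: 2}
PROGRAM = bytes.fromhex("c60c08 4001 c60c05 4003 ff02 4801")


def observe(program, max_steps=100):
    """Record executed instructions, transitions and codebyte states."""
    mem = bytearray(program)
    states = {addr: {value} for addr, value in enumerate(mem)}
    nodes, edges = set(), set()
    pc, prev = 0, None
    for _ in range(max_steps):
        if pc >= len(mem):
            break
        op = mem[pc]
        # A node is an address plus the bytes decoded there at that moment,
        # so two instructions sharing an address stay distinct.
        node = (pc, bytes(mem[pc:pc + LENGTH[op]]))
        nodes.add(node)
        if prev is not None:
            edges.add((prev, node))
        prev = node
        if op == 0xC6:                       # movb value to
            value, to = mem[pc + 1], mem[pc + 2]
            mem[to] = value
            states[to].add(value)            # new state for that codebyte
            pc += 3
        elif op == 0x0C:                     # jmp to
            pc = mem[pc + 1]
        else:                                # inc / dec / push
            pc += 2
    return nodes, edges, states


nodes, edges, states = observe(PROGRAM)
for addr, code in sorted(nodes):
    print(f"{addr:#x}: {code.hex(' ')}")
```

A single run only sees what that run executes: here inc %edx and push %ecx never appear as nodes, so observation alone gives an under-approximation unless it is combined with other knowledge.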

25

Linearization

[Figure: the superposition of CFGs and its codebytes, linearized into the binary below using the initial state of each codebyte]

c6 0c 08
40 01
c6 0c 05
40 03
ff 02
48 01

26

Example: Introduction

Address   Binary     Assembly
0x0       c6 0c 08   movb 0xc 0x8
0x3       40 01      inc %ebx
0x5       c6 0c 05   movb 0xc 0x5
0x8       40 03      inc %edx
0xa       ff 02      push %ecx
0xc       48 01      dec %ebx

27

Overview

o Scope
o Running Example
o Internal Representation
o Construction and Deconstruction
o Analyses and Transformations
  • Constant Propagation
  • Unreachable Code(Byte) Elimination
  • Liveness Analysis
  • Loop Unrolling
o Applications

28

Constant Propagation

[Figure: the superposition of CFGs with its codebytes and conditional edges]

Instructions: movb 0xc 0x8, inc %ebx, movb 0xc 0x5, jmp 0xc, inc %edx, jmp 0x3, push %ecx, dec %ebx

Edge conditions shown: *(0x5)==c6, *(0x5)==0c, *(0x8)==0c, *(0x8)==40

29

Unreachable Code(Byte) Elimination

[Figure: the superposition of CFGs with its codebytes and conditional edges]

Instructions: movb 0xc 0x8, inc %ebx, movb 0xc 0x5, jmp 0xc, inc %edx, jmp 0x3, push %ecx, dec %ebx

Edge conditions shown: *(0x5)==c6, *(0x5)==0c, *(0x8)==0c

30

Liveness Analysis

[Figure: the graph without inc %edx and push %ecx and without codebytes 0xa and 0xb; codebyte 0x8 is singled out]

Instructions: movb 0xc 0x8, inc %ebx, movb 0xc 0x5, jmp 0xc, jmp 0x3, dec %ebx

Edge conditions shown: *(0x5)==c6, *(0x5)==0c, *(0x8)==0c

31

Idempotent Instruction Removal

[Figure: the graph without inc %edx and push %ecx and without codebytes 0xa and 0xb; codebyte 0x8 is singled out]

Instructions: movb 0xc 0x8, inc %ebx, movb 0xc 0x5, jmp 0xc, jmp 0x3, dec %ebx

Edge conditions shown: *(0x5)==c6, *(0x5)==0c, *(0x8)==0c

32

Loop Unrolling and …

[Figure: the graph after unrolling the loop; the copied instructions are encoded by new codebytes labeled _a to _k, and a movb 0xc _c appears alongside movb 0xc 0x5]

Edge conditions shown: *(_c)==0c, *(_c)==c6, *(0x5)==0c, *(0x5)==c6

Trace:

1) movb 0xc 0x8
2) inc %ebx
3) movb 0xc 0x5
4) jmp 0x3
5) inc %ebx
6) jmp 0xc
7) dec %ebx

33

Overview

o Scope
o Running Example
o Internal Representation
o Construction and Deconstruction
o Analyses and Transformations
o Applications

34

Applications

o Outlining of almost identical code snippets through one-bit modifiers

o Overlapping similar functions through diff scripts

o Significant slowdown (factor 1.15 up to 3)

35

Almost Identical Code Snippets

Snippet 1:

Address   Binary           Assembly
0x0       68 5c 24 04 a8   push 0xa804245c
0x5       5b               pop %ebx
0x6       c3               ret

Snippet 2:

Address   Binary           Assembly
0x0       8b 5c 24 04      mov 4(%esp),%ebx
0x4       a8 5b            test 0x5b,%al
0x6       c3               ret

36

Merged Code Snippets

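[Figure: both snippets share codebytes 0x1 to 0x6 (5c 24 04 a8 5b c3); codebyte 0x0 has two states, 8b and 68]

With *(0x0)==68: push 0xa804245c; pop %ebx; ret
With *(0x0)==8b: mov 4(%esp),%ebx; test 0x5b,%al; ret

Entry sequences:

  movb 0x68 0x0
  jmp 0x0

  movb 0x8b 0x0
  jmp 0x0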
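
The effect of the one-byte modifier can be checked by decoding the seven merged bytes twice, once per value of codebyte 0x0. The decoder below is a deliberately tiny sketch that only understands the five x86 encodings appearing on these slides; `SHARED_TAIL`, `patched` and `decode_snippet` are hypothetical names, not part of any real disassembler.

```python
# Bytes shared by both snippets (codebytes 0x1 to 0x6 on the slide).
SHARED_TAIL = bytes.fromhex("5c2404a85bc3")


def patched(first_byte):
    """The merged snippet with codebyte 0x0 set to `first_byte`."""
    return bytes([first_byte]) + SHARED_TAIL


def decode_snippet(code):
    """Decode only the handful of x86 encodings used in this example."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0x68:                                  # push imm32
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"push {imm:#x}")
            i += 5
        elif op == 0x8B and code[i + 1:i + 3] == b"\x5c\x24":
            # mov disp8(%esp),%ebx: ModRM 5c, SIB 24, then the disp8
            out.append(f"mov {code[i + 3]}(%esp),%ebx")
            i += 4
        elif op == 0xA8:                                # test imm8,%al
            out.append(f"test {code[i + 1]:#x},%al")
            i += 2
        elif op == 0x5B:                                # pop %ebx
            out.append("pop %ebx")
            i += 1
        elif op == 0xC3:                                # ret
            out.append("ret")
            i += 1
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
    return out


print(decode_snippet(patched(0x68)))
print(decode_snippet(patched(0x8B)))
```

Changing a single byte shifts the instruction boundaries, so the same six trailing bytes serve as an immediate operand in one snippet and as three separate instructions' worth of encoding in the other.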

37

Conclusion

o Superposition of different CFGs
o Three extensions
  • CodeByte datastructure
  • CodeByte conditional edges
  • Consumption of CodeBytes

Internal Representation Allows for:
  • Construction (limited) and Deconstruction
  • Conservative and Accurate
  • Analyses and Transformations (iterative)

Questions?

Presentation: http://www.elis.ugent.be/~banckaer

Tool: http://www.elis.ugent.be/diablo

39

Linearization

o Chains of instructions -> Chains of codebytes
o Codebytes c and d must be concatenated if any of the following holds:
  • c and d are successive codebytes in an instruction
  • c is the last codebyte of instruction I and d is the first codebyte of instruction J, and I and J are successive instructions in a basic block
  • c is the last codebyte of basic block A and d is the first codebyte of basic block B, and A and B are connected by a fall-through path
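
The three rules above each produce "d must directly follow c" constraints, and linearization then amounts to following those constraints into chains. The sketch below shows that chaining step only; `build_chains` is a hypothetical helper, codebytes are represented by plain integers, and cyclic constraint sets are simply not handled.

```python
def build_chains(pairs):
    """Group codebytes into chains from (c, d) 'd follows c' constraints."""
    nxt, prev = {}, {}
    for c, d in pairs:
        # Each codebyte may have at most one successor and one predecessor.
        if nxt.get(c, d) != d or prev.get(d, c) != c:
            raise ValueError(f"conflicting constraint {c:#x} -> {d:#x}")
        nxt[c] = d
        prev[d] = c
    heads = sorted(n for n in set(nxt) | set(prev) if n not in prev)
    chains = []
    for head in heads:
        chain = [head]
        while chain[-1] in nxt:
            chain.append(nxt[chain[-1]])
        chains.append(chain)
    return chains


# Constraints for part of the running example. Both movb 0xc 0x5 and
# jmp 0xc start at codebyte 0x5, so they yield the same (0x5, 0x6) pair.
pairs = [
    (0x0, 0x1), (0x1, 0x2),   # bytes of movb 0xc 0x8
    (0x2, 0x3),               # movb 0xc 0x8 is followed by inc %ebx
    (0x3, 0x4),               # bytes of inc %ebx
    (0x4, 0x5),               # fall-through into the instruction at 0x5
    (0x5, 0x6), (0x6, 0x7),   # bytes of movb 0xc 0x5
    (0x5, 0x6),               # bytes of jmp 0xc (same codebytes)
    (0xC, 0xD),               # bytes of dec %ebx
]
print(build_chains(pairs))
```

Because overlapping instructions share codebytes, their constraints coincide instead of competing, which is how chaining at the codebyte level sidesteps the problem of multiple fall-through paths at the instruction level.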

40

Example: Superposition of CFGs

movb 0xc 0x8

inc %ebx

jmp 0x3

movb 0xc 0x5

inc %edx

dec %ebx

jmp 0xc

push %ecx
