UNDERSTANDING SOFTWARE THAT DOESN’T WANT TO BE UNDERSTOOD: REVERSE ENGINEERING OBFUSCATED BINARIES

Saumya Debray, The University of Arizona, Tucson, AZ 85721


Posted on 22-Feb-2016



The Problem

Rapid analysis and understanding of malware code is essential for swift response to new threats
‒ malicious software is usually heavily obfuscated against analysis

Existing approaches to reverse engineering such code are primitive
‒ not a lot of high-level tool support
‒ require a lot of manual intervention
‒ slow, cumbersome, potentially error-prone

This delays development of countermeasures.

Goals

Develop automated techniques for analysis and reverse engineering of obfuscated binaries:

semantics-based
‒ output is functionally equivalent to, but simpler than, the input program

generality
‒ should work on any obfuscation, even ones we haven’t thought of yet!
‒ should minimize assumptions about obfuscations

Challenges

can’t make assumptions about obfuscations
‒ what do we leverage for deobfuscation?
‒ distinguishing code we care about from code we don’t: how do we know which instructions we care about?

scale
‒ “needle in haystack”: the number of instructions executed increases by 270x (VMProtect) to 4300x (Themida) [Lau 2008]

anti-analysis defenses
‒ runtime unpacking
‒ anti-emulation, anti-debug checks

Our Approach

no obfuscation-specific assumptions
‒ treat programs as input-to-output transformations
‒ use semantics-preserving transformations to simplify execution traces

dynamic analysis to handle runtime unpacking

[Figure: input program → taint analysis (bit-level): map flow of values from input to output → semantics-preserving transformations: simplify logic of input-to-output transformation → control flow reconstruction: reconstruct logic of simplified computation → control flow graph]
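The “map flow of values from input to output” step can be illustrated on a toy trace. This is a hypothetical, location-level simplification (the talk uses bit-level taint): it assumes a made-up three-address trace format (`Insn` with `dest`, `op`, `srcs`). A forward pass marks instructions that read input-derived values, and a backward pass then keeps only those whose results actually reach an output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    dest: str                 # location written (register or memory cell)
    op: str                   # mnemonic; not interpreted by the taint pass
    srcs: tuple[str, ...]     # locations read


def input_to_output(trace, inputs, outputs):
    """Indices of trace instructions that carry values from `inputs` to `outputs`."""
    # Forward pass: which instructions read a value derived from an input?
    tainted, forward = set(inputs), set()
    for i, ins in enumerate(trace):
        if any(s in tainted for s in ins.srcs):
            forward.add(i)
            tainted.add(ins.dest)
        else:
            tainted.discard(ins.dest)   # overwritten with an input-independent value

    # Backward pass: which of those actually contribute to an output?
    needed, result = set(outputs), []
    for i in range(len(trace) - 1, -1, -1):
        ins = trace[i]
        if ins.dest in needed:
            needed.discard(ins.dest)
            needed.update(ins.srcs)
            if i in forward:
                result.append(i)
    return sorted(result)


# Hypothetical trace: "in" is the program input, "out" its output.
trace = [
    Insn("eax", "mov", ("in",)),          # 0: input-derived, reaches output
    Insn("ebx", "mov", ("k",)),           # 1: constant, not input-derived
    Insn("ecx", "add", ("ebx", "ebx")),   # 2: chaff
    Insn("eax", "add", ("eax", "ebx")),   # 3: input-derived, reaches output
    Insn("edx", "mov", ("eax",)),         # 4: input-derived but never used
    Insn("out", "mov", ("eax",)),         # 5: writes the output
]
print(input_to_output(trace, {"in"}, {"out"}))   # [0, 3, 5]
```

Instruction 4 is tainted but never reaches the output, which is why the backward pass is needed in addition to forward propagation.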

Ex 1: Emulation-based Obfuscation

examination of the code reveals only the emulator’s logic
‒ actual program logic embedded in byte code

lots of “chaff” during execution
‒ separating emulator logic from payload logic is tricky

emulators can be nested

[Figure: Obfuscator (with mutation engine): input program + random seed → bytecode logic (data) + emulator (code)]
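A toy illustration of the idea, not the design of Themida or any real obfuscator. The opcode set and the names `run` and `FACTORIAL` are hypothetical. The emulator below is the only code, and the factorial logic exists only as data in the bytecode list. Inspecting `run` reveals a dispatch loop and nothing about factorial. Even here, much of the executed work is fetch/decode overhead, which is a small-scale version of the “chaff” problem.

```python
def run(bytecode, regs):
    """Toy emulator: its only code is this fetch-decode-execute loop."""
    pc = 0
    while True:
        op, a, b = bytecode[pc]       # fetch and decode one virtual instruction
        pc += 1
        if op == "LOADI":             # r[a] = b
            regs[a] = b
        elif op == "MUL":             # r[a] = r[a] * r[b]
            regs[a] *= regs[b]
        elif op == "DEC":             # r[a] = r[a] - 1
            regs[a] -= 1
        elif op == "JZ":              # if r[a] == 0: jump to b
            if regs[a] == 0:
                pc = b
        elif op == "JNZ":             # if r[a] != 0: jump to b
            if regs[a] != 0:
                pc = b
        elif op == "HALT":            # return r[a]
            return regs[a]
        else:
            raise ValueError(f"unknown opcode {op!r}")


# The program logic (factorial) exists only as data: r0 = n, r1 = result.
FACTORIAL = [
    ("LOADI", 1, 1),    # 0: r1 = 1
    ("JZ", 0, 5),       # 1: if n == 0, skip the loop
    ("MUL", 1, 0),      # 2: r1 *= r0
    ("DEC", 0, None),   # 3: r0 -= 1
    ("JNZ", 0, 2),      # 4: loop while r0 != 0
    ("HALT", 1, None),  # 5: return r1
]


def factorial(n):
    return run(FACTORIAL, {0: n, 1: 0})


print(factorial(5))   # 120
```

The random seed and mutation engine from the figure are not modeled. Here, the encoding of the bytecode is fixed.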

Ex 2: Return-Oriented Programs (ROP)

Originally designed to bypass anti-code-injection defenses
‒ stitches together existing code fragments (“gadgets”), e.g., in system libraries

Logic can be difficult to discern
‒ gadgets are typically scattered across many different functions and/or libraries
‒ gadgets can overlap in memory in weird ways
‒ control flow structures (if-else, loops, function calls) are typically implemented using non-standard idioms
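A toy simulation can make the “non-standard idioms” point concrete. In this sketch, all gadget addresses are made up, the stack is a Python list of slots rather than bytes, and registers are unbounded integers. The loop in `factorial_chain` contains no jump instruction. Instead, it is a conditional adjustment of the stack pointer computed from a mask, which is a pattern commonly used for branching in ROP.

```python
# Hypothetical gadget addresses (a real chain would point into library code).
POP_EAX, POP_EBX, POP_ECX, IMUL, DEC_EBX, MASK, AND_EDX_ECX, ADD_ESP_EDX, HALT = range(
    0x1000, 0x1090, 0x10)


def _pop(reg):
    def gadget(r, stack, sp):
        r[reg] = stack[sp]            # pop reg ; ret
        return sp + 1
    return gadget


def _imul(r, stack, sp):
    r["eax"] *= r["ebx"]              # imul eax, ebx ; ret
    return sp


def _dec_ebx(r, stack, sp):
    r["ebx"] -= 1                     # dec ebx ; ret
    return sp


def _mask(r, stack, sp):
    # edx = all ones if ebx != 0 else 0; stands in for a neg/sbb-style idiom
    r["edx"] = -1 if r["ebx"] != 0 else 0
    return sp


def _and_edx_ecx(r, stack, sp):
    r["edx"] &= r["ecx"]              # and edx, ecx ; ret
    return sp


def _add_esp_edx(r, stack, sp):
    return sp + r["edx"]              # add esp, edx ; ret: a conditional "jump"


GADGETS = {POP_EAX: _pop("eax"), POP_EBX: _pop("ebx"), POP_ECX: _pop("ecx"),
           IMUL: _imul, DEC_EBX: _dec_ebx, MASK: _mask,
           AND_EDX_ECX: _and_edx_ecx, ADD_ESP_EDX: _add_esp_edx}


def run_chain(stack):
    """Each 'ret' pops the next gadget address; sp indexes stack slots."""
    regs = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}
    sp = 0
    while True:
        addr = stack[sp]
        sp += 1
        if addr == HALT:
            return regs["eax"]
        sp = GADGETS[addr](regs, stack, sp)


def factorial_chain(n):
    """ROP chain computing n! for n >= 1; the loop is a stack-pointer adjustment."""
    return [
        POP_EAX, 1,        # slots 0-1: eax = 1
        POP_EBX, n,        # slots 2-3: ebx = n
        IMUL,              # slot 4: loop head, eax *= ebx
        DEC_EBX,           # slot 5: ebx -= 1
        POP_ECX, -7,       # slots 6-7: ecx = offset from slot 11 back to slot 4
        MASK,              # slot 8: edx = -1 if ebx != 0 else 0
        AND_EDX_ECX,       # slot 9: edx = -7 or 0
        ADD_ESP_EDX,       # slot 10: sp += edx
        HALT,              # slot 11
    ]


print(run_chain(factorial_chain(5)))   # 120
```

Overlapping gadgets and gadgets scattered across libraries are not modeled. Even so, recovering “this is a loop” from the chain requires reasoning about data flow into the stack pointer.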

Example 1 (emulation-obfuscation)

factorial (Themida)

Example 2 (ROP)

[Figure: factorial, original vs. ROP]

Interactions between Obfuscations
Example: Unpacking + Emulation

[Figure: execution trace with two unpacking phases, input and output marked; instructions “tainted” as propagating values from input to output form the input-to-output computation, which is further simplified and used to construct the control flow graph]
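The last step in the figure, building a control flow graph from the (simplified) trace, can be approximated naively by trusting instruction addresses. This is an assumption-laden illustration, not the talk’s reconstruction algorithm. Nodes are executed addresses, edges are observed transitions, and straight-line chains are merged into basic blocks. Traces of emulation-obfuscated code reuse the same emulator addresses for different virtual instructions, which is one reason the real problem is much harder than this sketch.

```python
from collections import defaultdict


def trace_to_cfg(addrs):
    """Return (blocks, edges) for a non-empty list of executed instruction addresses."""
    succ, pred = defaultdict(set), defaultdict(set)
    for a, b in zip(addrs, addrs[1:]):
        succ[a].add(b)
        pred[b].add(a)

    entry = addrs[0]

    def is_leader(a):
        # A block starts at the entry, at a join point, or right after a branch point.
        if a == entry or len(pred[a]) != 1:
            return True
        (p,) = pred[a]
        return len(succ[p]) != 1

    leaders = {a for a in addrs if is_leader(a)}
    blocks = {}
    for head in leaders:
        block, cur = [head], head
        while len(succ[cur]) == 1:           # extend straight-line code
            (nxt,) = succ[cur]
            if nxt in leaders:
                break
            block.append(nxt)
            cur = nxt
        blocks[head] = block
    edges = {(head, s) for head, blk in blocks.items() for s in succ[blk[-1]]}
    return blocks, edges


# Hypothetical trace of a loop body (0x11-0x13) that runs twice.
blocks, edges = trace_to_cfg([0x10, 0x11, 0x12, 0x13, 0x11, 0x12, 0x13, 0x14])
print(sorted(edges))   # [(16, 17), (17, 17), (17, 20)]
```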

Results

Ex. 1. Binary search: Themida
[Figure panels: original, obfuscated (cropped), deobfuscated]

Results

Ex. 2. Hunatcha (drive infection code): ExeCryptor
[Figure panels: original, obfuscated (cropped), deobfuscated]

Results

Ex. 3. Fibonacci: ROP
[Figure panels: original, obfuscated, deobfuscated]

Results

Ex. 4. Win32/Kryptik.OHY: Code Virtualizer
[Figure panels: obfuscated, deobfuscated]
‒ multiple layers of runtime code generation/unpacking
‒ initial unpacker is emulation-obfuscated
‒ the CFG shown materializes incrementally

Results: CFG Similarity

[Figure: bar chart of similarity with the original program (%, 0 to 100) for each program, obfuscated vs. deobfuscated]

Lessons and Issues

Static vs. dynamic analysis
‒ multiple layers of runtime code generation/unpacking limit the utility of static analysis
‒ dynamic analysis can run into problems of scale: O(n²) algorithms are impractical, and even O(n log n) can be problematic; trade memory space for execution time/complexity
‒ code coverage: multi-path exploration?

Taint propagation
‒ byte/word-level analyses may not be precise enough; we use (enhanced) bit-level taint propagation

Simplified trace → CFG: NP-hard
‒ semantic considerations?
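On the taint-propagation point, here is a small sketch of why bit-level tracking can be more precise than byte-level tracking. The rules below are common conservative bit-level rules chosen for illustration. The talk does not list its “(enhanced)” rules, and the function names are hypothetical. A taint mask has bit i set when bit i of a value may depend on the input.

```python
MASK32 = 0xFFFFFFFF


def and_const(t, c):
    """Taint of x & c, given taint mask t of x and an untainted constant c."""
    return t & c                        # bits cleared by c no longer depend on x


def or_const(t, c):
    """Taint of x | c."""
    return t & ~c & MASK32              # bits forced to 1 by c no longer depend on x


def add(t1, t2):
    """Conservative taint of x + y: a carry can only move toward higher bits."""
    t = t1 | t2
    if t == 0:
        return 0
    lowest = t & -t                     # lowest possibly-tainted bit
    return MASK32 & ~(lowest - 1)       # that bit and every bit above it


def byte_level(t):
    """Coarsen a bit mask to bytes: a byte is tainted if any of its bits is."""
    return sum(0xFF << (8 * i) for i in range(4) if (t >> (8 * i)) & 0xFF)


# z = ((in & 0x01) | 0xFE) & 0x02, where the low byte of `in` is input.
t_in = 0x000000FF
t_x = and_const(t_in, 0x01)             # 0x01: only bit 0 depends on input
t_y = or_const(t_x, 0xFE)               # 0x01
t_z = and_const(t_y, 0x02)              # 0x00: z is always 2
print(hex(t_z), hex(byte_level(t_in)))  # 0x0 0xff
```

A byte-level analysis that taints a result byte whenever an operand byte is tainted would report the low byte of z (mask 0xff) as input-dependent, even though z is the constant 2.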

Conclusions

Rapid analysis and understanding of malware code is essential for swift response to new threats
‒ need to deal with advanced code obfuscations
‒ obfuscation-specific solutions tend to be fragile

We describe a semantics-based framework for automatic code deobfuscation
‒ no assumptions about the obfuscation(s) used
‒ promising results on obfuscators (e.g., Themida) not handled by prior research

ADDITIONAL MATERIAL

Semantics-based simplification

Quasi-invariant locations: locations that have the same value at each use.
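A quasi-invariant check needs only the values observed at each use. This sketch assumes a hypothetical trace format in which every read is recorded as a `(location, value)` pair. The location names are illustrative.

```python
from collections import defaultdict


def quasi_invariants(uses):
    """Map each location whose every use sees one value to that value.

    `uses` is a sequence of (location, value) pairs, one per read in the trace.
    """
    seen = defaultdict(set)
    for loc, val in uses:
        seen[loc].add(val)
    return {loc: next(iter(vals)) for loc, vals in seen.items() if len(vals) == 1}


# Hypothetical uses from an emulator trace: the decoding key and dispatch-table
# base are read repeatedly with the same value, while the virtual PC varies.
uses = [("vm_key", 0x5A), ("vpc", 0), ("table", 0x401000),
        ("vm_key", 0x5A), ("vpc", 3), ("table", 0x401000)]
print(quasi_invariants(uses))   # {'vm_key': 90, 'table': 4198400}
```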

Our transformations (currently):
‒ Arithmetic simplification: adaptation of constant folding to execution traces; quasi-invariant locations are treated as constants; controlled to avoid over-simplification
‒ Data movement simplification: pattern-driven rules identify and simplify data movement
‒ Dead code elimination: needs to consider implicit destinations, e.g., condition code flags
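Here is a compact sketch of two of these transformations on a toy trace, under stated assumptions. The instruction set (`mov`, `add`, `xor`, `jz`, `out`), the single flag `zf`, and all names are hypothetical. The over-simplification control and the data-movement rules are not modeled. Folding rewrites a fully known `add`/`xor` as explicit `mov`s, including one for its implicit flag write. Dead code elimination then treats `zf` as a real destination.

```python
from dataclasses import dataclass
from typing import Optional

SINKS = ("out", "jz")   # kept for their effect: program output, input-dependent branch


@dataclass(frozen=True)
class Insn:
    op: str                 # "mov", "add", "xor", "out", "jz"
    dest: Optional[str]     # explicit destination, if any
    srcs: tuple             # location names (str) or immediates (int)

    def dests(self):
        # add/xor also write the zero flag "zf": an implicit destination
        implicit = ("zf",) if self.op in ("add", "xor") else ()
        return ((self.dest,) if self.dest else ()) + implicit


def fold(trace, constants):
    """Constant folding along a trace, seeded with known (e.g. quasi-invariant) values."""
    env, out = dict(constants), []
    for ins in trace:
        vals = [s if isinstance(s, int) else env.get(s) for s in ins.srcs]
        known = all(v is not None for v in vals)
        if ins.op == "mov" and known:
            env[ins.dest] = vals[0]
            out.append(Insn("mov", ins.dest, (vals[0],)))
        elif ins.op in ("add", "xor") and known:
            a, b = vals
            r = (a + b if ins.op == "add" else a ^ b) & 0xFFFFFFFF
            env[ins.dest], env["zf"] = r, int(r == 0)
            # keep the implicit flag write as an explicit mov; DCE may drop it later
            out.append(Insn("mov", ins.dest, (r,)))
            out.append(Insn("mov", "zf", (int(r == 0),)))
        else:
            # substitute known operands, then forget everything this insn writes
            srcs = tuple(s if v is None else v for s, v in zip(ins.srcs, vals))
            for d in ins.dests():
                env.pop(d, None)
            out.append(Insn(ins.op, ins.dest, srcs))
    return out


def dce(trace):
    """Backward dead code elimination that counts implicit destinations as writes."""
    live, kept = set(), []
    for ins in reversed(trace):
        if ins.op in SINKS or any(d in live for d in ins.dests()):
            live.difference_update(ins.dests())
            live.update(s for s in ins.srcs if isinstance(s, str))
            kept.append(ins)
    return kept[::-1]


# Hypothetical trace; "key" is quasi-invariant with value 0x5A, "in" is the input.
trace = [
    Insn("mov", "t", ("key",)),
    Insn("xor", "t", ("t", 0x5A)),      # t = 0: emulator "chaff"
    Insn("add", "y", ("key", 1)),
    Insn("add", "x", ("in", "t")),      # x is never used, but zf is
    Insn("jz", None, ("zf",)),          # branch on the input-dependent flag
    Insn("out", None, ("y",)),
]
for ins in dce(fold(trace, {"key": 0x5A})):
    print(ins.op, ins.dest, ins.srcs)
# add x ('in', 0)
# jz None ('zf',)
# out None (91,)
```

A pass that looked only at explicit destinations would delete `add x, in, 0` because `x` is dead. The `jz` would then read a stale flag, which is the hazard the slide points to.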