IDA and obfuscated code
Hex-Rays
Ilfak Guilfanov

Presentation Outline

Is obfuscated code a problem for IDA Pro?
  IDA Pro expects nice, proper code

A lost battle?
  At first sight, yes

Solutions exist
  They are numerous...

Future development
  Your feedback

Online copy of this presentation is available at http://www.hex-rays.com/idapro/ppt/caro_obfuscation.ppt

Sample obfuscated code

IDA is a static analysis tool, and it makes many assumptions about the input code.
When these assumptions are violated, the analysis goes wrong.
An extremely simple case: call instructions are expected to return to the next instruction.

[Figure: sample disassembly listing, annotated "problem"]

The solution will be presented later...

Obfuscation categories

Redundancy
  Inflates the code size: code cleaning is necessary

Camouflage
  Hide and seek: the seeker is bound to win

Anti-debugger tricks
  Even old dogs can learn these tricks

Since it is “just” obfuscation, a determined reverse engineer will eventually overcome it

Redundancy

Instructions with no effect
Useless jumps
Complex computations with a constant result
Code duplication

Instructions with no effect

In fact CL is zero

Instructions with no effect - countermeasures

Replace them with nops
Collapse regions of useless instructions into one line (select the useless instructions, then View, Hide)

Ideally, a plugin would clean up the code. The Hex-Rays decompiler ignores useless instructions because it simply removes all dead code, but it cannot handle obfuscated code well; expect improvements in this direction.
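The first countermeasure can be illustrated outside IDA on a plain byte buffer. This is a minimal Python sketch, not IDA code: the `nop_out` helper and the sample bytes are hypothetical, and it assumes x86, where the one-byte nop opcode is 0x90.

```python
NOP = 0x90  # one-byte x86 "nop" opcode

def nop_out(code: bytes, start: int, length: int) -> bytes:
    """Return a copy of `code` with `length` bytes from `start` replaced by nops."""
    if start < 0 or length < 0 or start + length > len(code):
        raise ValueError("range lies outside the buffer")
    patched = bytearray(code)
    patched[start:start + length] = bytes([NOP]) * length
    return bytes(patched)

# Hypothetical fragment: mov eax, 1 / shl eax, cl / ret
# If CL is known to be zero, the two-byte shift (D3 E0) has no effect.
sample = bytes.fromhex("b801000000d3e0c3")
cleaned = nop_out(sample, 5, 2)
print(cleaned.hex())
```

Patching keeps all addresses unchanged, which is why nops are used here rather than deleting the bytes.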

Useless jumps

Text view is pretty useless:

Useless jumps

Graph view is slightly better:

A plugin to clean the graph and combine adjacent nodes would be really useful (can be done without modifying the database)

Graph view and plugins

Graphs generated by IDA can be modified by a plugin on the fly: just hook the grcode_changed_graph event.
This allows for improving the graph. Some ideas:

  Combine sequential nodes into one
  Hide dead code paths
  Remove dead edges
  Add annotations to graph nodes/edges
  Automatically recognize and collapse patterns (e.g. strlen)
  Local optimization (within a node; constant folding, etc.)

All this can be really useful for obfuscated code!
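The first idea, combining sequential nodes, can be sketched on an abstract graph. The Python below is a stand-alone illustration and does not use the IDA graph API; the `merge_sequential_nodes` name and the dictionary representation are assumptions. Every node must appear as a key, even if it has no successors.

```python
def merge_sequential_nodes(succs):
    """Merge B into A whenever A's only successor is B and B's only predecessor is A.

    `succs` maps each node to the list of its successors.
    Returns (merged successor map, map from surviving node to the nodes it absorbed).
    """
    succs = {n: list(out) for n, out in succs.items()}
    groups = {n: [n] for n in succs}
    changed = True
    while changed:
        changed = False
        # Recompute predecessors after every merge.
        preds = {n: [] for n in succs}
        for n, out in succs.items():
            for m in out:
                preds[m].append(n)
        for a in list(succs):
            out = succs[a]
            if len(out) == 1:
                b = out[0]
                if b != a and preds[b] == [a]:
                    succs[a] = succs.pop(b)      # A inherits B's outgoing edges
                    groups[a] += groups.pop(b)   # remember what A now contains
                    changed = True
                    break
    return succs, groups

# A chain 1 -> 2 -> 3 (as produced by useless jumps) followed by a diamond.
graph = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}
merged, groups = merge_sequential_nodes(graph)
print(merged, groups)
```

Because the merge only changes how nodes are grouped for display, it matches the note that such cleanup can be done without modifying the database.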

Constant result calculations

Some constant calculations can be easily handled

Ctrl-R

When there are too many offsets...

The answer is obvious: write a script or a plugin :)
Here is a very simple one-line script:

    OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0)

To make your life even easier, you may assign the script to a hotkey. Press Shift-F2 and enter:

    AddHotkey("w", "make_ebp_offset");
    }

    static make_ebp_offset()
    {
      OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0);

The braces are unbalanced as entered. Presumably this is deliberate: the text typed into the Shift-F2 box is placed inside a function body, so the lone "}" closes that body and the missing final "}" is supplied by it.

This trick and many others are explained on http://www.xs4all.nl/~itsme/projects/disassemblers/ida.html

What if there are thousands of such offsets?...

Improve the script to check all instructions for the desired pattern. Here's how to organize a loop over all instructions:

    auto ea, ea2;
    ea2 = MaxEA();
    // walk over every defined item in the database
    for ( ea=MinEA(); ea < ea2; ea=NextHead(ea, ea2) )
    {
      // skip data items
      if ( !isCode(GetFlags(ea)) )
        continue;
      if ( GetMnem(ea) == "mov" && GetOpnd(ea, 0) == "ebp" )
        Message("%a: found mov ebp!\n", ea);
    }

What if these offsets appear and vanish dynamically?

Well, then you have to create a plugin. It would:

  Recognize the desired pattern
  Modify the database (create an offset, code, add a comment, etc.)

Such plugins are fully automatic.
They hook to analysis events (frequently custom_emu).
This is the most powerful technique but, alas, it requires DLL programming in C and using the SDK.

Just three wishes for your plugins:

  A switch to turn your plugin off may be a good idea
  Try to be user-friendly (for example, check whether a comment exists before calling set_cmt; otherwise you may overwrite a user-defined comment)
  Do not exit to the OS in case of errors

Constant calculations – some ideas

Create a script or plugin to:

  Add calculation results as comments (what about a script that traces the application and adds register values as comments for each instruction?)
  Modify the database and simplify instructions

Camouflage

Opaque predicates
Proprietary virtual machine
Encryption/compression
Message-driven systems
No direct references: position-independent code (PIC)
Hidden execution flow using SEH
Rootkit techniques
Hidden entry point (TLS callbacks, entry point in the resources section or in the header)

Opaque predicates

By definition, an opaque predicate is a predicate (an expression that evaluates to either "true" or "false") whose outcome is known to the programmer a priori but which, for a variety of reasons, still needs to be evaluated at run time.
In practice, such an expression may evaluate to any integer value, not just true or false:

GetLastError returns 0x57 (Invalid Parameter)

Opaque predicates

They come in many varieties. Since we cannot determine the outcome statically, we have to find it out ourselves and:

  Inform IDA about the predicate outcome
  Prune dead code paths and simplify the code

Working in the graph view or on pseudocode is easier.

Automate this? How?

  Future versions of IDA/Hex-Rays will offer some solutions
  Interactivity and extensibility help
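To make the pruning step concrete, here is a small Python sketch built on a textbook opaque predicate rather than one taken from the slides: the product of two consecutive integers is always even. All function names are hypothetical.

```python
def opaque_predicate(x: int) -> bool:
    # x * (x + 1) multiplies two consecutive integers, so one factor is even
    # and the product is always even: the predicate is always true.
    return (x * (x + 1)) % 2 == 0

def obfuscated(x: int) -> str:
    if opaque_predicate(x):
        return "real path"
    return "dead path"   # never reached, present only to confuse analysis

def simplified(x: int) -> str:
    # The same function after the outcome is known and the dead path is pruned.
    return "real path"

# Spot-check the predicate on a range of inputs, including negative ones.
always_true = all(opaque_predicate(x) for x in range(-1000, 1000))
print(always_true)
```

Once the outcome is established, whether by reasoning as here or by observing it at run time, the branch can be treated as unconditional and the dead path hidden.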

Proprietary virtual machine

Many implementations use this obfuscation method
Requires reverse engineering the virtual machine
Examples:

  Themida & Code Virtualizer (http://www.oreans.com/)
  Various malware

In the general case, building a processor module for the VM is required
Let me show you a simple case

Bagle malware case

This mass mailer contains the following code sequence:

Bagle - opcodes

Opcode handlers are very simple, I renamed them:

Bagle – opcode table

After renaming all handlers the opcode table was:

Bagle – create opcode enumeration

The following script created an enumeration for all VM opcodes based on the handler names:
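As a rough stand-alone illustration of the idea (not the original IDC script), the Python below derives an enumeration from a table of handler names and uses it to label bytes. The handler names are invented for the example; only the enumeration name VM_CODES comes from the slides. The decoder naively treats every byte as an opcode and ignores operands.

```python
from enum import IntEnum

# Hypothetical opcode table: index = VM opcode, value = renamed handler.
HANDLER_NAMES = ["vm_nop", "vm_push", "vm_pop", "vm_xor", "vm_jmp", "vm_exit"]

def build_opcode_enum(handler_names, prefix="vm_"):
    """Create one enum member per handler, numbered by table position."""
    members = {}
    for opcode, name in enumerate(handler_names):
        short = name[len(prefix):] if name.startswith(prefix) else name
        members["VM_" + short.upper()] = opcode
    return IntEnum("VM_CODES", members)

VM_CODES = build_opcode_enum(HANDLER_NAMES)

def decode(program: bytes):
    """Replace each byte by its symbolic opcode name (raises ValueError if unknown)."""
    return [VM_CODES(b).name for b in program]

print(decode(bytes([1, 3, 5])))
```

Deriving the names from the handler table means that renaming a handler and rerunning the script keeps the enumeration in step with the analysis.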

Bagle – enumeration ready

We can use this enumeration in the disassembly now
Just declare an array of bytes and convert them to VM_CODES
All this without quitting IDA (in fact, I was in the middle of a debugging session, since there was another layer of protection before the VM)

Bagle – virtual machine readable

Create an array of bytes, declare them as VM_CODES:

Bagle – VM logic visible

The logic of the VM program became visible but there were immediate constants in the code that required manual intervention:

Bagle – VM decoding automated

The following script solves the problem:

Bagle – comfortable analysis of VM

After assigning a hotkey to the previous script, it was almost the same as having a processor module for the VM
However, another level of deobfuscation is required (0x63FE34B2 ^ 0x9C01CB4D = 0xFFFFFFFF)
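The XOR identity quoted above is easy to check, and the check shows why it counts as obfuscation: the second constant is simply the bitwise complement of the first. A minimal Python sketch, with a hypothetical helper name:

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like an x86 register

def fold_xor32(a: int, b: int) -> int:
    """Constant-fold a 32-bit XOR of two immediates."""
    return (a ^ b) & MASK32

a, b = 0x63FE34B2, 0x9C01CB4D
folded = fold_xor32(a, b)
is_complement = (~a & MASK32) == b

print(hex(folded), is_complement)
```

A deobfuscation script can fold such pairs of immediates and put the result in a comment, so the reader sees the real constant instead of its two halves.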

VM - summary

We have to:

  Analyze VM opcodes
  Give them meaningful, descriptive names
  In simple cases, a simple enumeration will do the job
  In complex cases, a processor module has to be developed

It is not _that_ difficult after all ;)

Rolf Rolles created a processor module for a VM: http://www.openrce.org/articles/full_view/28

Executable packing

A plethora of packing methods, good and bad
Manual unpacking is always possible; automatic unpacking would be ideal
There are sample scripts and plugins in IDA:

  uunp: a proof-of-concept unpacker plugin, which exists as an IDC script as well
  unpack: another sample unpacker

IDA has stayed away from this arms race
There are many other solutions available (unpackers, process dumpers, etc.)

Executable packing - approaches

Static analysis
  too time consuming
  requires tedious manual work

Dynamic analysis (debugger)
  much faster
  requires a special sandboxed environment
  vulnerable to anti-debugger tricks

Code emulation
  a good idea
  any widespread emulator will be attacked
  emulation imperfections are a problem

No ideal solution...

Encryption

Methods vary from simple XOR encryption to serious encryption schemes like AES, Blowfish, etc.
Since the key must be present to run the executable, the strength of the encryption method does not matter.
Ideally we just let the application decrypt itself and then take a memory snapshot.
If only part of the executable is decrypted at a time, then we need to automate the process of taking memory snapshots.
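A toy Python model of the snapshot approach, assuming the simplest case of a repeating-key XOR. The image layout and all names are hypothetical; the point is only that whatever key the program needs in order to run is also available to the analyst.

```python
def xor_crypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; the same routine encrypts and decrypts."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def pack(plain: bytes, key: bytes) -> dict:
    # The protected image has to carry its own key, otherwise it could not run.
    return {"key": key, "body": xor_crypt(plain, key)}

def run_and_snapshot(image: dict) -> bytes:
    # Let the "application" decrypt itself, then copy the decrypted memory.
    memory = bytearray(xor_crypt(image["body"], image["key"]))
    return bytes(memory)

image = pack(b"secret code", b"\x5a\xa5")
snapshot = run_and_snapshot(image)
print(snapshot)
```

Replacing `xor_crypt` with a strong cipher would not change `run_and_snapshot` at all, which is the sense in which the strength of the method does not matter.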

Position independent code

No fixed addresses means no xrefs
Analysis is harder, but user-defined offsets can help

Anti-debugging tricks

I'm sure you know better since you are the practitioners :)
IDA related:

  Its default settings are not good for debugging hostile code
  Exceptions are handled by the debugger; change this in the debugger settings

Just two simple methods

Use tracing to find anti-debugging tricks

Tracing is slow, but it may be used to find why/when/how the process misbehaves
Sample trace log from naïve code:

Simple method to neutralize found tricks

Use a “conditional” breakpoint to neutralize tricks encountered while single-stepping
The breakpoint condition for the call instruction is

    ip=ip+2

Breakpoint conditions may call all defined IDC functions (including user-defined ones); this can be used for logging and for changing the application behavior

Debugger – current state

IDA debugger advantages
  The annotated database is available during debugging
  All facilities continue to work: FLIRT signatures, function prototypes and argument names, structures, enumerations, your scripts and plugins, etc.
  Scriptable
  Available on multiple platforms (plus remote debugging)

Shortcomings
  Slow operation
  Multithreaded applications are poorly handled
  Only application-level debugging is available

We continue to work on the shortcomings
Future versions will be better suited to hostile code analysis

Debugger - ideas

A debugger plugin to configure the 'stealth' mode
  Exceptions are passed to the application
  Calls to IsDebuggerPresent, NtSetInformationThread and similar functions are intercepted

Emulating debugger module

A 'stealth' debugger module
  Do not use the standard debugger interface (CreateProcess/WaitForDebugEvent)
  Inject a debugger DLL into the process and communicate with it (the must-have functionality is breakpoint handling and memory access)

Higher level debugging
  Skip hidden code areas, group nodes in the graph view
  Source level debugging using the pseudocode view

Summary

Obfuscation methods vary; there is no single recipe for all cases
The key is to be able to represent the code nicely on the screen
The problem is generic: what to do if IDA displays things not the way I want?
The answer is: modify the output!

  Use interactive commands, menus, etc.
  Represent data in a meaningful way
  Hide irrelevant information
  Patch the database and simplify it

Create scripts, plugins, processor modules to avoid routine work

The obfuscating call instruction

The function returns a few bytes further than it would normally:

Example: solution to obfuscating call

The idea: intercept emulation of calls to “ex_obfuscating” and create correct xrefs
Just a few lines of code (unfortunately, a plugin)
Can be made more complex if necessary
The source code of the sample plugin can be found at http://www.hexblog.com/ida_pro/files/ex_deobfuscate.zip
See the next slide for the essential part of the plugin

Plugin to handle weird call instructions
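A minimal sketch of the underlying idea, written in stand-alone Python and not taken from the plugin's actual code: for ordinary calls execution resumes at the next instruction, while for calls to ex_obfuscating it resumes a few bytes later. The `Call` record, the `return_address` helper and the skip distance of 3 bytes are all assumptions made for illustration.

```python
from typing import NamedTuple

class Call(NamedTuple):
    ea: int       # address of the call instruction
    size: int     # length of the call instruction in bytes
    target: str   # name of the called function

OBFUSCATING = "ex_obfuscating"  # function name as given in the slides
SKIP = 3                        # hypothetical number of bytes skipped on return

def return_address(call: Call, skip: int = SKIP) -> int:
    """Address where execution continues after the call returns."""
    next_insn = call.ea + call.size
    if call.target == OBFUSCATING:
        return next_insn + skip   # the callee adjusts its own return address
    return next_insn

weird = Call(0x401000, 5, "ex_obfuscating")
normal = Call(0x401010, 5, "printf")
print(hex(return_address(weird)), hex(return_address(normal)))
```

In a real plugin the computed address would be used to create the flow cross-reference, so that disassembly continues at the true return point instead of in the skipped bytes.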

Deobfuscated code

Note the arrow on the left side of the listing
Graph could be simplified further by a plugin

The “thank you” slide

Thank you for your attention!
Questions?