Automatic Reverse Engineering of Malware Emulators
Sharif, M., Lanzi, A., Giffin, J., and Lee, W. Georgia Institute of Technology, S&P 2009. Presented by WANG Zhi.
![Page 1: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/1.jpg)
Automatic Reverse Engineering of Malware Emulators
Sharif, M., Lanzi, A., Giffin, J., and Lee, W. Georgia Institute of Technology, S&P 2009.
Presented by WANG Zhi
![Page 2: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/2.jpg)
Outline
Emulation Obfuscation
Emulator Model
Abstract Variable Binding
Identifying Candidate VPC
Identifying Emulation Behavior
Extracting Syntax and Semantics
![Page 3: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/3.jpg)
Emulation Obfuscation
The term emulation generally refers to supporting a binary instruction set architecture (ISA) that differs from the one supported by the CPU.
Java (JVM), JavaScript, Perl, Python…
![Page 4: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/4.jpg)
Emulation Obfuscation
Using emulation for obfuscation
![Page 5: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/5.jpg)
Emulation Obfuscation
The analyst has no knowledge of the bytecode language, or even of where the bytecode is located in the obfuscated program.
The key challenges in analyzing emulation obfuscation are how to extract the bytecode trace and the syntax and semantics of the bytecode instructions from the emulated program trace.
![Page 6: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/6.jpg)
Emulator Model
A decode-dispatch based emulator
![Page 7: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/7.jpg)
Emulator Model
The decode phase fetches the opcode.
The dispatch phase selects the appropriate handling routine.
The execution phase performs the dispatched handling routine.
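The three phases can be illustrated with a minimal sketch of a decode-dispatch loop. The bytecode format here (opcodes 0x00 halt, 0x01 push, 0x02 add) and the names `run`, `op_push`, and `op_add` are hypothetical and chosen only for illustration; real obfuscators use their own, unknown instruction sets.

```python
def run(bytecode):
    """Tiny decode-dispatch emulator for a hypothetical stack bytecode."""
    stack = []
    vpc = 0  # virtual program counter: index of the next bytecode byte

    def op_push():
        # Hypothetical opcode 0x01: push the one-byte operand that follows.
        nonlocal vpc
        stack.append(bytecode[vpc])
        vpc += 1

    def op_add():
        # Hypothetical opcode 0x02: pop two values and push their sum.
        stack.append(stack.pop() + stack.pop())

    handlers = {0x01: op_push, 0x02: op_add}

    while vpc < len(bytecode):
        opcode = bytecode[vpc]      # decode: fetch the opcode through the VPC
        vpc += 1
        if opcode == 0x00:          # hypothetical halt opcode
            break
        handler = handlers[opcode]  # dispatch: select the handling routine
        handler()                   # execute: run the selected routine
    return stack


program = bytes([0x01, 2, 0x01, 3, 0x02, 0x00])  # push 2; push 3; add; halt
result = run(program)
```

In this sketch `result` is `[5]`. An analyst looking only at the native trace of `run` sees the loop and the handlers, not the meaning of `program`, which is the situation the paper addresses.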
![Page 8: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/8.jpg)
Reverse Engineering of Emulation
The paper develops algorithms that extract the syntax and semantics of unknown bytecode from the execution behavior of a decode-dispatch emulator.
![Page 9: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/9.jpg)
Abstract Variable Binding
A compiler translates source code into low-level CPU instructions and assigns high-level variables to memory locations or registers.
In the x86 instruction trace, there are no high-level language variables.
The first task is therefore to identify variables in the emulator's execution trace.
![Page 10: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/10.jpg)
Abstract Variable Binding
In this paper, memory locations are used to represent high-level variables; these are called abstract variables.
On x86, an indirect memory read or write takes its address from a register (register indirect addressing).
Source code:
instruction = bytecode[VPC]
Pseudo-x86 instructions:
eax <= [VPC]
instruction <= [eax]
![Page 11: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/11.jpg)
Abstract Variable Binding
The first instruction loads the value of the abstract variable VPC into eax. In other words, it binds eax to VPC, and this binding is propagated to the next instruction.
The analysis uses forward and backward data-flow analysis to identify the abstract variables and their propagation.
![Page 12: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/12.jpg)
Forward Binding
Forward binding identifies abstract variables from memory read operations.
![Page 13: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/13.jpg)
Forward Binding
Forward binding propagates an abstract variable's binding when an operation computes an update to the variable's previous value.
The output is a mapping that describes bindings from memory read operations to abstract variables.
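A minimal sketch of forward binding over a recorded trace follows. It assumes a simplified, hypothetical trace format (tuples such as `("load", dst, addr_reg, addr)`) instead of real x86 decoding, and the name `forward_bind` is illustrative and not taken from the paper.

```python
def forward_bind(trace):
    """Map each memory read to the abstract variables its address came from.

    Hypothetical trace entries:
      ("load", dst, addr_reg, addr)  dst <= [addr]; addr_reg is None for a
                                     direct read, else the address register
      ("add",  dst, imm)             dst <= dst + imm (update of old value)
      ("movi", dst, imm)             dst <= constant
    """
    reg_vars = {}       # register -> abstract variables (memory addresses)
    read_bindings = {}  # trace index of a read -> bound abstract variables
    for i, ins in enumerate(trace):
        if ins[0] == "load":
            _, dst, addr_reg, addr = ins
            # The read is bound to whatever variables produced its address.
            read_bindings[i] = set(reg_vars.get(addr_reg, ()))
            # The loaded value now represents the memory location just read.
            reg_vars[dst] = {addr}
        elif ins[0] == "add":
            pass  # an update of the previous value keeps the binding
        elif ins[0] == "movi":
            reg_vars[ins[1]] = set()  # a constant carries no binding
    return read_bindings


VPC = 0x1000  # assumed address of the emulator's VPC variable
trace = [
    ("load", "eax", None, VPC),      # eax <= [VPC]
    ("add", "eax", 1),               # eax still bound to VPC
    ("load", "ebx", "eax", 0x2001),  # ebx <= [eax]
]
bindings = forward_bind(trace)
```

Here the second read (trace index 2) is bound to `{VPC}`, mirroring the `eax <= [VPC]`, `instruction <= [eax]` example.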
![Page 14: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/14.jpg)
Backward Binding
Backward binding operates in the reverse order of forward binding. It determines abstract variables from memory write operations.
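Backward binding can be sketched in the same style by walking the trace from the end. The trace format and the name `backward_bind` are hypothetical, and the sketch only handles the three simplified operations shown.

```python
def backward_bind(trace):
    """Map instructions to the abstract variable they help compute.

    Hypothetical trace entries:
      ("store", src, addr)           [addr] <= src
      ("add",   dst, imm)            dst <= dst + imm
      ("load",  dst, addr_reg, addr) dst <= [addr]
      ("movi",  dst, imm)            dst <= constant
    """
    pending = {}   # register -> abstract variable it is later stored into
    bindings = {}  # trace index -> abstract variable being computed
    for i in range(len(trace) - 1, -1, -1):  # walk the trace in reverse
        ins = trace[i]
        if ins[0] == "store":
            _, src, addr = ins
            pending[src] = addr  # the write identifies the abstract variable
            bindings[i] = addr
        elif ins[0] == "add":
            if ins[1] in pending:  # update of a value that reaches the write
                bindings[i] = pending[ins[1]]
        elif ins[0] in ("load", "movi"):
            dst = ins[1]
            if dst in pending:     # register defined here: binding ends
                bindings[i] = pending.pop(dst)
    return bindings


VPC = 0x1000  # assumed address of the emulator's VPC variable
trace = [
    ("load", "eax", None, VPC),  # eax <= [VPC]
    ("add", "eax", 1),           # eax <= eax + 1
    ("store", "eax", VPC),       # [VPC] <= eax
]
bindings = backward_bind(trace)
```

All three instructions end up bound to `VPC`, because each contributes to the value written back to it.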
![Page 15: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/15.jpg)
Identifying Dependent Abstract Variables
The analysis identifies dependencies among abstract variables by tracking the data flow from one abstract variable to another.
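One way to picture this step is a pass that records which abstract variables feed each register and emits an edge whenever a register is written to memory. The sketch below uses a hypothetical three-operation trace format with symbolic variable names; `dependencies` is an illustrative name, not the paper's.

```python
def dependencies(trace):
    """Return (source, destination) pairs of abstract variables.

    Hypothetical trace entries:
      ("load",  dst, var)      dst <= [var]
      ("add",   dst, src_reg)  dst <= dst + src_reg
      ("store", src, var)      [var] <= src
    """
    reg_src = {}  # register -> abstract variables its value derives from
    deps = set()
    for ins in trace:
        if ins[0] == "load":
            reg_src[ins[1]] = {ins[2]}
        elif ins[0] == "add":
            merged = reg_src.get(ins[1], set()) | reg_src.get(ins[2], set())
            reg_src[ins[1]] = merged
        elif ins[0] == "store":
            # Every variable that flowed into the register feeds the target.
            deps |= {(v, ins[2]) for v in reg_src.get(ins[1], set())}
    return deps


trace = [
    ("load", "eax", "A"),
    ("load", "ebx", "B"),
    ("add", "eax", "ebx"),
    ("store", "eax", "C"),
]
edges = dependencies(trace)
```

In this example `C` depends on both `A` and `B`.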
![Page 16: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/16.jpg)
Lifetime of Abstract Variables
The analysis uses memory locations as abstract variables, but the same memory location may be used for different variables at different points of execution.
Abstract variables on the stack or heap have shorter lifetimes than those in the static data region.
The lifetime depends on allocation and deallocation operations.
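A minimal sketch of lifetime tracking, assuming a hypothetical event stream of `alloc`, `free`, and `access` records: each reallocation of an address starts a new generation, so accesses before and after are treated as different abstract variables. The name `name_variables` and the event format are assumptions made for illustration.

```python
def name_variables(events):
    """Give each access an (address, generation) abstract-variable name."""
    generation = {}  # address -> number of reallocations seen so far
    names = []
    for kind, addr in events:
        if kind == "alloc":
            # A new allocation at the same address begins a new lifetime.
            generation[addr] = generation.get(addr, -1) + 1
        elif kind == "access":
            # Addresses never allocated (static data) stay in generation 0.
            names.append((addr, generation.get(addr, 0)))
        # "free" ends a lifetime; the next alloc bumps the generation.
    return names


events = [
    ("alloc", 0x10), ("access", 0x10), ("free", 0x10),
    ("alloc", 0x10), ("access", 0x10),
    ("access", 0x4000),  # static data, never allocated or freed
]
names = name_variables(events)
```

The two accesses to `0x10` receive different names, `(0x10, 0)` and `(0x10, 1)`, while the static location keeps a single name for the whole run.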
![Page 17: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/17.jpg)
Identifying Candidate VPCs
The VPC is a virtual program counter, analogous to a CPU's program counter or instruction pointer register.
The emulator fetches bytecode instructions from the memory address specified by the VPC.
![Page 18: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/18.jpg)
Identifying Candidate VPCs
The analysis groups all read operations into n clusters using a simple similarity metric that checks whether they have abstract variables in common.
The read operations of the fetching behavior should be contained within one of the n clusters, because the VPC is their common abstract variable.
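The clustering step can be sketched as grouping reads that share at least one abstract variable. This is an illustrative interpretation of the "simple similarity metric", not the paper's exact algorithm; `cluster_reads` and the input mapping (read index to set of bound variables) are assumptions.

```python
def cluster_reads(read_bindings):
    """Group read operations that share any abstract variable."""
    clusters = []  # list of (read indices, abstract variables) pairs
    for idx, bound in read_bindings.items():
        merged_reads, merged_vars = {idx}, set(bound)
        rest = []
        for reads, cvars in clusters:
            if cvars & merged_vars:      # shares a variable: merge
                merged_reads |= reads
                merged_vars |= cvars
            else:
                rest.append((reads, cvars))
        clusters = rest + [(merged_reads, merged_vars)]
    return clusters


read_bindings = {
    0: {"VPC"},
    1: {"VPC", "x"},
    2: {"sp"},
    3: {"x"},
}
clusters = cluster_reads(read_bindings)
groups = sorted(sorted(reads) for reads, _ in clusters)
```

Here reads 0, 1, and 3 fall into one cluster and read 2 into another. Each cluster's variables are then candidates for the VPC.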
![Page 19: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/19.jpg)
Identifying Emulation Behavior
Decode-dispatch emulators have fundamental execution properties: a main loop with a bytecode fetch through the VPC, decoding of the opcode within the bytecode, dispatch to an opcode handler, and a change to the VPC value.
![Page 20: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/20.jpg)
Identifying Emulation Behavior
The authors use standard loop detection methods to detect the emulator's main loop.
They analyze abstract variable propagation to find the decoding, dispatching, and execution behaviors for the bytecode.
![Page 21: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/21.jpg)
Identifying Emulation Behavior
They use multi-level dynamic tainting to analyze the data flow.
There are four taint labels: fetch, decode, dispatch, and execute.
![Page 22: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/22.jpg)
Identifying Emulation Behavior
If an abstract variable is read from memory, the instruction is marked with the taint label fetch.
When the analyzer detects dispatch-like behavior, that is, a control-flow transfer instruction whose target address is a tainted variable, it marks the instruction with the taint label dispatch and marks the target of the control transfer as a probable execute routine.
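The labeling can be sketched over a simplified trace, assuming the candidate VPC address is already known. The trace format, the rule used for the decode label (a read whose address derives from fetched data), and the name `label_trace` are assumptions for illustration and simplify the paper's multi-level tainting.

```python
def label_trace(trace, vpc_addr):
    """Label trace entries as fetch, decode, or dispatch.

    Hypothetical trace entries:
      ("load", dst, addr_reg, addr)  dst <= [addr]
      ("jmp_reg", reg, target)       indirect jump through reg
    """
    vpc_regs = set()  # registers currently holding the VPC value
    tainted = set()   # registers holding data derived from a bytecode fetch
    labels = {}       # trace index -> taint label
    handlers = set()  # probable execute routines
    for i, ins in enumerate(trace):
        if ins[0] == "load":
            _, dst, addr_reg, addr = ins
            via_vpc = addr_reg in vpc_regs
            via_taint = addr_reg in tainted
            vpc_regs.discard(dst)  # dst is overwritten by this load
            tainted.discard(dst)
            if addr == vpc_addr:
                vpc_regs.add(dst)
            elif via_vpc:          # read through the VPC: bytecode fetch
                tainted.add(dst)
                labels[i] = "fetch"
            elif via_taint:        # read indexed by fetched data
                tainted.add(dst)
                labels[i] = "decode"
        elif ins[0] == "jmp_reg":
            _, reg, target = ins
            if reg in tainted:     # jump target depends on the bytecode
                labels[i] = "dispatch"
                handlers.add(target)
    return labels, handlers


trace = [
    ("load", "eax", None, 0x1000),   # eax <= [VPC]
    ("load", "ebx", "eax", 0x2000),  # ebx <= [eax]
    ("load", "ecx", "ebx", 0x3004),  # ecx <= handler_table[ebx]
    ("jmp_reg", "ecx", 0x401000),    # jump to the handler
]
labels, handlers = label_trace(trace, vpc_addr=0x1000)
```

In this trace the second load is labeled fetch, the table lookup decode, and the indirect jump dispatch, with `0x401000` recorded as a probable execute routine.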
![Page 23: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/23.jpg)
Extracting Syntax and Semantics
Once the analyzer identifies the emulation behavior, it reverse engineers each iteration of the emulator loop to extract the syntax and semantics of the bytecode instruction executed on that iteration.
![Page 24: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/24.jpg)
Extracting Syntax and Semantics
The syntactic information shows how bytecode instructions are parsed into opcodes and operands.
The semantic information consists of the native instructions that carry out the actions of the bytecode instructions.
![Page 25: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/25.jpg)
Extracting Syntax and Semantics
To identify the opcode part, they use taint analysis to determine which portion of the bytecode instruction was used for the dispatch behavior.
The execution routine invoked by the emulator for the opcode encodes the semantics.
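As a sketch of this step, suppose each loop iteration has been summarized as the bytes fetched (by offset from the VPC), the offsets whose taint reached the dispatch, and the handler address. That summary format and the name `extract_syntax` are hypothetical; the sketch only shows how opcode bytes and operand bytes separate once the taint is known.

```python
def extract_syntax(iterations):
    """Build a per-opcode description from per-iteration taint summaries."""
    syntax = {}
    for it in iterations:
        fetched = it["bytes"]           # offset from VPC -> byte value
        used = set(it["dispatch_taint"])  # offsets that influenced dispatch
        # Bytes that influenced the dispatch form the opcode.
        opcode = tuple(fetched[o] for o in sorted(used))
        syntax[opcode] = {
            "length": len(fetched),
            "operand_offsets": sorted(set(fetched) - used),
            "handler": it["handler"],   # routine that encodes the semantics
        }
    return syntax


iterations = [
    {"bytes": {0: 0x01, 1: 0x2A}, "dispatch_taint": {0}, "handler": 0x401000},
    {"bytes": {0: 0x02}, "dispatch_taint": {0}, "handler": 0x401020},
]
syntax = extract_syntax(iterations)
```

For this input, opcode `(0x01,)` is a two-byte instruction with one operand byte at offset 1, and opcode `(0x02,)` is a one-byte instruction with no operands.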
![Page 26: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/26.jpg)
Extracting Syntax and Semantics
By determining how to parse the bytecode and by locating control-flow transfer opcodes, a CFG for the bytecode can be constructed.
The CFG structure provides a foundation for subsequent malware analysis.
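A minimal sketch of CFG recovery, assuming the instruction lengths and the control-flow opcodes have already been recovered. The bytecode encoding (one-byte absolute jump targets, opcodes 0x10 and 0x11) and the name `build_cfg` are hypothetical. For brevity the sketch produces instruction-level edges instead of basic blocks.

```python
def build_cfg(bytecode, length, jumps, halt):
    """Return instruction-level CFG edges as (from_offset, to_offset) pairs.

    length: opcode -> instruction length in bytes
    jumps:  opcode -> "jmp" (unconditional) or "jcc" (conditional); the
            target is assumed to be the one-byte absolute operand
    halt:   opcode that ends execution
    """
    edges, seen, work = set(), set(), [0]
    while work:
        pc = work.pop()
        if pc in seen or pc >= len(bytecode):
            continue
        seen.add(pc)
        op = bytecode[pc]
        if op == halt:
            continue                        # no successors
        fallthrough = pc + length[op]
        if op in jumps:
            succs = [bytecode[pc + 1]]      # jump target
            if jumps[op] == "jcc":
                succs.append(fallthrough)   # branch not taken
        else:
            succs = [fallthrough]
        for s in succs:
            edges.add((pc, s))
            work.append(s)
    return edges


# offset 0: push 1; offset 2: jcc 6; offset 4: jmp 0; offset 6: halt
program = bytes([0x01, 0x01, 0x11, 0x06, 0x10, 0x00, 0x00])
edges = build_cfg(
    program,
    length={0x00: 1, 0x01: 2, 0x10: 2, 0x11: 2},
    jumps={0x10: "jmp", 0x11: "jcc"},
    halt=0x00,
)
```

The resulting edges describe a loop from offset 4 back to offset 0 with an exit to the halt at offset 6.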
![Page 27: Automatic Reverse Engineering of Malware Emulators](https://reader036.vdocument.in/reader036/viewer/2022081503/56813f97550346895daa867b/html5/thumbnails/27.jpg)
Thank you