esil - universal il (intermediate language) for radare2
TRANSCRIPT
esil - universal ilESIL - Intermediate Language for radare2 toolset
Anton Kochkov (@akochkov)
June 23, 2016
ZeroNights 11-2015
anton kochkov
• Moscow, Russia
• Love Reverse Engineering, foreign languages and travel
• Member of R2 crew, radare2 evangelist
• Security Code Ltd.
2
intermediate languages
what is intermediate language?
• Intermediate language is the language of an abstract machinedesigned to aid in the analysis of computer programs. il-wikipedia
• Heavily used for academic research and real world tools
• Vital for decompilation process
• Base for various kind of applications - SMT, AEG, AEP, etc
4
reil1
• Invented by Zynamics company
• BinNavi, BinDiff were both based on top of REIL
• x86, ARM, PowerPC architectures supported
• VM - Infinite memory
• VM - Infinite range of registers
• Missing Floating Point
• Written in Java
1reil.
5
reil
• 17 instructions
• Register aliases (eax, ebx, r0, . . . )2
2reil-zynamics.
6
bap
• BAP - Binary Analysis Platform3
• IL itself called BIL
• Well-maintained framework, various tools included
• Integration with another tools like: TEMU, libVEX, IDA Pro, Qira,. . .
• Targeted for x86, ARM
• Missing Floating Point
3bap-handbook.
7
bitblaze (vineil/vex)
• BitBlaze4 - also a platform, like BAP
• Uses two intermediate languages
• Vine IL - “low-level” language
• VEX IL (libVEX from valgrind) - “high-level” language
• Written in OCaml + C++
4bitblaze.
8
vine il56
• Implicitly require description of all side-effects
• Quite similar to ESIL
• Very useful for direct code emulation
• Too much unused information
5vine-manual.6vine-vuln-analysis.
9
vex il
• Infinite memory
• Infinite range of registers
• Support types
• “Variable scope” support
• Used in Valgrind and a few other programs
• Well-tested and actively maintained
10
rreil, openreil, mail
• RREIL7 - modern and flexible alternative to REIL
• RREIL - supports types
• RREIL - unique conceipt of “domains”
• MAIL - IL, constructed specifically for Malware analysis
• MAIL - the only IL, which allows polymorph programs
• RREIL and MAIL - both lacks Floating Point support
7rreil.
11
rreil, openreil, mail
• OpenREIL8 - reinvention and attempt to spruce up REIL
• OpenREIL - self-contained and ready to use framework, like BAP
• OpenREIL has major differencies from the original REIL
• Based on libVEX and has embedded support of SMT-solving
8openreil.
12
esil - what’s different and what’s com-mon
short description
• Evaluable Strings Intermediate Language9
• Based on RPN (Reverse Polish Notation)(for the sake of speed)
• Designed for evaluation and emulation, not human-reading
• Low-level, pretty much alike Vine IL
• Small set of the instructions
• Implicit specification of all side-effects for each instruction
9r2esil.
14
short description
• Designed with a wide range of different architectures in mind
• Infinite memory
• Infinite set of registers
• Register aliases (“native” names, like “eax” or “cpsr”)
• Ability to call external functions (+syscalls)
• Ability to implement “custom ops” easily
• Without Floating Point support (planned, though)
15
esil operands
Table 1: ESIL Operands10
ESIL Opcode Operands Name Desription$ src Syscall syscall$$ src Instruction address Get address of current instruction== src,dst Compare v = dst - src ; update_eflags(v)< src,dst Smaller stack = (dst <src)<= src,dst Smaller or Equal stack = (dst <= src)> src,dst Bigger stack = (dst >src)|= src,reg OR eq reg = reg | src
10r2esil-instruction-set.
16
practical usage
radare
11
11r2advert.
18
radare2 tools
• rax2
• rabin2
• rasm2
• radiff2
• rafind2
• rahash2
• radare2
• r2pm
• rarun2/ragg2/ragg2-cc
19
1 command <—>1 reverse-engineering’notion
1. Every character of the command has some meaning (w = write, p =print)
2. Usually they’re simple abbreviations pdf = p <->print d<->disassemble f <->function
3. Short usage message for each command can be printed with cmd?,e.g. pdf?,?, ???, ???, ?$?, ?@?
20
radare2 — important commands of cli-mode
1. r2 -A or r2 + aaaa : run analysis
2. s : seek to the address or flag
3. pdf : print disassembly for function
4. af? : perform function analysis
5. ax? : do analysis for XREF
6. /? : various kinds of search
7. ps? : print string
8. C? : comments
9. w? : write bytes (hex, assembly, etc)
21
radare2 — visual mode
radare2 — important commands of visual mode
1. V? or just ? : Hotkeys help
2. p/P : circle between various visual modes
3. hjkl/arrows navigation
4. o : go to the offset/address
5. e : show all config variables
6. v : show the functions list
7. _ : HUD
8. V : ASCII Graph
9. 0-9 : jump to the corresponding function
10. u : Undo
23
emulation using esil
• ae* - evaluate and show r2 commands
• aei - VM initialization
• aeim - VM memory/stack setup
• aeip - to set IP (Instruction Pointer) for VM
• aes - step in ESIL emulation
• aec[u] - continue [until]
• aef - emulate whole function by name
24
emulation using esil
• DEMO
25
embedded controller - 8051 - esil vm12
• r2 -a 8051 ite_it8502.rom
• . ite_it8502.r2
• e io.cache=true to cache IO while emulating
• run aei
• run aeim
• run aeip to start from current IP
• aecu [addr] to emulate until IP = [addr]
12r2esiltv.
26
full fledged emulation in vm
• Allows to start ESIL VM with predefined properties
• Allows to run the code (e.g. unpack) in this VM
• Allows to set callbacks on some opcodes
• Allows to translate syscalls into real ones (optional)
• Good example - unpacking Baleful code13
13esil-baleful.
27
using esil emulation in analysis
• Using emulation of the code to find indirect jumps
• Using emulation to find some strings addresses
• Started by aae
• aae is a part of aaaa command
28
using background emulation to improve disassembly output
• Shows possible registers and memory values in comments
• Run ESIL VM with default properties in background
• Show “likely/unlikely” for conditional jumps
• e asm.emu=true
29
using background emulation to improve disassembly output
• DEMO
30
converting into another ils - openreil
• OpenREIL - actively maintained framework
• Ability to perform symbolic execution using SMT solver
• Radare2 has a special command to translate ESIL into OpenREIL
• aetr
31
converting into another ils - openreil
• DEMO
32
embedded controller - 8051 - esil2reil
• r2 -a 8051 ite_it8502.rom
• . ite_it8502.r2
• run pae 36 to show ESIL of the function ’set_SMBus_frequency’
• run aetr `pae 36` to translate ESIL into REIL14
• Save this output into the file/pipe and send as OpenREIL input
• Could be easily automated using r2pipe script
14openreil.
33
radeco il and radeco decompiler
esil -> radeco16
• Uses ESIL as input
• Request more metadata from radare2 (xrefs, functions, etc)
• Using r2pipe to talk to radare2
• Written in pure Rust
• Biggest part of it written during GSoC 2015
• Authors are: Sushant Dinesh and David Kreuter15
• GSoC 2015 was done under the Openwall’s project umbrella
15radeco-gsoc.16radeco.
35
why radeco?
• Existent FOSS decompilers are old and not use modern research
• Interesting methods to decompile rarely implemented in FOSS tools
• Radare2 as a RE suite need a good decompiler
• Challenging still viable task for Google Summer of Code
36
radeco il description
• Based purely on graphs
• Based on some concepts from RREIL and MAIL
• Simplification to SSA form while lifting ESIL -> Radeco IL
• Automatically performed DCE (Dead Code Elimination)
• Types inference17
17tie.
37
radeco demo
• DEMO
38
future improvements
supported architectures
• Currently supported: x86, ARM, GameBoy, 8051, etc
• Goal is to add support for all architectures in radare2
• CPU/SoC/chip profiles for a slight differences between them
40
instruction sets
• Floating point (LLVM/McSema)18
• SIMD instructions (SSE, AVX, Neon, etc)
• VLIW and parallel execution (for DSP architectures emulation)
18floating-point-SO.
41
visual debugging and tracing
• General UI improvements
• Simple representation of diffs between emulation and nativeexecution
• Auto removing dead ways/blocks from ASCII graphs
• Integration with WebUI and Bokken19
19bokken.
42
radeco improvements
• pseudo-C code emission
• Wider support for various: native and custom types
• Recalculating results on the fly, depending from debugging results
• Types inference and function/classes autorecognition2021
20tie.21il-optimization.
43
references
a lot of them I
45