Security Applications For Emulation

[email protected]

Speaker details

An independent researcher.

Presented a number of vulnerabilities at the first Ruxcon after auditing the open source kernels (FreeBSD, NetBSD, Linux, OpenBSD).

Also interested in Reverse Engineering, speaking at CanSecWest on Linux malware.

Outline

A presentation examining public research, and the results of my own research, on the topic of emulation applied to security.

Technology review

Security applications for emulation

Reverse engineering Cisco IOS Heap Management

Tracing and evaluating the capabilities of binaries

Dynamic Taint Analysis

Automated unpacking

Symbolic Execution

Detecting Runtime Errors in Programs

And introducing a new tool for detecting out of bounds heap access in the Linux Kernel

Virtualization

Different technologies all sharing similar themes: Virtualization, Emulation, Dynamic Binary Translation

Different types of virtualization

Full Virtualization provides a simulation of the underlying hardware.

Host performs native execution of the guest as much as possible.

Not an emulator, so aiming for near native speeds.

In i386, if there isn't full virtualization hardware support, privileged code is translated.

Eg VMWare, VirtualBox

Virtualization is an important technology, but this presentation focuses on the host being able to intercept and emulate each individual instruction in the guest. This is in contrast to virtualization, which executes guest code natively as much as possible, with little general host interception.

Emulation and Dynamic Binary Translation

Emulation

Emulator fetches, decodes and executes instruction by instruction.

Different types of emulators: whole system emulators capable of running unmodified guest operating systems, or emulators only capable of running applications on specific systems.

Guest state is maintained in software, including the CPU, system memory, and for whole system emulators, hardware devices.

Eg Bochs

Used in the open source automated unpacker, Pandora's Bochs.

Dynamic Binary Translation

A faster form of emulation.

Caches blocks of decoded and translated instructions.

Eg QEMU

Used in Argos, a system for capturing 0day*

Used in my MemCheck tool for detecting Linux kernel heap access bugs*.

Dynamic Analysis and Emulation

An emulator can be used to implement dynamic analysis.

Dynamic analysis means running a program and seeing what's going on as it executes, eg as in a debugger.

It can mean identifying specific behaviors in the program, such as how the program accesses memory, transfers execution control, or treats network data.

Dynamic analysis using a debugger is prone to anti-debugging tricks, and is very cumbersome when applied in a kernel context.

A robust solution is to perform dynamic analysis from inside an emulator.

Hooks are added in the fetch/decode/execute loop of an emulator.

When modifying a dynamic binary translator, instrumentation or callbacks are generally added to the translated code blocks.

All the applications for emulation presented are related to, or are applications of, dynamic analysis.
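
A minimal sketch of hooking a fetch/decode/execute loop. The three-opcode instruction set, the register names and the `run` function are all invented for illustration; a real emulator such as Bochs has a far richer loop, but the hook sits in the same place.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical three-field instruction: (opcode, destination, operand).
Instr = Tuple[str, str, int]
Hook = Callable[[int, Instr, Dict[str, int]], None]


def run(program: List[Instr], hooks: List[Hook]) -> Dict[str, int]:
    """Interpret a toy program, calling every hook before each instruction."""
    regs = {"pc": 0, "r0": 0, "r1": 0}
    while regs["pc"] < len(program):
        pc = regs["pc"]
        instr = program[pc]              # fetch
        opcode, dest, operand = instr    # decode
        for hook in hooks:               # analysis hook sees pre-execution state
            hook(pc, instr, regs)
        if opcode == "mov":              # execute
            regs[dest] = operand
        elif opcode == "add":
            regs[dest] += operand
        elif opcode == "jmp":
            regs["pc"] = operand
            continue
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        regs["pc"] = pc + 1
    return regs


# An execution tracer is then just one more hook.
trace: List[str] = []
final = run(
    [("mov", "r0", 5), ("jmp", "", 3), ("mov", "r0", 99), ("add", "r0", 2)],
    [lambda pc, instr, regs: trace.append(f"{pc}:{instr[0]}")],
)
```

The guest program never sees the hook, which is why this style of analysis is harder to detect than a debugger.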

Part i)

Reverse Engineering Cisco IOS's Heap Management

Reverse Engineering Cisco IOS with Dynamips

Dynamips is an open source emulator and binary translator of Cisco hardware running PPC/MIPS IOS images.

Potential future development environment for IOS exploits.

Dynamic analysis of IOS*

My experience is with IOS on MIPS.

IOS MIPS images use an invalid ELF e_machine field.

Some IDA (5.2) bugs with MIPS (turn off macros to work around).

Dynamic analysis can identify heap management functions in IOS and provide a means to potentially implement Valgrind style heap checkers.

It can also be used to reverse engineer other components of IOS.

Dynamic analysis is different to the static approach, and has some advantages:

Can be completely automated.

Since the behavior of the IOS implementation is relatively constant, this method can work across different IOS images, providing new or obsolete features aren't being examined.

IOS Heap Management Basics

Well documented public research in developing heap based buffer overflow exploits describes general heap layout.

IOS heap allocated buffers have a header appearing directly before the buffer, and a trailer that follows the buffer.

These 'chunks' form a doubly linked list.

Chunk header begins with a known constant.

This fact is used later in the analysis.

Dynamic Analysis Approach

Knowing the header constant of a malloc chunk enables us to track memory allocations by intercepting writes to memory of that particular constant.

Heap management is slightly different in a kernel but a kernel or user mode alloc/free still has a set of expected semantics and prototypes.

An alloc(ation) function returns a pointer to an allocated buffer.

But don't expect the allocation size to be the only argument, eg kmalloc in Linux has multiple arguments including flags.

Free might have multiple arguments also, but one of those arguments is certainly a pointer to an allocated buffer.

By tracking allocations, and checking the behavior of functions, we can infer the locations of malloc and free.

Identifying Functions with Dynamic Analysis

Finding malloc

Track writes to memory that write the constant that identifies a malloc chunk.

Track procedure exits, checking the return value for a pointer to a known allocated buffer.

This return value is the chunk location + chunk header length.

First function to return an allocated buffer is malloc, but sample a number of times to be sure.

Finding free

Find two malloc calls that return the same memory.

Free must have occurred between the mallocs since, logically, allocated buffers can't overlap.

Track procedure calls with an argument matching the freed memory, eg free(ptr).

Sample a large enough set; the common function among samples is free.
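
The "finding free" steps can be sketched as a set intersection over trace samples. The event format below is an assumption made for illustration (it is not the Dynamips trace format), and the addresses are made up.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical trace events emitted by the emulator:
#   ("malloc_ret", buffer_address)            malloc returned this buffer
#   ("call", function_address, arg_tuple)     some procedure was called
Event = Tuple


def candidate_sets(events: Iterable[Event]) -> List[Set[int]]:
    """For every buffer that malloc returns twice, collect the functions that
    took the buffer as an argument between the two returns."""
    live: Dict[int, Set[int]] = {}
    samples: List[Set[int]] = []
    for event in events:
        if event[0] == "malloc_ret":
            buf = event[1]
            if live.get(buf):
                samples.append(live[buf])   # buffer reused: free is in this set
            live[buf] = set()
        elif event[0] == "call":
            _, func, args = event
            for arg in args:
                if arg in live:
                    live[arg].add(func)
    return samples


def infer_free(events: Iterable[Event]) -> Set[int]:
    """Functions common to every sample; ideally exactly one, free."""
    samples = candidate_sets(events)
    return set.intersection(*samples) if samples else set()


events = [
    ("malloc_ret", 0x1000),
    ("call", 0xA0, (0x1000, 16)),   # unrelated consumer of the buffer
    ("call", 0xF0, (0x1000,)),      # the real free
    ("malloc_ret", 0x1000),
    ("malloc_ret", 0x2000),
    ("call", 0xB0, (0x2000,)),      # a different unrelated consumer
    ("call", 0xF0, (0x2000,)),
    ("malloc_ret", 0x2000),
]
free_candidates = infer_free(events)
```

With too few samples the intersection still contains unrelated functions, which is why a large sample set is needed.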

Testing the results with a double free and overlapping allocation checker.

How can we determine if malloc and free are the only heap management functions?

The solution is to trace those functions while running IOS, building our own representation of the heap, all the while checking for consistency in our representation.

Certain conditions should always be true in a well managed heap.

If any assertions fail catastrophically, our model of the heap is incorrect.

Only allocated memory can be freed.

Allocated memory can not overlap.

This results in a checker that can be used to detect double free bugs in IOS, as they happen, much like Valgrind. But IOS checks the consistency of the heap regularly and also during free, so the checker is probably only useful for automated analysis.
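
A minimal sketch of such a heap model, checking the two invariants above. The class and method names are hypothetical; the tracer is assumed to call `on_malloc` with the returned buffer and requested size, and `on_free` with the pointer argument.

```python
from typing import Dict, List


class HeapModel:
    """Our own representation of the guest heap, built from traced calls."""

    def __init__(self) -> None:
        self.live: Dict[int, int] = {}      # buffer start -> size
        self.violations: List[str] = []

    def on_malloc(self, addr: int, size: int) -> None:
        # Invariant: allocated memory can not overlap.
        for start, length in self.live.items():
            if addr < start + length and start < addr + size:
                self.violations.append(f"overlap: {addr:#x} with {start:#x}")
        self.live[addr] = size

    def on_free(self, addr: int) -> None:
        # Invariant: only allocated memory can be freed.
        if addr not in self.live:
            self.violations.append(f"free of unallocated memory: {addr:#x}")
            return
        del self.live[addr]


model = HeapModel()
model.on_malloc(0x1000, 64)
model.on_malloc(0x1020, 16)   # lands inside the first buffer
model.on_free(0x1000)
model.on_free(0x1000)         # double free
```

A flood of violations suggests the model is missing a heap management function rather than that the guest is broken.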

Detecting IOS 0-day

Another type of IOS checker could potentially be made to detect 0-day attacks.

IOS exploitation uses corrupted malloc chunks that are subsequently freed.

Freeing the corrupt chunk causes an arbitrary write to memory.

The checker could confirm the consistency of header attributes such as the size of each chunk through the interception of free calls.

For more complete coverage, the chunk header could be retrieved and stored after every malloc, subsequently being verified before free.

In a roll-out, honeypots could automatically detect mass 0-day exploitation and raise alarms of the attack.
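
A sketch of the "store the header after malloc, verify before free" idea. The 16 byte header length and the flat `bytearray` standing in for guest memory are assumptions for illustration only; the real IOS chunk header layout is not reproduced here.

```python
from typing import Dict

HEADER_LEN = 16  # assumed header size, not the real IOS value


class HeaderGuard:
    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.saved: Dict[int, bytes] = {}   # buffer -> header copy

    def on_malloc_return(self, buf: int) -> None:
        # Snapshot the header that sits directly before the buffer.
        self.saved[buf] = bytes(self.memory[buf - HEADER_LEN:buf])

    def on_free_call(self, buf: int) -> bool:
        """Return True only if the header is unchanged since allocation."""
        expected = self.saved.pop(buf, None)
        current = bytes(self.memory[buf - HEADER_LEN:buf])
        return expected is not None and expected == current


memory = bytearray(256)
guard = HeaderGuard(memory)
guard.on_malloc_return(64)     # buffer A: bytes 64..111, header at 48..63
guard.on_malloc_return(128)    # buffer B: header at 112..127
memory[64:116] = b"A" * 52     # overflow of A runs 4 bytes into B's header
b_ok = guard.on_free_call(128)
a_ok = guard.on_free_call(64)
```

A failed check at free time is the point where a honeypot would raise its alarm.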

Reference Counting.

Tracing malloc and free shows us conditions where the same memory is freed twice, that is, a double free.

Potentially this could indicate a bug in IOS but there are simply too many alerts to be meaningful.

In fact, it turns out that, as suspected by other researchers, allocated buffers are reference counted.

Before the two frees is a call to increment the reference count (IncRefCnt) of the buffer, thus causing the first free to simply decrement the count without actually freeing the memory.

MIPS has an atomic addition instruction, used only for incrementing the malloc chunk refcnt.

Any procedure that uses this instruction on a malloc chunk is IncRefCnt.

For other architectures, the refcnt field in the malloc chunk is at a fixed offset, and writes to this address may also indicate the location of IncRefCnt.

MallocLite

Tracing also reveals the appearance of overlapping memory allocations.

In later versions of IOS, a 'MallocLite' implementation is used.

A 64k allocation is used which is subsequently subdivided for use in allocations <= 128 bytes.

This feature may affect the writing of heap exploits and should be taken into account.

If malloc recursively calls itself, requesting 64k of memory, then MallocLite is allocating this larger block of memory.

For tracing, ignoring recursive allocations works.

Cisco IOS TODO

The malloc tracer could potentially be used to implement a Valgrind style MemCheck tool to detect out of bounds heap access.

This could be used alongside fuzzing to provide more accurate detection of vulnerabilities when they happen.

Easy to implement, but the initial attempt resulted in too many false positives.

Problem: There are other functions that have direct access to internal heap structures besides malloc, free and IncRefCnt, eg CheckHeaps.

More reversing is required.

If Cisco gave me access to the source, I'm pretty sure I could whack this out in a week ;-)

The MemCheck concept was later successfully implemented for the Linux Kernel as source code is openly available.

Cisco IOS Summary

By modifying the open source Cisco emulator, dynamips, dynamic analysis of IOS is possible.

Dynamic Analysis of IOS can aid in reverse engineering.

Potentially one day we will have a Valgrind style IOS memory checking tool, or in the near future a 0-day detection tool.

Part ii)

Tracing execution and evaluating the capabilities of binaries and potential malware

Tracing and evaluating the capabilities of binaries

Run the binary inside a sandboxed environment, logging events of interest.

System calls, registry changes, files accessed, process management, services started or stopped etc.

Public websites offer free online services to evaluate binaries and potential malware.

Trace is useful for quickly determining what a binary is doing.

May help in determining if the binary is malicious.

A non emulated approach is to trace the binary using a debugger based tool from userspace within a VM.

Malware is almost certain to use anti-debugging tricks which may make tracing problematic.

Another approach is to perform the execution inside an emulator.

Emulated approach is very resistant to modern anti-debugging tricks.

TTAnalyze

TTAnalyze: A Master's thesis that presented a closed source fork of QEMU that logged Windows system calls.

Important as other techniques such as automated unpacking are based on similar methods and the thesis clearly describes the implementation.

Windows XP running as a guest, emulated by a fork of QEMU in the host.

Host uploads binary to guest using virtual network created by VM.

Binary is executed in guest environment.

Host monitors execution and logs events of interest.

TTAnalyze concepts

Host emulator intercepts every instruction.

It identifies instructions that belong to the process being monitored.

How to know what code is part of the process we wish to monitor?

CR3 register (the page directory base address) is unique for each process.

Kernel maintains a process list (EPROCESS) with these addresses.

A given instruction of the process may be executing either kernel code or user code.

For our target process, kernel code is when EIP > 0x80000000.

For the target process, it checks EIP, and if it points to a Windows API call it logs the event.

It also logs returning from Windows API calls.

To know the addresses of each Windows API call, it uses the PEB of the target process to eventually retrieve a list of all loaded DLLs.

The library calls exported by each DLL are parsed, and their addresses noted.
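
A sketch of the per-instruction filter described above: match CR3, skip kernel code, then look EIP up in a table of API entry points. The class name, the CR3 values and the API address are invented for illustration, and logging of returns is omitted.

```python
from typing import Dict, List

KERNEL_BASE = 0x80000000  # addresses from here up are treated as kernel code


class ApiTracer:
    def __init__(self, target_cr3: int, api_addresses: Dict[int, str]) -> None:
        self.target_cr3 = target_cr3          # page directory base of target
        self.api_addresses = api_addresses    # entry point -> API name
        self.log: List[str] = []

    def on_instruction(self, cr3: int, eip: int) -> None:
        if cr3 != self.target_cr3:
            return                            # some other process
        if eip >= KERNEL_BASE:
            return                            # target process, kernel mode
        name = self.api_addresses.get(eip)    # exact EIP match only
        if name is not None:
            self.log.append(name)


tracer = ApiTracer(0x39000, {0x7C801D7B: "LoadLibraryA"})  # made-up address
tracer.on_instruction(0x39000, 0x00401000)   # target, ordinary user code
tracer.on_instruction(0x39000, 0x7C801D7B)   # target enters the API
tracer.on_instruction(0x12000, 0x7C801D7B)   # another process, ignored
tracer.on_instruction(0x39000, 0x7C801D7D)   # jump past the entry: missed
```

The last call shows why exact EIP matching can be evaded by code that skips the first instructions of a function.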

TTAnalyze Implementation

A component that executes inside the guest system

Kernel driver to parse the kernel EPROCESS list, to obtain the page directory address (CR3), and PEB of the target process.

RPC mechanism to control guest operations from host:

Uploading executables to guest.

Controlling execution of the target process, which is initially started in a suspended state to allow querying.

Querying the kernel driver for the page directory base (CR3) and PEB.

QEMU modifications

Identifying the process of interest using the CR3 result from the guest kernel driver.

The PEB is used to establish a list of addresses for each Windows API call in a DLL*

Identifying entering and leaving Windows API calls in the guest, based on intercepting each instruction and checking EIP.

TTAnalyze Implementation Challenges

Arguments for system calls which reside in virtual memory might be paged out.

QEMU page fault handler detects the condition, then alters guest code to access the target memory, paging it in.

Malware can use the Native API directly.

Understanding this requires unofficial documentation of the API.

Trap native calls by checking each instruction for an OS trap (int 2e or sysenter).

TTAnalyze Attacks

Malware might evade detection of Windows API calls, which is dependent on exact EIP matching.

Vulnerable if malware doesn't jump to the very beginning of a function, eg the caller might implement the callee's prologue.

Malware might detect guest changes.

Communication channel between host and guest.

Kernel driver component.

See Pandora's Bochs (an automated unpacker) implementation with no guest changes.

Malware might detect system emulators.

CPU bugs (in errata) generally not implemented.

Model Specific Registers implementation different for different CPU vendors.

Binary Tracing Summary

Existing software that traces binaries using a userland debugger based tool in a VM is vulnerable to many anti-debugging tricks.

An emulator can present a solution to that problem.

Part iii)

Using emulation for dynamic taint analysis

Dynamic Taint Analysis

A technique used to analyze the flow of data in a program.

Has applications in identifying vulnerabilities as they happen, eg Argos.

Has also been used to identify spyware, eg BitBlaze.

Is a general concept that can be used in a number of applications, including symbolic execution.

Traces the flow of data, instruction by instruction, from a source that generates 'tainted' data, to sinks where the data is used.

Variables, registers and memory are tagged as being tainted or clean.

Destination operand in an instruction becomes tainted when a source operand is tainted.

Sometimes it's useful that data can become untainted by certain operations.
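
A minimal sketch of the propagation rule over a toy instruction list. The instruction tuple format is an assumption; the `xor reg, reg` case is one common example of an operation that untaints its destination, since the result is constant whatever the input.

```python
from typing import List, Set, Tuple

# Hypothetical instruction form: (opcode, destination, source operands).
Instr = Tuple[str, str, Tuple[str, ...]]


def propagate(program: List[Instr], initially_tainted: Set[str]) -> Set[str]:
    """Return the set of tainted locations after running the program."""
    tainted = set(initially_tainted)
    for opcode, dest, sources in program:
        if opcode == "xor" and len(sources) == 2 and sources[0] == sources[1]:
            tainted.discard(dest)            # xor r, r always yields zero
        elif any(src in tainted for src in sources):
            tainted.add(dest)                # tainted source taints destination
        else:
            tainted.discard(dest)            # overwritten with clean data
    return tainted


program = [
    ("mov", "eax", ("input",)),        # eax <- tainted source
    ("add", "ebx", ("ebx", "eax")),    # ebx picks up taint from eax
    ("mov", "ecx", ("edx",)),          # clean copy
    ("xor", "eax", ("eax", "eax")),    # eax is cleared and untainted
]
result = propagate(program, {"input"})
```

A real implementation tags registers and memory at byte or bit granularity, but the rule per instruction is the same.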

Dynamic Taint Analysis in Vulnerability Detection

Dynamic Taint Analysis has been applied for vulnerability detection such as SQL injection, or incorrect use of the Unix exec*() or system() calls which run executables.

A source of user input, that is untrusted data, taints the data.

Flow of untrusted data is followed by taint analysis.

If untrusted data is checked in a condition, then input validation is deemed to have occurred, so untaint the data.

At the site of exec*(), system(), or even mysql_query, check that the argument is not tainted.

If tainted, then untrusted data is assumed to have reached privileged code and a vulnerability has occurred.

Argos: A tool for detecting 0day attacks

Uses dynamic taint analysis to detect 0day attacks.

An open source fork of QEMU.

Detects exploits as they are happening and automatically generates vulnerability signatures.

Vision is of an automatic worm defense system.

Honeypots detect 0day attacks.

Generates and delivers vulnerability signatures to intrusion prevention systems.

Argos works by dynamic taint analysis of network data, which is considered untrusted.

Taints data returned from the QEMU emulated network driver.

Exploits are detected when there is code redirection under attacker control:

If EIP becomes tainted (under the control of the attacker).

If EIP points to tainted data.

Execve system calls are checked for tainted arguments.
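
The two EIP checks can be sketched with a set of tainted guest addresses. This is an illustration of the idea only, not the Argos code; the class, its method names and the addresses are assumptions.

```python
from typing import Optional, Set


class ControlFlowMonitor:
    def __init__(self) -> None:
        self.tainted: Set[int] = set()       # tainted guest byte addresses

    def on_network_write(self, addr: int, length: int) -> None:
        # Data arriving from the emulated network card is untrusted.
        self.tainted.update(range(addr, addr + length))

    def on_memory_write(self, addr: int, length: int, source_tainted: bool) -> None:
        span = range(addr, addr + length)
        if source_tainted:
            self.tainted.update(span)
        else:
            self.tainted.difference_update(span)

    def check_indirect_jump(self, pointer_addr: int, target: int,
                            width: int = 4) -> Optional[str]:
        """pointer_addr is where the new EIP value was loaded from."""
        if any(a in self.tainted for a in range(pointer_addr, pointer_addr + width)):
            return "tainted EIP"
        if target in self.tainted:
            return "EIP points to tainted data"
        return None


monitor = ControlFlowMonitor()
monitor.on_network_write(0x5000, 64)              # packet lands in a buffer
monitor.on_memory_write(0xBFFF0000, 4, True)      # overflow hits a return slot
smashed = monitor.check_indirect_jump(0xBFFF0000, 0x5010)
into_packet = monitor.check_indirect_jump(0xBFFF1000, 0x5010)
normal = monitor.check_indirect_jump(0xBFFF1000, 0x401000)
```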

Dynamic Taint Analysis Summary

Dynamic Taint Analysis is a technique used to track the flow of data.

Important because it can be used as a general technique in more applied topics.

Has applications including vulnerability detection and is used in places like symbolic execution.

Part iv)

Automated Unpacking

Packers

A packer rewrites an executable, wrapping a new layer of code around the original program.

Essentially becomes an executable inside an executable.

A packer is used to compress, obfuscate or encrypt the original executable.

Today almost all malware is packed.

Packers were originally used for compression.

I remember packers (or crunchers) from the early 90's, and had 2 floppy disks full of them, for the Commodore 64!

The resulting packed executable consists of a runtime unpacking layer and a binary blob of the compressed or obfuscated original program.

At runtime, the unpacking layer decompresses the blob, writing the original executable to memory.

It then transfers execution back to the original code.

Not all packers follow this behavior.

Some packers convert the original executable to PCODE.

At runtime the packed executable acts as a VM.

Unpacking

Unpacking is the process of extracting the original executable from a packed image.

The manual approach is to run the packed executable in a debugger, skipping the unpacking stub which writes to memory the original image, and breaking (in the debugger) when execution transfers to the now unpacked image.

Dump the memory, then rebuild the image so it's a valid executable again.

Requires fixing the Import Address Table.

ImpRec can do this.

Debugger scripts can automate the process for specific packers by identifying instruction sequences that indicate which stage the unpacking stub is in.

Automated Unpacking

Unpacking can be automated.

Run the packed executable.

Track all memory writes by the executable.

If execution transfers to a previously written memory location, then unpacking is deemed to have occurred.

May be necessary to repeat as multiple layers may exist.

Public automated unpackers are available from Offensive Computing, and also Pandora's Bochs.
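
The write-then-execute rule can be sketched over a stream of emulator events. The event format is an assumption for illustration, and clearing the written set after each hit is one possible way to handle multiple layers, not a description of any of the tools named above.

```python
from typing import Iterable, List, Set, Tuple

Event = Tuple[str, int]   # ("write", address) or ("exec", address)


def find_unpacked_entry_points(events: Iterable[Event]) -> List[int]:
    """Report each address where execution reaches freshly written memory."""
    written: Set[int] = set()
    entry_points: List[int] = []
    for kind, addr in events:
        if kind == "write":
            written.add(addr)
        elif kind == "exec" and addr in written:
            entry_points.append(addr)   # a layer has finished unpacking
            written.clear()             # start watching for the next layer
    return entry_points


events = [
    ("exec", 0x1000),                       # unpacking stub runs
    ("write", 0x2000), ("write", 0x2001),   # stub writes layer one
    ("exec", 0x1004),
    ("exec", 0x2000),                       # control transfers to layer one
    ("write", 0x3000),                      # layer one writes layer two
    ("exec", 0x2001),
    ("exec", 0x3000),                       # control transfers to layer two
]
layers = find_unpacked_entry_points(events)
```

At each reported entry point the process memory would be dumped and the image rebuilt.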

Automated Unpacking Implementation Approaches

Multiple approaches in implementation

Use hardware page protection in the OS to track writes and execution. Eg Offensive Computing.

This results in high performance.

If running inside a virtualized environment like VMWare, the VM might be detected.

Offensive Computing recommend using a real goat machine.

Dynamic instrumentation or complete emulation of the packed program to track memory writes and execution.

Offensive Computing use the instrumentation approach with the Intel PIN framework.

Pandora's Bochs uses the Bochs emulator.

Automated Unpacking using an Emulator

Emulation is a mature closed source technology used by AntiVirus.

Original usage of emulation was to detect polymorphic viruses, but it is now used for unpacking also.

Typical AntiVirus emulator emulates both the instruction set and parts of the operating system.

This is how I wrote my own automated unpacker and emulator.

There are no software licensing problems since the emulator is only a regular piece of software.

Another approach is to use a whole system emulator such as Bochs or QEMU running an installed OS.

Non emulated approaches are more likely to be detected or be subject to anti-debugging tricks employed by malware.

Using an AV style Emulator as a CPU checker

While developing my AV style emulator, a need arose to verify the emulation.

I implemented a program tracer to trace programs in parallel to emulation.

Tracer needed to automatically evade anti-debugging tricks.

Instructions that would indicate the program was being debugged needed to be emulated (eg, EFlags via popf, rdtsc, or software int1 being confused with single stepping).

Library calls also (eg, Process32*, which shows the debugger in the process list, and IsDebuggerPresent).

For each traced instruction, the emulator executes the same instruction.

The CPU state from the tracer is verified against the state of the emulator, and checked for consistency.

Some instructions produced differences between emulation and tracing, not due to a fault of the emulator or tracer.

CPU bugs. Some instructions not following Intel specifications.

Not setting/clearing processor status flags.

Automated Unpacking using an Emulator implementation

The required changes to an emulator involve modifying the software MMU to track memory writes, and checking each instruction to see if the EIP matches any addresses where memory writes have occurred.

Similar problems as TTAnalyze are present in determining what code is part of the target process.

The Renovo unpacker from the BitBlaze project follows the TTAnalyze approach in starting the executable in a suspended state, and then using a kernel driver in the guest to find the page directory base address of the process.

Pandora's Bochs uses an unmodified guest system and instead watches for changes in the CR3 register to identify the target process.

To determine the value of CR3 it takes into account that in kernel mode Windows uses the fs register to reference a known structure leading to the EPROCESS list which, as in TTAnalyze, contains the page directory base address (CR3) of each process.

Attacks against Automated Unpackers and Emulators

Malware might make use of unimplemented emulation of the architecture, instruction set or operating system.

For AV emulators, use of obscure libraries.

For whole system emulators, detection of the emulator.

Malware might check existence of known CPU errata.

Having malware require activation (eg, using the Internet), or only occasionally activating.

Attacks (cont): Virtual Machine Packers

Packer translates executable into PCODE.

At runtime, PCODE is decoded and executed in the style of a virtual machine.

PCODE can be polymorphic.

This type of packer doesn't follow the 'write to memory then execute' algorithm.

Eg, Themida, but fortunately these packers are not as common in current malware.

No automated method of unpacking against an unknown packer of this type.

Automated Unpacking Summary

Automated unpacking works on the principle of intercepting execution at previously written memory addresses.

Multiple approaches to implementation; emulation has some advantages.

Automated unpacking doesn't work on VM based packers.

Part v)

Using emulation to design and implement symbolic execution

Symbolic Execution

A technique used to analyze programs.

For unknown input to a program, it maintains generalized information on program state, systematically exploring program paths.

Really a definition for mixed symbolic execution.

Execution occurs by emulating instructions and using symbolic formulas instead of concrete data for user defined input.

Example symbolic data can be network packet contents, program arguments, file contents etc.

A symbolic formula contains information on all program states on that program path for arbitrary user input, that is, all the values the data can possibly hold as constrained by the symbolic formula.

Bug finding is equivalent to solving the equations.

Eg, is this pointer being dereferenced ever equal to 0, given arbitrary user input?

And if so, what is the user input that generates that bug?

SMT Based Constraint Solvers

Symbolic equations are generated for instructions that have symbolic arguments.

Conditional instructions generate equations which are constraints (eg, x < 10)

Equations are handled by Satisfiability Modulo Theories (SMT) solvers.

Efficient SMT based solvers are a relatively new achievement in the past decade.

Annual SMT competition pits solvers against each other.

Microsoft has their own solver which is free to use, but not open source.

A number of open source solvers are available.

An SMT solver can be queried, given a set of equations and constraints, to see if certain queried constraints can be true.

Can easily determine if a symbolic pointer can be null.

SMT solvers can also generate concrete solutions from symbolic equations.

Applications of Symbolic Execution

As a bug checker

Dawson Engler's closed source C checker EXE, which could detect buffer overflows, null pointer dereferences and divisions by zero.

The open source Catchconv, which doesn't explore program paths, but checks assertions on a given set of input using symbolic execution to find signedness bugs.

Intelligent fuzzing

Symbolic Execution can automatically enumerate the paths and data in a program that fuzzing normally misses, aiming towards complete automated code coverage.

Eg, closed source Microsoft SAGE research.

Tracing and evaluating the capabilities of binaries

The closed source BitBlaze project implements BitScope, which is in a similar vein to TTAnalyze except it symbolically explores the many program paths in potential malware to find its capabilities.

Symbolic Execution Implementation

Emulator runs program, instruction by instruction, generating symbolic equations for instructions when a source operand is symbolic, such as the symbolic equation ebx=eax + 10.

In an instruction, if a source operand is symbolic, the destination becomes symbolic.

This is implemented using Dynamic Taint Analysis.

At conditional instructions, there are two possible equations: the condition being true, or the condition being false.

Symbolic Execution explores each path separately.

A symbolic constraint representing the condition's truth is given to each path, eg x > 10 on one path and x <= 10 on the other.

Feasibility of each path, that is whether its constraints can be satisfied, is determined by SMT solvers.
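
A sketch of path exploration with feasibility pruning. To stay self-contained it replaces the SMT solver with brute force over a single symbolic byte `x`, which is only workable at toy scale; the function names and the branch list are assumptions for illustration.

```python
from typing import Callable, List, Optional, Tuple

Constraint = Tuple[str, Callable[[int], bool]]   # (description, predicate on x)
PathResult = Tuple[List[str], int]               # (path constraints, witness x)


def solve(constraints: List[Constraint]) -> Optional[int]:
    """Stand-in for an SMT solver: try every value of one symbolic byte."""
    for x in range(256):
        if all(pred(x) for _, pred in constraints):
            return x          # concrete input that drives this path
    return None               # path is infeasible


def explore(branches: List[Constraint],
            path: Optional[List[Constraint]] = None,
            results: Optional[List[PathResult]] = None) -> List[PathResult]:
    path = path or []
    results = [] if results is None else results
    witness = solve(path)
    if witness is None:
        return results                        # prune infeasible paths early
    if not branches:
        results.append(([desc for desc, _ in path], witness))
        return results
    (desc, pred), rest = branches[0], branches[1:]
    explore(rest, path + [(desc, pred)], results)                    # taken
    explore(rest, path + [(f"not({desc})", lambda x, p=pred: not p(x))],
            results)                                                 # not taken
    return results


branches = [
    ("x > 10", lambda x: x > 10),
    ("x < 5", lambda x: x < 5),
    ("x == 200", lambda x: x == 200),
]
paths = explore(branches)
```

Three branches give eight candidate paths, but only four are feasible here, and each comes with an input that reaches it.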

Symbolic Execution Challenges

Symbolic Execution may never terminate in the presence of loops, so loops must be simplified, typically through unrolling.

Symbolic Execution therefore is not complete.

Path Explosion: functions like strcmp with symbolic input have many possible paths; an exponential number of paths in the size of the string.

BitBlaze approach: Hard code 'function summaries' to deal with common library functions.

Dealing with symbolic pointers.

Dynamic taint analysis has trouble determining the target memory that becomes tainted if a pointer is symbolic.

Requires SMT solver to determine concrete solutions of the pointer.

SMT solver support for the target architecture may not be complete.

No public solvers support floating point.

Symbolic Execution Summary

Symbolic execution is a relatively new method to analyze programs.

Applications include bug checkers, smart fuzzers, and binary evaluation.

I believe symbolic execution has a big part in the future of automated analysis.

Part vi)

Detecting Runtime Errors in Programs

Valgrind

Valgrind is a heavyweight dynamic binary instrumentation framework.

Most well known for the MemCheck checker.

Memcheck is used as a bug checker for incorrect heap use or access.

Also detects uninitialized variable use.

Translates machine code to IR, then allows instrumentation, with modules that implement runtime checkers.

Valgrind's Memcheck can detect out of bounds or invalid heap access and tracks what addresses can be accessed by maintaining a 'shadow memory' mirroring allocations on the heap.

For each address in shadow memory, it also stores whether it's initialized or not.

Then checks all guest memory references belong to the shadow memory using IR instrumentation.

Valgrind's MemCheck with uninitialized variables

Uninitialized variable checker implemented using dynamic taint analysis.

Newly allocated memory and new stack frames considered tainted.

Initializing data untaints it.

Alert when using tainted/uninitialized data.

Naive implementation causes false positives.

Memcpy of padded structures or memcpy of structures with uninitialized members causes false positives.

Fixed by warning only when using uninitialized variables in system calls, conditions or being dereferenced as a pointer.
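
A sketch of that fix: copies propagate the uninitialized state silently, and a report is only raised at a use that matters. This illustrates the idea rather than Valgrind's implementation; the class and method names are assumptions, and state is kept per byte in a plain set.

```python
from typing import List, Set


class DefinednessTracker:
    def __init__(self) -> None:
        self.undefined: Set[int] = set()     # addresses holding no valid data
        self.reports: List[str] = []

    def on_alloc(self, addr: int, size: int) -> None:
        self.undefined.update(range(addr, addr + size))

    def on_store(self, addr: int, size: int) -> None:
        self.undefined.difference_update(range(addr, addr + size))

    def on_copy(self, dst: int, src: int, size: int) -> None:
        # memcpy style: carry the state across, never report.
        for i in range(size):
            if src + i in self.undefined:
                self.undefined.add(dst + i)
            else:
                self.undefined.discard(dst + i)

    def on_use(self, addr: int, size: int, kind: str) -> None:
        # Called only for conditions, pointer dereferences and system calls.
        if any(a in self.undefined for a in range(addr, addr + size)):
            self.reports.append(f"{kind} uses uninitialized data at {addr:#x}")


tracker = DefinednessTracker()
tracker.on_alloc(0x100, 8)        # struct: 1 byte field, 3 padding, 4 byte field
tracker.on_store(0x100, 1)
tracker.on_store(0x104, 4)
tracker.on_alloc(0x200, 8)
tracker.on_copy(0x200, 0x100, 8)  # padding is copied without complaint
tracker.on_use(0x204, 4, "condition")   # initialized field: fine
tracker.on_use(0x201, 1, "condition")   # padding byte: reported
```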

Detecting Runtime Heap Errors in the Linux Kernel

Existing tools have similar designs or aims, detecting some classes of heap errors in the Linux Kernel.

KEFence (Linux) / MemGuard (FreeBSD)

Detects overflows (and underflows for KEFence, but not both at the same time) of heap buffers.

Allocates a guard page next to the allocated buffer that page faults on any access.

Only detects overflows, not arbitrary invalid access.

KmemCheck (Linux)

Used to detect uninitialized variable bugs.

Maintains a shadow memory indicating state of data being initialized or not.

Page faults on all heap access, then checks shadow memory against access.

UML + Valgrind

Doesn't seem active, and source unavailable :(

Linux Kernel MemCheck

My own runtime checker that detects out of bounds heap access in the Linux Kernel.

Not Valgrind's MemCheck. I named it poorly, I know.

Tested under Linux 2.6.26 using a Windows Vista Cygwin host.

Implemented as a C++ fork of QEMU.

Dumps kernel stack trace on guest access violation.

Only reports when a memory access violation occurs, much like Valgrind.

Not a static analysis tool.

Host maintains 'shadow memory' of guest Linux Kernel heap that identifies valid heap addresses.

The shadow memory is created by intercepting the heap management functions in the Linux kernel and building a representation of the guest heap.

MemCheck validates all memory access against this shadow memory (like Valgrind).

Except in heap management functions like kmalloc, kfree etc.

Linux Kernel Heap Management

Linux has had several memory allocators, the latest Linux kernels now using the “slub” allocator.

MemCheck only supports the latest “slub” allocator.

There are also three internal allocators in Linux that use the heap.

The Page Allocator, using the buddy allocator internally, which only handles allocations of sizes being a predetermined multiple of the page size.

The page allocator can be called directly or indirectly from the slub allocator.

The Slub Allocator which handles allocations of varying sizes by dividing up a “slab” that originates from the page allocator.

The BootMem Allocator which uses a simpler algorithm than the other allocators during boot time only.

Linux Kernel Heap Tracing and Guest Linux Implementation

MemCheck must trace the kernel allocator functions to properly create its shadow memory.

However tracing an unmodified Linux guest presents problems.

The Page Allocator does not always return the address of the allocated page contents, but returns a structure of the page description instead.

The Slub Allocator defines kmalloc as an inline function which can't be intercepted using a compile time symbol address.

Following internal logic can be difficult, such as kmalloc using the page allocator internally.

The solution is to use a modified guest Linux Kernel that instruments the allocators so that MemCheck can easily intercept them.

MemCheck QEMU implementation

QEMU was modified to implement MemCheck.

MemCheck is written in C++ running on a Windows host, so I ported QEMU 0.9.1 to compile under g++.

In hindsight, porting was not necessary and not worth the effort.

I also backported some patches for problems that cause 0.9.1 to fail on Windows.

QEMU has an optimization of merging basic blocks in a translation block.

I needed basic block granularity to correctly intercept the beginning of functions so this QEMU optimization was turned off.

A tracer was implemented to track functions using a callback interface on function entry or exit.

By tracing the heap management code, a simple shadow memory was constructed using C++ STL maps for the implementation.

The software MMU in QEMU was modified to check the memory access was a valid address in the shadow memory.
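
A sketch of a shadow memory keyed by allocation start address, with the lookup the software MMU hook would perform on each access. MemCheck itself uses C++ STL maps; this Python version with sorted lists and `bisect` is a stand-in, and the class and method names are assumptions.

```python
import bisect
from typing import List


class ShadowHeap:
    def __init__(self) -> None:
        self.starts: List[int] = []   # sorted allocation start addresses
        self.sizes: List[int] = []    # sizes, parallel to self.starts

    def on_alloc(self, addr: int, size: int) -> None:
        i = bisect.bisect_left(self.starts, addr)
        self.starts.insert(i, addr)
        self.sizes.insert(i, size)

    def on_free(self, addr: int) -> None:
        i = bisect.bisect_left(self.starts, addr)
        if i < len(self.starts) and self.starts[i] == addr:
            del self.starts[i]
            del self.sizes[i]

    def is_valid(self, addr: int, length: int = 1) -> bool:
        """True if [addr, addr + length) lies inside one live allocation."""
        i = bisect.bisect_right(self.starts, addr) - 1   # last start <= addr
        if i < 0:
            return False
        return addr + length <= self.starts[i] + self.sizes[i]


shadow = ShadowHeap()
shadow.on_alloc(0xC1000000, 32)             # traced kmalloc of 32 bytes
first = shadow.is_valid(0xC1000000, 4)      # start of the buffer
last = shadow.is_valid(0xC100001C, 4)       # final 4 bytes
past = shadow.is_valid(0xC100001E, 4)       # runs 2 bytes past the end
shadow.on_free(0xC1000000)
after_free = shadow.is_valid(0xC1000000, 4)
```

The check only makes sense for accesses that target the heap region, so a real hook must first decide whether the address belongs to the heap at all.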

MemChecking the Linux Kernel

The Linux Test Project (LTP) contains 3000+ tests for the Linux Kernel which exercise much of the core kernel code.

Ran the default test suite on Linux 2.6.26.3 using MemCheck.

MemCheck is slow, but still allows for interactive sessions.

Fedora Linux takes 30+ minutes to boot.

Let the test suite run overnight.

No out of bounds access detected.

Reran the test suite using slub debugging which, in combination with MemCheck, may result in more bugs being detected.

Again, no out of bounds access detected.

While no immediate bugs were identified in 2.6.26.3, MemCheck may be used against future kernel releases, possibly as part of an automated test suite, or used to aid kernel debugging and development.

MemCheck Limitations

Because MemCheck is based on QEMU, very little hardware is emulated so most of the Linux driver code is not tested.

Buffer overflows don't necessarily result in memory access using invalid heap addresses.

A slab based allocator fits heap allocations next to each other, so buffers overflow into adjacent and valid heap allocations.

A solution is to boot Linux using the slub_debug kernel option which separates heap objects using a redzone.

If MemCheck generates a report from a vulnerable kernel module, only kernel addresses are given in the stack trace; no symbolic names are used.

MemCheck TODO

A solution to the adjacent buffer problem is to associate every heap access with its original allocation by tracking heap pointers using dynamic taint analysis.

This use of dynamic taint analysis could also be applied in userland, as a Valgrind checker.

Dynamic taint analysis can also be the basis of tracking uninitialized variable usage without the false positives currently associated with kmemcheck.

Dynamic taint analysis could also be used to implement garbage collection, which could be used to identify memory leaks at the exact location of each leak.

Symbol names for addresses in kernel modules!

MemCheck Packages

http://silvio.cesare.googlepages.com/ For the package

http://silviocesare.wordpress.com/ For commentary on some of MemCheck's internals.

Runtime Error Detection Summary

Existing tools for runtime error detection include Valgrind which detects userland heap bugs.

Tools for the kernel exist such as kmemcheck which detects uninitialized variables.

MemCheck is a new tool to detect heap bugs in the Linux Kernel, and operates similarly to Valgrind.

That’s all folks…

A 2008 CQU Graduate looking for interesting employment.
