uninformed v8a2


A Catalog of Windows Local Kernel-mode Backdoor Techniques

August, 2007

skape [email protected] [email protected]



Contents

1 Introduction
2 Techniques
  2.1 Image Patches
    2.1.1 Function Prologue Hooking
    2.1.2 Disabling SeAccessCheck
  2.2 Descriptor Tables
    2.2.1 IDT
    2.2.2 GDT / LDT
    2.2.3 SSDT
  2.3 Model-specific Registers
    2.3.1 IA32_SYSENTER_EIP
  2.4 Page Table Entries
  2.5 Function Pointers
    2.5.1 Import Address Table
    2.5.2 KiDebugRoutine
    2.5.3 KTHREAD's SuspendApc
    2.5.4 Create Thread Notify Routine
    2.5.5 Object Type Initializers
    2.5.6 PsInvertedFunctionTable
    2.5.7 Delayed Procedures
  2.6 Asynchronous Read Loop
  2.7 Leaking CS
3 Prevention & Mitigation
4 Running Code in Kernel-Mode
5 PatchGuard versus Rootkits
6 Acknowledgements
7 Conclusion

Abstract

This paper presents a detailed catalog of techniques that can be used to create local kernel-mode backdoors on Windows. These techniques include function trampolines, descriptor table hooks, model-specific register hooks, page table modifications, as well as others that have not previously been described. The majority of these techniques have been publicly known far in advance of this paper. However, at the time of this writing, there appears to be no detailed single point of reference for many of them. The intention of this paper is to provide a solid understanding on the subject of local kernel-mode backdoors. This understanding is necessary in order to encourage the thoughtful discussion of potential countermeasures and perceived advancements. In the vein of countermeasures, some additional thoughts are given to the common misconception that PatchGuard, in its current design, can be used to prevent kernel-mode rootkits.

1 Introduction

The classic separation of privileges between user-mode and kernel-mode has been a common feature included in most modern operating systems. This separation allows operating systems to make security guarantees relating to process isolation, kernel-user isolation, kernel-mode integrity, and so on. These security guarantees are needed in order to prevent a lesser privileged user-mode process from being able to take control of the system itself. A kernel-mode backdoor is one method of bypassing these security restrictions.

There are many different techniques that can be used to backdoor the kernel. For the purpose of this document, a backdoor will be considered to be something that provides access to resources that would otherwise normally be restricted by the kernel. These resources might include executing code with kernel-mode privileges, accessing kernel-mode data, disabling security checks, and so on. To help further limit the scope of this document, the authors will focus strictly on techniques that can be used to provide local backdoors into the kernel on Windows. In this context, a local backdoor is a backdoor that does not rely on or make use of a network connection to provide access to resources. Instead, local backdoors can be viewed as ways of weakening the kernel in an effort to provide access to resources from non-privileged entities, such as user-mode processes.

The majority of the backdoor techniques discussed in this paper have been written about at length and in great detail in many different publications[20, 8, 12, 18, 19, 21, 25, 26]. The primary goal of this paper is to act as a point of reference for some of the common, as well as some of the not-so-common, local kernel-mode backdoor techniques. The authors have attempted to include objective measurements for each technique along with a description of how each technique works. As a part of defining these objective measurements, the authors have attempted to research the origins of some of the more well-known backdoor techniques. Since many of these techniques have been used for such a long time, the origins have proven somewhat challenging to uncover.

The structure of this paper is as follows. In Section 2, each of the individual techniques that can be used to provide a local kernel-mode backdoor is discussed in detail. Section 3 provides a brief discussion of general strategies that might be employed to prevent some of the techniques that are discussed. Section 4 attempts to refute some of the common arguments against preventing kernel-mode backdoors in and of themselves. Finally, Section 5 attempts to clarify why Microsoft's PatchGuard should not be considered a security solution with respect to kernel-mode backdoors.

2 Techniques

To help properly catalog the techniques described in this section, the authors have attempted to include objective measurements of each technique. These measurements are broken down as follows:

Category

The authors have chosen to adopt Joanna Rutkowska's malware categorization in the interest of pursuing a standardized classification[34]. This model describes four types of malware. Type 0 malware categorizes non-intrusive malware; Type I includes malware that modifies things that should otherwise never be modified (code segments, MSRs, etc); Type II includes malware that modifies things that are meant to be modified (global variables, other data); Type III is not within the scope of this document[33, 34].

In addition to the four malware types described by Rutkowska, the authors propose Type IIa which would categorize writable memory that should effectively be considered write-once in a given context. For example, when a global DPC is initialized, the DpcRoutine can be considered write-once. The authors consider this to be a derivative of Type II due to the fact that the memory remains writable and is less likely to be checked than that of Type I.

Origin

If possible, the first known instance of the technique's use or some additional background on its origin is given.

Capabilities

The capabilities the backdoor offers. This can be one or more of the following: kernel-mode code execution, access to kernel-mode data, access to restricted resources. If a technique allows kernel-mode code execution, then it implicitly has all other capabilities listed.

Considerations

Any restrictions or special points that must be made about the use of a given technique.

Covertness

A description of how easily the use of a given technique might be detected.


Since many of the techniques described in this document have been known for quite some time, the authors have taken a best effort approach to identifying sources of the original ideas. In many cases, this has proved to be difficult or impossible. For this reason, the authors request that any inaccuracy in citation be reported so that it may be corrected in future releases of this paper.

2.1 Image Patches

Perhaps the most obvious approach that can be used to backdoor the kernel involves the modification of code segments used by the kernel itself. This could include modifying the code segments of kernel-mode images like ntoskrnl.exe, ndis.sys, ntfs.sys, and so on. By making modifications to these code segments, it is possible to hijack kernel-mode execution whenever a hooked function is invoked. The possibilities surrounding the modification of code segments are limited only by what the kernel itself is capable of doing.

2.1.1 Function Prologue Hooking

Function hooking is the process of intercepting calls to a given function by redirecting those calls to an alternative function. The concept of function hooking has been around for quite some time and it's unclear who originally presented the idea. There are a number of different libraries and papers that help to facilitate the hooking of functions[21]. With respect to local kernel-mode backdoors, function hooking is an easy and reliable method of creating a backdoor. There are a few different ways in which functions can be hooked. One of the most common techniques involves overwriting the prologue of the function to be hooked with an architecture-specific jump instruction that transfers control to an alternative function somewhere else in memory. This is the approach taken by Microsoft's Detours library[21]. While prologue hooks are conceptually simple, there is actually quite a bit of code needed to implement them properly.

In order to implement a prologue hook in a portable and reliable manner, it is often necessary to make use of a disassembler that is able to determine the size, in bytes, of individual instructions. The reason for this is that in order to perform the prologue overwrite, the first few bytes of the function to be hooked must be overwritten by a control transfer instruction (typically a jump). On the Intel architecture, control transfer instructions can have one of three operands: a register, a relative offset, or a memory operand. Each operand type controls the size of the jump instruction that will be needed: 2 bytes, 5 bytes, and 6 bytes, respectively. The disassembler makes it possible to copy the first n instructions from the function's prologue prior to performing the overwrite. The value of n is determined by disassembling each instruction in the prologue until the number of bytes disassembled is greater than or equal to the number of bytes that will be overwritten when hooking the function.

The reason the first n instructions must be saved in their entirety is to make it possible for the original function to be called by the hook function. In order to call the original version of the function, a small stub of code must be generated that will execute the first n instructions of the function's original prologue followed by a jump to instruction n + 1 in the original function's body. This stub of code has the effect of allowing the original function to be called without it being diverted by the prologue overwrite. This method of implementing function prologue hooks is used extensively by Detours and other hooking libraries[21].
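The two steps described above (sizing the overwrite in whole instructions, then building the stub that calls the original function) can be sketched in user-mode C on plain byte buffers. This is a minimal illustration under stated assumptions and not the Detours implementation: toy_insn_len is a hypothetical stand-in for a real disassembler that only understands a few prologue opcodes, all names and addresses are invented for the example, and instructions that would need fix-ups after being moved (for example relative calls) are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_SIZE 5u /* opcode 0xE9 followed by a 32-bit displacement */

/* Toy length decoder (hypothetical stand-in for a real disassembler).
 * It recognizes only a few opcodes seen in simple 32-bit prologues and
 * returns 0 for anything else. */
static size_t toy_insn_len(const uint8_t *p, size_t avail)
{
    if (avail == 0) return 0;
    switch (p[0]) {
    case 0x55:                                                   /* push ebp */
    case 0x90: return 1;                                         /* nop */
    case 0x8B: return (avail >= 2 && (p[1] >> 6) == 3) ? 2 : 0;  /* mov r32, r32 */
    case 0x83: return (avail >= 3 && (p[1] >> 6) == 3) ? 3 : 0;  /* op r32, imm8 */
    default:   return 0;
    }
}

/* Write a "jmp rel32" that lives at address `from` and lands on `to`. */
static void emit_jmp_rel32(uint8_t *out, uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + JMP_REL32_SIZE); /* relative to the next instruction */
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel);
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

/* Returns the number of prologue bytes moved into the trampoline, or 0 on failure. */
static size_t install_prologue_hook(uint8_t *func, uint32_t func_addr, size_t func_size,
                                    uint8_t *tramp, uint32_t tramp_addr, size_t tramp_size,
                                    uint32_t hook_addr)
{
    size_t copied = 0;

    /* Consume whole instructions until the 5-byte jump fits. */
    while (copied < JMP_REL32_SIZE) {
        size_t len = toy_insn_len(func + copied, func_size - copied);
        if (len == 0) return 0;
        copied += len;
    }
    if (copied + JMP_REL32_SIZE > tramp_size) return 0;
    assert(copied <= func_size);

    /* Trampoline: the saved instructions, then a jump back to instruction n + 1. */
    memcpy(tramp, func, copied);
    emit_jmp_rel32(tramp + copied, tramp_addr + (uint32_t)copied,
                   func_addr + (uint32_t)copied);

    /* Prologue overwrite: jump to the hook, pad any leftover bytes with nops. */
    emit_jmp_rel32(func, func_addr, hook_addr);
    memset(func + JMP_REL32_SIZE, 0x90, copied - JMP_REL32_SIZE);
    return copied;
}
```

For the byte sequence 8B FF 55 8B EC (mov edi, edi; push ebp; mov ebp, esp) the loop stops after three instructions, so exactly five bytes are moved. The hook function would call the trampoline whenever it wants the original behavior.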

Recent versions of Windows, such as XP SP2 and Vista, include image files that come with a more elegant way of hooking a function with a function prologue overwrite. In fact, these images have been built with a compiler enhancement that was designed specifically to improve Microsoft's ability to hook its own functions during runtime. The enhancement involves creating functions with a two byte no-op instruction, such as a mov edi, edi, as the first instruction of a function's prologue. In addition to having this two byte instruction, the compiler also prefixes 5 no-op instructions to the function itself.


The two byte no-op instruction provides the necessary storage for a two byte relative short jump instruction to be placed on top of it. The relative short jump, in turn, can then transfer control into another relative jump instruction that has been placed in the 5 bytes that were prefixed to the function itself. The end result is a more deterministic way of hooking a function using a prologue overwrite that does not rely on a disassembler. A common question is why a two byte no-op instruction was used rather than two individual no-op instructions. The answer for this has two parts. First, a two byte no-op instruction can be overwritten in an atomic fashion whereas other prologue overwrites, such as a 5 byte or 6 byte overwrite, cannot. The second part has to do with the fact that having a two byte no-op instruction prevents race conditions associated with any thread executing code from within the set of bytes that are overwritten when the hook is installed. This race condition is common to any type of function prologue overwrite.
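The ordering of the two writes can be sketched as follows, again in user-mode C on a byte buffer. This is only an illustration under stated assumptions: the function name, addresses, and buffer layout are invented, and the two byte memcpy stands in for the single aligned 16-bit store that a real patcher would issue to obtain the atomic behavior described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAD_SIZE 5u /* bytes prefixed to the function by the compiler */

/* `pad` points at the 5 padding bytes; the function entry point follows at
 * pad + 5. `pad_addr` is the address the padding has in the target image.
 * Returns 1 on success, 0 if the entry is not the expected two byte no-op. */
static int hotpatch_install(uint8_t *pad, uint32_t pad_addr, uint32_t hook_addr)
{
    uint8_t *entry = pad + PAD_SIZE;
    const uint8_t short_jmp[2] = { 0xEB, 0xF9 }; /* jmp short -7 */
    uint32_t rel;

    if (entry[0] != 0x8B || entry[1] != 0xFF)    /* mov edi, edi */
        return 0;

    /* Step 1: place the 5 byte jump in the padding. No thread executes these
     * bytes yet, so this write does not need to be atomic. */
    rel = hook_addr - (pad_addr + 5u);
    pad[0] = 0xE9;
    pad[1] = (uint8_t)(rel);
    pad[2] = (uint8_t)(rel >> 8);
    pad[3] = (uint8_t)(rel >> 16);
    pad[4] = (uint8_t)(rel >> 24);

    /* Step 2: replace the no-op with the short jump. The displacement is
     * relative to the end of the jump (entry + 2), so -7 lands on `pad`. */
    assert((int8_t)short_jmp[1] == -7);
    memcpy(entry, short_jmp, sizeof short_jmp);
    return 1;
}
```

Because the no-op is a single instruction, a thread is either about to execute it or already past it, and the second write never changes bytes in the middle of an instruction that a thread has partially executed.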

To better understand this race condition, consider what might happen if the prologue of a function had two single byte no-op instructions. Prior to this function being hooked, a thread executes the first no-op instruction. In between the execution of this first no-op and the second no-op, the function in question is hooked in the context of a second thread and the first two bytes are overwritten with the opcodes associated with a relative short jump instruction, such as 0xeb and 0xf9. After the prologue overwrite occurs, the first thread begins executing what was originally the second no-op instruction. However, now that the function has been hooked, the no-op instruction may have been changed from 0x90 to 0xf9. This may have disastrous effects depending on the context that the hook is executed in. While this race condition may seem unlikely, it is nevertheless feasible and can therefore directly impact the reliability of any solution that uses prologue overwrites in order to hook functions.

Category: Type I

Origin: The concept of patching code has existed since the dawn of digital computing[21].

Capabilities: Kernel-mode code execution

Considerations: The reliability of a function prologue hook is directly related to the reliability of the disassembler used and the number of bytes that are overwritten in a function prologue. If the two byte no-op instruction is not present, then it is unlikely that a function prologue overwrite will be multiprocessor safe. Likewise, if a disassembler does not accurately count the size of instructions in relation to the actual processor, then the function prologue hook may fail, leading to an unexpected crash of the system. One other point that is worth mentioning is that authors of hook functions must be careful not to inadvertently introduce instability issues into the operating system by failing to properly sanitize and check parameters to the function that is hooked. There have been many examples where legitimate software has gone the route of hooking functions without taking these considerations into account[38].

Covertness: At the time of this writing, the use of function prologue overwrites is not considered to be covert. It is trivial for tools, such as Joanna Rutkowska's System Virginity Verifier[32], to compare the in-memory version of system images with the on-disk versions in an effort to detect in-memory alterations. The Windows Debugger (windbg) will also make an analyst aware of differences between in-memory code segments and their on-disk counterparts.

2.1.2 Disabling SeAccessCheck

In Phrack 55, Greg Hoglund described the benefits of patching nt!SeAccessCheck so that it never returns access denied[19]. This has the effect of causing access checks on securable objects to always grant access, regardless of whether or not the access would normally be granted. As a result, non-privileged users can directly access otherwise privileged resources. This simple modification does not directly make it possible to execute privileged code, but it does indirectly facilitate it by allowing non-privileged users to interact with and modify system processes.

Category: Type I

Origin: Greg Hoglund was the first person to publicly identify this technique in September, 1999[19].

Capabilities: Access to restricted resources.

Covertness: Like function prologue overwrites, the nt!SeAccessCheck patch can be detected through differences between the mapped image of ntoskrnl.exe and the on-disk version.

2.2 Descriptor Tables

The x86 architecture has a number of different descriptor tables that are used by the processor to handle things like memory management (GDT), interrupt dispatching (IDT), and so on. In addition to processor-level descriptor tables, the Windows operating system itself also includes a number of distinct software-level descriptor tables, such as the SSDT. The majority of these descriptor tables are heavily relied upon by the operating system and therefore represent a tantalizing target for use in backdoors. Like the function hooking technique described in Section 2.1.1, all of the techniques presented in this subsection have been known about for a significant amount of time. The authors have attempted, when possible, to identify the origins of each technique.

2.2.1 IDT

The Interrupt Descriptor Table (IDT) is a processor-relative structure that is used when dispatching interrupts. Interrupts are used by the processor as a means of interrupting program execution in order to handle an event. Interrupts can occur as a result of a signal from hardware or as a result of software asserting an interrupt through the int instruction[23]. The IDT contains 256 descriptors that are associated with the 256 interrupt vectors supported by the processor. Each IDT descriptor can be one of three types of gate descriptors (task, interrupt, trap) which are used to describe where and how control should be transferred when an interrupt for a particular vector occurs. The base address and limit of the IDT are stored in the idtr register which is populated through the lidt instruction. The current base address and limit of the idtr can be read using the sidt instruction.
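As a small illustration of what sidt returns on 32-bit x86, the sketch below decodes the six byte value (a 16-bit limit followed by a 32-bit base) and derives the number of descriptors and the address of a given vector's descriptor. The type and function names are invented for the example, the sample bytes in the usage note are made up, and the eight byte descriptor size applies to 32-bit protected mode only.

```c
#include <assert.h>
#include <stdint.h>

#define IDT_DESCRIPTOR_SIZE 8u /* 32-bit protected mode gate descriptor */

typedef struct {
    uint16_t limit; /* size of the table in bytes, minus one */
    uint32_t base;  /* linear address of the first descriptor */
} idtr_value;

/* Decode the 6 bytes stored by sidt (little-endian limit, then base). */
static idtr_value parse_idtr(const uint8_t raw[6])
{
    idtr_value v;
    v.limit = (uint16_t)(raw[0] | (raw[1] << 8));
    v.base  = (uint32_t)raw[2] | ((uint32_t)raw[3] << 8) |
              ((uint32_t)raw[4] << 16) | ((uint32_t)raw[5] << 24);
    return v;
}

/* Number of descriptors covered by the limit. */
static uint32_t idt_entry_count(idtr_value v)
{
    return ((uint32_t)v.limit + 1u) / IDT_DESCRIPTOR_SIZE;
}

/* Linear address of the descriptor for `vector`. */
static uint32_t idt_entry_address(idtr_value v, uint32_t vector)
{
    assert(vector < idt_entry_count(v));
    return v.base + vector * IDT_DESCRIPTOR_SIZE;
}
```

A limit of 0x7ff corresponds to the full set of 256 descriptors, and with a base of 0x80030400 the descriptor for vector 0x2e would sit at 0x80030570.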

The concept of an IDT hook has most likely been around since the origin of the concept of interrupt handling. In most cases, an IDT hook works by redirecting the procedure entry point for a given IDT descriptor to an alternative location. Conceptually, this is the same process involved in hooking any function pointer (which is described in more detail in Section 2.5). The difference comes as a result of the specific code necessary to hook an IDT descriptor.

On the x86 processor, each IDT descriptor is an eight byte data structure. IDT descriptors that are either an interrupt gate or trap gate descriptor contain the procedure entry point and code segment selector to be used when the descriptor's associated interrupt vector is asserted. In addition to containing control transfer information, each IDT descriptor also contains additional flags that further control what actions are taken. The Windows kernel describes IDT descriptors using the following structure:

    kd> dt _KIDTENTRY
       +0x000 Offset           : Uint2B
       +0x002 Selector         : Uint2B
       +0x004 Access           : Uint2B
       +0x006 ExtendedOffset   : Uint2B

In the above data structure, the Offset field holds the low 16 bits of the procedure entry point and the ExtendedOffset field holds the high 16 bits. Using this knowledge, an IDT descriptor could be hooked by redirecting the procedure entry point to an alternate function. The following code illustrates how this can be accomplished:

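    /* Assumed layout: mirrors _KIDTENTRY with the two offset fields renamed. */
    typedef struct _IDT_DESCRIPTOR
    {
        USHORT OffsetLow;
        USHORT Selector;
        USHORT Access;
        USHORT OffsetHigh;
    } IDT_DESCRIPTOR, *PIDT_DESCRIPTOR;

    /* sidt and lidt use a 6 byte operand, so the structure must be packed. */
    #pragma pack(push, 1)
    typedef struct _IDT
    {
        USHORT          Limit;
        PIDT_DESCRIPTOR Descriptors;
    } IDT, *PIDT;
    #pragma pack(pop)

    static NTSTATUS HookIdtEntry(
        IN UCHAR DescriptorIndex,
        IN ULONG_PTR NewHandler,
        OUT PULONG_PTR OriginalHandler OPTIONAL)
    {
        PIDT_DESCRIPTOR Descriptor = NULL;
        IDT             Idt;

        /* Read the base and limit of the current processor's IDT. */
        __asm sidt [Idt]

        Descriptor = &Idt.Descriptors[DescriptorIndex];

        /* The output parameter is optional; only write it when supplied. */
        if (OriginalHandler != NULL)
            *OriginalHandler =
                (ULONG_PTR)Descriptor->OffsetLow |
                ((ULONG_PTR)Descriptor->OffsetHigh << 16);

        /* Split the new entry point across the low and high offset fields. */
        Descriptor->OffsetLow  = (USHORT)(NewHandler & 0xffff);
        Descriptor->OffsetHigh = (USHORT)((NewHandler >> 16) & 0xffff);

        __asm lidt [Idt]

        return STATUS_SUCCESS;
    }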

In addition to hooking an individual IDT descriptor, the entire IDT can be hooked by creating a new table and then setting its information using the lidt instruction.

Category: Type I; although some portions of the IDT may be legitimately hooked.

Origin: The IDT hook has its origins in Interrupt Vector Table (IVT) hooks. In October, 1999, Prasad Dabak et al wrote about IVT hooks[31]. Sadly, they also seemingly failed to cite their sources. It's certain that IVT hooks have existed prior to 1999. The oldest virus citation the authors could find was from 1994, but DOS was released in 1981 and it is likely the first IVT hooks were seen shortly thereafter[7]. A patent that was filed in December, 1985 entitled "Dual operating system computer" talks about IVT relocation in a manner that suggests IVT hooking of some form[3].

Capabilities: Kernel-mode code execution.

Covertness: Detection of IDT hooks is often trivial and is a common practice for rootkit detection tools[32].

2.2.2 GDT / LDT

The Global Descriptor Table (GDT) and Local Descriptor Table (LDT) are used to store segment descriptors that describe a view of a system's address space (footnote 1). Segment descriptors include the base address, limit, privilege information, and other flags that are used by the processor when translating a logical address (seg:offset) to a linear address. Segment selectors are integers that are used to indirectly reference individual segment descriptors based on their offset into a given descriptor table. Software makes use of segment selectors through segment registers, such as CS, DS, ES, and so on. More detail about the behavior of segmentation can be found in the x86 and x64 system programming manuals[1].
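To make the relationship between a selector and a descriptor concrete, the sketch below splits a 16-bit selector into its architectural fields (descriptor index, table indicator, requested privilege level) and computes the address of the referenced descriptor. The type and function names, as well as the sample table base and limit in the usage note, are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t index; /* bits 15..3: descriptor index within the table */
    uint32_t ti;    /* bit 2: 0 = GDT, 1 = LDT */
    uint32_t rpl;   /* bits 1..0: requested privilege level */
} selector_fields;

static selector_fields decode_selector(uint16_t selector)
{
    selector_fields f;
    f.index = (uint32_t)selector >> 3;
    f.ti    = ((uint32_t)selector >> 2) & 1u;
    f.rpl   = (uint32_t)selector & 3u;
    return f;
}

/* Linear address of the 8 byte descriptor that `selector` refers to, given
 * the base and limit of the table it indexes. */
static uint32_t descriptor_address(uint32_t table_base, uint32_t table_limit,
                                   uint16_t selector)
{
    uint32_t offset = (uint32_t)selector & ~7u; /* index * 8 */
    assert(offset + 7u <= table_limit);         /* descriptor must lie inside the table */
    return table_base + offset;
}
```

For example, selector 0x23 decodes to index 4 in the GDT with a requested privilege level of 3, and selector 0x08 decodes to index 1 in the GDT with a requested privilege level of 0.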

In Phrack 55, Greg Hoglund described the potential for abusing conforming code segments[19]. A conforming code segment, as opposed to a non-conforming code segment, permits control transfers where CPL is numerically greater than DPL. However, the CPL is not altered as a result of this type of control transfer. As such, effective privileges of the caller are not changed. For this reason, it's unclear how this could be used to access kernel-mode memory due to the fact that page protections would still prevent lesser privileged callers from accessing kernel-mode pages when paging is enabled.

Footnote 1: Each processor has its own GDT.

Derek Soeder identified an awesome flaw in 2003 that allowed a user-mode process to create an expand-down segment descriptor in the calling process' LDT[40]. An expand-down segment descriptor inverts the meaning of the limit and base address associated with a segment descriptor. In this way, the limit describes the lower limit and the base address describes the upper limit. The reason this is useful is due to the fact that when kernel-mode routines validate addresses passed in from user-mode, they assume flat segments that start at base address zero. This is the same thing as assuming that a logical address is equivalent to a linear address. However, when expand-down segment descriptors are used, the linear address will reference a memory location that can be in stark contrast to the address that's being validated by kernel-mode. In order to exploit this condition to escalate privileges, all that's necessary is to identify a system service in kernel-mode that will run with escalated privileges and make use of segment selectors provided by user-mode without properly validating them. Derek gives an example of a MOVS instruction in the int 0x2e handler. This trick can be abused in the context of a local kernel-mode backdoor to provide a way for user-mode code to be able to read and write kernel-mode memory.
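The mismatch between the address that is validated and the address that is actually referenced can be sketched as follows. This is an illustrative model and not a reproduction of the exploit described in [40]: the segment values and the 0x7FFF0000 user-mode boundary are assumptions chosen for the example, and the offset check follows the Intel definition of a 32-bit expand-down data segment, in which valid offsets lie above the limit.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_USER_LIMIT 0x7FFF0000u /* illustrative user/kernel boundary */

typedef struct {
    uint32_t base;
    uint32_t limit;
    int      expand_down; /* nonzero for an expand-down data segment */
} segment_t;

/* Offset check performed by the processor (32-bit segment, byte granular). */
static int offset_in_segment(const segment_t *seg, uint32_t offset)
{
    return seg->expand_down ? (offset > seg->limit) : (offset <= seg->limit);
}

/* Logical (segment:offset) to linear translation; wraps modulo 2^32. */
static uint32_t logical_to_linear(const segment_t *seg, uint32_t offset)
{
    assert(offset_in_segment(seg, offset));
    return seg->base + offset;
}

/* The flawed validation: the offset is treated as if it were already a
 * linear address, which is only true for a flat segment with base zero. */
static int naive_user_pointer_check(uint32_t offset)
{
    return offset < ASSUMED_USER_LIMIT;
}
```

With a crafted segment whose base is 0x80000000, the offset 0x00401000 passes the naive check yet translates to the linear address 0x80401000, which lies in the kernel half of the address space.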

In addition to abusing specific flaws in the way memory can be referenced through the GDT and LDT, it's also possible to define custom gate descriptors that would make it possible to call code in kernel-mode from user-mode[23]. One particularly useful type of gate descriptor, at least in the context of a backdoor, is a call gate descriptor. The purpose of a call gate is to allow lesser privileged code to call more privileged code in a secure fashion[45]. To abuse this, a backdoor can simply define its own call gate descriptor and then make use of it to run code in the context of the kernel.
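As a rough illustration of what defining a call gate involves, the sketch below encodes the eight bytes of a 32-bit call gate descriptor following the layout given in the Intel manuals. The function name and the sample target address are hypothetical, and writing the result into a free GDT or LDT slot is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define GATE_PRESENT     0x80u
#define GATE_TYPE_CALL32 0x0Cu /* 32-bit call gate */

/* Encode a 32-bit call gate that transfers control to selector:target.
 * `dpl` is the least privileged level allowed to use the gate (3 lets
 * user-mode code call through it); `params` is the number of stack
 * parameters the processor copies to the new stack. */
static void encode_call_gate(uint8_t out[8], uint32_t target, uint16_t selector,
                             uint32_t dpl, uint32_t params)
{
    assert(dpl <= 3u && params <= 31u);
    out[0] = (uint8_t)(target);            /* offset 15..0 */
    out[1] = (uint8_t)(target >> 8);
    out[2] = (uint8_t)(selector);          /* target code segment selector */
    out[3] = (uint8_t)(selector >> 8);
    out[4] = (uint8_t)(params);            /* parameter count, bits 4..0 */
    out[5] = (uint8_t)(GATE_PRESENT | (dpl << 5) | GATE_TYPE_CALL32);
    out[6] = (uint8_t)(target >> 16);      /* offset 31..16 */
    out[7] = (uint8_t)(target >> 24);
}
```

A gate with DPL 3 that targets kernel code segment selector 0x08 has an access byte of 0xEC; user-mode code would reach the target with a far call through the selector that indexes the gate.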

Category: Type IIa; with the exception of the LDT. The LDT may be better classified as Type II considering it exposes an API to user-mode that allows the creation of custom LDT entries (NtSetLdtEntries).

Origin: It's unclear if there were some situational requirements that would be needed in order to abuse the issue described by Greg Hoglund. The flaw identified by Derek Soeder in 2003 was an example of a recurrence of an issue that was found in older versions of other operating systems, such as Linux. For example, a mailing list post made by Morten Welinder to LKML in 1996 describes a fix for what appears to be the same type of issue that was identified by Derek[44]. Creating a custom gate descriptor for use in the context of a backdoor has been used in the past. Greg Hoglund described the use of call gates in the context of a rootkit in 1999[19].

Capabilities: In the case of the expand-down segment descriptor, access to kernel-mode data is possible. This can also indirectly lead to kernel-mode code execution, but it would rely on another backdoor technique. If a gate descriptor is abused, direct kernel-mode code execution is possible.

Covertness: It is entirely possible to write code that will detect the addition or alteration of entries in the GDT or each individual process' LDT. For example, PatchGuard will currently detect alterations to the GDT.

2.2.3 SSDT

The System Service Descriptor Table (SSDT) is used by the Windows kernel when dispatching system calls. The SSDT itself is exported in kernel-mode through the nt!KeServiceDescriptorTable global variable. This variable contains information relating to system call tables that have been registered with the operating system. In contrast to other operating systems, the Windows kernel supports the dynamic registration (nt!KeAddSystemServiceTable) of new system call tables at runtime. The two most common system call tables are those used for native and GDI system calls.

In the context of a local kernel-mode backdoor, system calls represent an obvious target due to the fact that they are implicitly tied to the privilege boundary that exists between user-mode and kernel-mode. The act of hooking a system call handler in kernel-mode makes it possible to expose a privileged backdoor into the kernel using the operating system's well-defined system call interface. Furthermore, hooking system calls makes it possible for the backdoor to alter data that is seen by user-mode and thus potentially hide its presence to some degree.

In practice, system calls can be hooked on Windows using two distinct strategies. The first strategy involves using generic function hooking techniques which are described in Section 2.1.1. The second strategy involves using the function pointer hooking technique which is described in Section 2.5. Using the function pointer hooking technique involves simply altering the function pointer associated with a specific system call index by accessing the system call table which contains the system call that is to be hooked.

The following code shows a very simple illustration of how one might go about hooking a system call in the native system call table on 32-bit versions of Windows (footnote 2):

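    PVOID HookSystemCall(
        PVOID SystemCallFunction,
        PVOID HookFunction)
    {
        /* The system call stub starts with a one byte opcode followed by
           the 32-bit system call index, so the index is read at offset 1. */
        ULONG SystemCallIndex =
            *(ULONG *)((PCHAR)SystemCallFunction+1);

        /* The first entry of KeServiceDescriptorTable is the base of the
           native system call table. */
        PVOID *NativeSystemCallTable =
            KeServiceDescriptorTable[0];

        /* Save the original handler so that the hook can call it. */
        PVOID OriginalSystemCall =
            NativeSystemCallTable[SystemCallIndex];

        NativeSystemCallTable[SystemCallIndex] = HookFunction;

        return OriginalSystemCall;
    }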
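The routine above relies on two implicit assumptions: that the stub passed in really begins with a mov eax, imm32 instruction (opcode 0xB8) and that the extracted index lies within the table. A user-mode sketch that makes both checks explicit is shown below. The function names are invented for the example, and the table is an ordinary array standing in for the native system call table rather than the real KeServiceDescriptorTable.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the system call index from a stub that begins with
 * "mov eax, imm32" (opcode 0xB8). Returns 1 on success, 0 otherwise. */
static int stub_service_index(const uint8_t *stub, uint32_t *index)
{
    if (stub[0] != 0xB8)
        return 0;
    *index = (uint32_t)stub[1] | ((uint32_t)stub[2] << 8) |
             ((uint32_t)stub[3] << 16) | ((uint32_t)stub[4] << 24);
    return 1;
}

/* Swap one entry of a system call table and return the previous handler. */
static void *swap_service(void **table, uint32_t count, uint32_t index, void *hook)
{
    void *original;
    assert(index < count); /* never index past the end of the table */
    original = table[index];
    table[index] = hook;
    return original;
}
```

A caller would first obtain the index with stub_service_index and only then call swap_service, keeping the returned pointer so that the hook can forward to the original handler.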

Category: Type I if a prologue hook is used. Type IIa if the function pointer hook is used. The SSDT (both native and GDI) should effectively be considered write-once.

Origin: System call hooking has been used extensively for quite some time. Since this technique has become so well-known, its actual origins are unclear. The earliest description the authors could find was from M. B. Jones in a paper from 1993 entitled "Interposition agents: Transparently interposing user code at the system interface"[27]. Jones explains in his section on related work that he was unable to find any explicit research on the subject of agent-based interposition prior to his writing. However, it seems clear that system calls were being hooked in an ad-hoc fashion far in advance of this point. The authors were unable to find many of the papers cited by Jones. Plaguez appears to be one of the first (Jan, 1998) to publicly illustrate the usefulness of system call hooking in Linux with a specific eye toward security in Phrack 52[30].

Footnote 2: System call hooking on 64-bit versions of Windows would require PatchGuard to be disabled.


Considerations: On certain versions of Windows XP, the SSDT is marked as read-only. This must be taken into account when attempting to write to the SSDT across multiple versions of Windows.

Covertness: System call hooks on Windows are very easy to detect. Comparing the in-memory SSDTs with the on-disk versions is one of the most common strategies employed.

2.3 Model-specific Registers

Intel processors support a special category of processor-specific registers known as Model-specific Registers (MSRs). MSRs provide software with the ability to control various hardware and software features. Unlike other registers, MSRs are tied to a specific processor model and are not guaranteed to be supported in future versions of a processor line. Some of the features that MSRs offer include enhanced performance monitoring and debugging, among other things. Software can read MSRs using the rdmsr instruction and write MSRs using the wrmsr instruction[23].

This subsection will describe some of the MSRs that may be useful in the context of a local kernel-mode backdoor.

2.3.1 IA32_SYSENTER_EIP

The Pentium II introduced enhanced support for transitioning between user-mode and kernel-mode. This support was provided through the introduction of two new instructions: sysenter and sysexit (footnote 3). When a user-mode application wishes to transition to kernel-mode, it issues the sysenter instruction. When the kernel is ready to return to user-mode, it issues the sysexit instruction. Unlike the call instruction, the sysenter instruction takes no operands. Instead, this instruction uses three specific MSRs that are initialized by the operating system as the target for control transfers[23].

Footnote 3: AMD processors also introduced enhanced new instructions to provide this feature.

The IA32_SYSENTER_CS (0x174) MSR is used by the processor to set the kernel-mode CS. The IA32_SYSENTER_EIP (0x176) MSR contains the virtual address of the kernel-mode entry point that code should begin executing at once the transition has completed. The third MSR, IA32_SYSENTER_ESP (0x175), contains the virtual address that the stack pointer should be set to. Of these three MSRs, IA32_SYSENTER_EIP is the most interesting in terms of its potential for use in the context of a backdoor. Setting this MSR to the address of a function controlled by the backdoor makes it possible for the backdoor to intercept all system calls after they have trapped into kernel-mode. This provides a very powerful vantage point.
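The hook itself amounts to a read followed by a write of one MSR. The sketch below models this with function pointers so that it can be exercised outside the kernel. The sim_rdmsr and sim_wrmsr helpers and the backing array are simulation stand-ins invented for this example; in a driver they would be replaced by thin wrappers around the rdmsr and wrmsr instructions (for instance the compiler's __readmsr and __writemsr intrinsics), and the write would have to be repeated on every processor because each processor has its own copy of the MSR.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_SYSENTER_CS  0x174u
#define IA32_SYSENTER_ESP 0x175u
#define IA32_SYSENTER_EIP 0x176u

typedef uint64_t (*rdmsr_fn)(uint32_t msr);
typedef void (*wrmsr_fn)(uint32_t msr, uint64_t value);

/* Simulated MSR bank covering 0x174..0x176 (stand-in for real hardware). */
static uint64_t sim_msrs[3];

static uint64_t sim_rdmsr(uint32_t msr)
{
    assert(msr >= IA32_SYSENTER_CS && msr <= IA32_SYSENTER_EIP);
    return sim_msrs[msr - IA32_SYSENTER_CS];
}

static void sim_wrmsr(uint32_t msr, uint64_t value)
{
    assert(msr >= IA32_SYSENTER_CS && msr <= IA32_SYSENTER_EIP);
    sim_msrs[msr - IA32_SYSENTER_CS] = value;
}

/* Redirect the sysenter entry point and return the previous value so that
 * the replacement handler can chain to the original one. */
static uint64_t hook_sysenter_eip(rdmsr_fn rd, wrmsr_fn wr, uint64_t new_entry)
{
    uint64_t original = rd(IA32_SYSENTER_EIP);
    wr(IA32_SYSENTER_EIP, new_entry);
    return original;
}
```

The replacement handler would typically inspect the system call number and arguments and then jump to the saved original entry point.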

For more information on the behavior of the sysenter and sysexit instructions, the reader should consult both the Intel manuals and John Gulbrandsen's article[23, 15].

Category: Type I

Origin: This feature is provided for the explicit purpose of allowing an operating system to control the behavior of the sysenter instruction. As such, it is only logical that it can also be applied in the context of a backdoor. Kimmo Kasslin mentions a virus from December, 2005 that made use of MSR hooks[25]. Earlier that year in February, fuzen_op from rootkit.com released a proof of concept[12].

Capabilities: Kernel-mode code execution

Considerations: This technique is restricted by the fact that not all processors support this MSR. Furthermore, user-mode processes are not necessarily required to use it in order to transition into kernel-mode when performing a system call. These facts limit the effectiveness of this technique as it is not guaranteed to work on all machines.

Covertness: Changing the value of the IA32_SYSENTER_EIP MSR can be detected. For example, PatchGuard currently checks to see if the equivalent AMD64 MSR has been modified as a part of its polling checks[36]. It is more difficult for third party vendors to perform this check due to the simple fact that the default value for this MSR is an unexported symbol named nt!KiFastCallEntry:

    kd> rdmsr 176
    msr[176] = 00000000804de6f0
    kd> u 00000000804de6f0
    nt!KiFastCallEntry:
    804de6f0 b923000000 mov ecx,23h

Without having symbols, third parties have a more difficult time distinguishing between a value that is sane and one that is not.

2.4 Page Table Entries

When operating in protected mode, x86 processors support virtualizing the address space through the use of a feature known as paging. The paging feature makes it possible to virtualize the address space by adding a translation layer between linear addresses and physical addresses4. To translate addresses, the processor uses portions of the address being referenced to index directories and tables that convey flags and physical address information that describe how the translation should be performed. The majority of the details on how this translation is performed are outside of the scope of this document. If necessary, the reader should consult section 3.7 of the Intel System Programming Manual[23]. Many other papers in the references also discuss this topic[41].

The paging system is particularly interesting due to its potential for abuse in the context of a backdoor. When the processor attempts to translate a linear address, it walks a number of page tables to determine the associated physical address. When this occurs, the processor makes a check to ensure that the task referencing the address has sufficient rights to do so. This access check is enforced by checking the User/Supervisor bit of the Page-Directory Entry (PDE) and Page-Table Entry (PTE) associated with the page. If this bit is clear, only the supervisor (privilege level 0) is allowed to access the page. If the bit is set, both supervisor and user are allowed to access the page5.

4 When paging is not enabled, linear addresses are equivalent to physical addresses.

The implications surrounding this flag should be obvious. By toggling the flag in the PDE and PTE associated with an address, a backdoor can gain access to read or write kernel-mode memory from user-mode. This would indirectly make it possible to gain code execution by making use of one of the other techniques listed in this document.
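As a rough illustration of the access check described above, the following C sketch models non-PAE x86 translation on plain integers. The helper names (pde_index, pte_index, page_offset, user_accessible) are hypothetical and nothing here touches real page tables; only the 10/10/12 address split and the positions of the Present and User/Supervisor bits are taken from the Intel manuals.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits shared by non-PAE PDEs and PTEs. */
#define PG_PRESENT 0x001u  /* bit 0: the entry is valid */
#define PG_USER    0x004u  /* bit 2: User/Supervisor flag */

/* Non-PAE paging splits a 32-bit linear address into 10/10/12 bits. */
static uint32_t pde_index(uint32_t va)   { return va >> 22; }
static uint32_t pte_index(uint32_t va)   { return (va >> 12) & 0x3FFu; }
static uint32_t page_offset(uint32_t va) { return va & 0xFFFu; }

/* A user-mode access is permitted only when the User/Supervisor bit
 * is set in both the PDE and the PTE that map the page. Entries that
 * are not present are treated as inaccessible in this model. */
static int user_accessible(uint32_t pde, uint32_t pte)
{
    if (!(pde & PG_PRESENT) || !(pte & PG_PRESENT))
        return 0;
    return (pde & PG_USER) && (pte & PG_USER);
}
```

Under this model, a detection tool that polls for the condition would flag any kernel-range page for which user_accessible returns non-zero. PAE systems use a different split and 64-bit entries, so a real implementation would need a second code path.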

Category: Type II

Origin: The modification of PDE and PTE entries has been supported since hardware paging's inception. The authors were not able to find an exact source of the first use of this technique in a backdoor. There have been a number of examples in recent years of tools that abuse the supervisor bit in one way or another[29, 41]. PaX team provided the first documentation of their PAGEEXEC code in March, 2003. In January, 1998, Mythrandir mentions the supervisor bit in Phrack 52 but doesn't explicitly call out how it could be abused[28].

Capabilities: Access to kernel-mode data.

Considerations: Code that attempts to implement this approach would need to properly support PAE and non-PAE processors on x86 in order to work reliably. This approach is also extremely dangerous and potentially unreliable depending on how it interacts with the memory manager. For example, if pages are not properly locked into physical memory, they may be pruned and thus any PDE or PTE modifications would be lost. This would result in the user-mode process losing access to a specific page.

Covertness: This approach could be considered fairly covert without the presence of some tool capable of intercepting PDE or PTE modifications. Locking pages into physical memory may make it easier to detect in a polling fashion by walking the set of locked pages and checking to see if their associated PDE or PTE has been made accessible to user-mode.

5 This isn't always the case depending on whether or not the WP bit is set in CR0.

2.5 Function Pointers

Function pointers are used extensively by the Windows kernel to indirectly transfer control of execution from one location to another[18]. Like the function prologue overwrite described in 2.1.1, the act of hooking a function by altering a function pointer is an easy way to intercept future calls to a given function. The difference, however, is that hooking a function by altering a function pointer will only intercept indirect calls made to the hooked function through the function pointer. Though this may seem like a fairly significant limitation, even these restrictions do not drastically limit the set of function pointers that can be abused to provide a kernel-mode backdoor.

The concept itself should be simple enough. All that's necessary is to modify the contents of a given function pointer to point at untrusted code. When the function is invoked through the function pointer, the untrusted code is executed instead. If the untrusted code wishes to be able to call the function that is being hooked, it can save the address that is stored in the function pointer prior to overwriting it. When possible, hooking a function through a function pointer is a simple and elegant solution that should have very little impact on the stability of the system (with obvious exception to the quality of the replacement function).
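The save-then-overwrite pattern can be modeled in ordinary user-mode C. The sketch below is only a toy under stated assumptions: g_handler stands in for some pointer slot that callers dereference, and every name in it is hypothetical rather than a kernel symbol. It also shows the limitation noted earlier, namely that callers who name the function directly are not intercepted.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Stand-in for a routine that is normally reached through a pointer. */
static int original_handler(int x) { return x + 1; }

/* The slot that indirect callers dereference (hypothetical name). */
static handler_fn g_handler = original_handler;

/* Copy of the previous target so the hook can chain to it. */
static handler_fn g_saved = NULL;

/* Number of calls the hook has observed. */
static int hook_calls = 0;

/* Replacement routine: records the call, then forwards it. */
static int hook_handler(int x)
{
    hook_calls++;
    return g_saved(x);
}

/* Install the hook: remember the old target, then swap the pointer. */
static void install_hook(void)
{
    g_saved = g_handler;
    g_handler = hook_handler;
}

/* Callers that go through the pointer are intercepted... */
static int call_indirect(int x) { return g_handler(x); }

/* ...while callers that reference the function directly are not. */
static int call_direct(int x) { return original_handler(x); }
```

After install_hook() runs, call_indirect() increments hook_calls on every invocation while call_direct() leaves it unchanged, and both still return the original result because the hook forwards to the saved pointer.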

Regardless of what approach is taken to hook a function, an obvious question is where the backdoor code associated with a given hook function should be placed. There are really only two general memory locations in which the code can be stored. It can either be stored in user-mode, which would generally make it specific to a given process, or kernel-mode, which would make it visible system wide. Deciding which of the two locations to use is a matter of determining the contextual restrictions of the function pointer being leveraged. For example, if the function pointer is called through at a raised IRQL, such as DISPATCH_LEVEL, then it is not possible to store the hook function's code in pageable memory. Another example of a restriction is the process context in which the function pointer is used. If a function pointer may be called through in any process context, then there are only a finite number of locations that the code could be placed in user-mode. It's important to understand some of the specific locations that code may be stored in.

Perhaps the most obvious location that can be used to store code that is to execute in kernel-mode is the kernel pools, such as the PagedPool and NonPagedPool, which are used to store dynamically allocated memory. In some circumstances, it may also be possible to store code in regions of memory that contain code or data associated with device drivers. While these few examples illustrate that there is certainly no shortage of locations in which to store code, there are a few locations in particular that are worth calling out.

One such location is composed of a single physical page that is shared between user-mode and kernel-mode. This physical page is known as SharedUserData and it is mapped into user-mode as read-only and kernel-mode as read-write. The virtual address that this physical page is mapped at is static in both user-mode (0x7ffe0000) and kernel-mode (0xffdf0000) on all versions of Windows NT and later6. There is also plenty of unused memory within the page that is allocated for SharedUserData. The fact that the mapping address is static makes it a useful location to store small amounts of code without needing to allocate additional storage from the paged or non-paged pool[24].

Though the SharedUserData mapping is quite useful, there is actually an alternative location that can be used to store code that is arguably more covert. This approach involves overwriting a function pointer with the address of some code from the virtual mapping of the native DLL, ntdll.dll. The native DLL is special in that it is the only DLL that is guaranteed to be mapped into the context of every process, including the System process. It is also mapped at the same base address in every process due to assumptions made by the Windows kernel. While these are useful qualities, the best reason for using the ntdll.dll mapping to store code is that doing so makes it possible to store code in a process-relative fashion. Understanding how this works in practice requires some additional explanation.

6 The virtual mappings are no longer executable as of Windows XP SP2. However, it is entirely possible for a backdoor to alter these page permissions.

The native DLL, ntdll.dll, is mapped into the address space of the System process and subsequent processes during kernel and process initialization, respectively. This mapping is performed in kernel-mode by nt!PspMapSystemDll. One can observe the presence of this mapping in the context of the System process through a debugger as shown below. These same basic steps can be taken to confirm that ntdll.dll is mapped into other processes as well7:

kd> !process 0 0 System
PROCESS 81291660 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
    DirBase: 00039000 ObjectTable: e1000a68 HandleCount: 256.
    Image: System
kd> !process 81291660
PROCESS 81291660 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
    DirBase: 00039000 ObjectTable: e1000a68 HandleCount: 256.
    Image: System
    VadRoot 8128f288 Vads 4
    ...
kd> !vad 8128f288
VAD level start end commit
...
81207d98 ( 1) 7c900 7c9af 5 Mapped Exe
kd> dS poi(poi(81207d98+0x18)+0x24)+0x30
e13591a8 "\WINDOWS\system32\ntdll.dll"

To make use of the ntdll.dll mapping as a location in which to store code, one must understand the implications of altering the contents of the mapping itself. Like all other image mappings, the code pages associated with ntdll.dll are marked as Copy-on-Write (COW) and are initially shared between all processes. When data is written to a page that has been marked with COW, the kernel allocates a new physical page and copies the contents of the shared page into the newly allocated page. This new physical page is then associated with the virtual page that is being written to. Any changes made to the new page are observed only within the context of the process that is making them. This behavior is why altering the contents of a mapping associated with an image file does not lead to changes appearing in all process contexts.

7 The command !vad is used to dump the virtual address descriptor (VAD) tree for a given process. This tree contains descriptions of memory regions within a given process.

Based on the ability to make process-relative changes to the ntdll.dll mapping, one is able to store code that will only be used when a function pointer is called through in the context of a specific process. When not called in a specific process context, whatever code exists in the default mapping of ntdll.dll will be executed. In order to better understand how this may work, it makes sense to walk through a concrete example.

In this example, a rootkit has opted to create a backdoor by overwriting the function pointer that is used when dispatching IRPs using the IRP_MJ_FLUSH_BUFFERS major function for a specific device object. The prototype for the function that handles IRP_MJ_FLUSH_BUFFERS IRPs is shown below:

NTSTATUS DispatchFlushBuffers(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp);

In order to create a context-specific backdoor, the rootkit has chosen to overwrite the function pointer described above with an address that resides within ntdll.dll. By default, the rootkit wants all processes except those that are aware of the backdoor to simply have a no-operation occur when IRP_MJ_FLUSH_BUFFERS is sent to the device object. For processes that are aware of the backdoor, the rootkit wants arbitrary code execution to occur in kernel-mode. To accomplish this, the function pointer should be overwritten with an address that resides in ntdll.dll that contains a ret 0x8 instruction. This will simply cause invocations of IRP_MJ_FLUSH_BUFFERS to return (without completing the IRP). The location of this ret 0x8 should be in a portion of code that is rarely executed in user-mode. For processes that wish to execute arbitrary code in kernel-mode, it's as simple as altering the code that exists at the address of the ret 0x8 instruction. After altering the code, the process only needs to issue an IRP_MJ_FLUSH_BUFFERS through the FlushFileBuffers function on the affected device object. The context-dependent execution of code is made possible by the fact that, in most cases, IRPs are processed in the context of the requesting process.

The remainder of this subsection will describe specific function pointers that may be useful targets for use as backdoors. The authors have tried to cover some of the more intriguing examples of function pointers that may be hooked. Still, it goes without saying that there are many more that have not been explicitly described. The authors would be interested to hear about additional function pointers that have unique and useful properties in the context of a local kernel-mode backdoor.

2.5.1 Import Address Table

The Import Address Table (IAT) of a PE image is used to store the absolute virtual addresses of functions that are imported from external PE images. When a PE image is mapped into virtual memory, the dynamic loader (in kernel-mode, this is ntoskrnl) takes care of populating the contents of the PE image's IAT based on the actual virtual address locations of dependent functions8. The compiler, in turn, generates code that uses an indirect call instruction to invoke imported functions. Each imported function has a function pointer slot in the IAT. In this fashion, PE images do not need to have any preconceived knowledge of where dependent PE images are going to be mapped in virtual memory. Instead, this knowledge can be postponed until a runtime determination is made.

8 For the sake of simplicity, bound imports are excluded from this explanation.

The fundamental step involved in hooking an IAT entry really just boils down to changing a function pointer. What distinguishes an IAT hook from other types of function pointer hooks is the context in which the overwritten function pointer is called through. Since each PE image has its own IAT, any hook that is made to a given IAT will implicitly only affect the associated PE image. For example, consider a situation where both foo.sys and bar.sys import ExAllocatePoolWithTag. If the IAT entry for ExAllocatePoolWithTag is hooked in foo.sys, only those calls made from within foo.sys to ExAllocatePoolWithTag will be affected. Calls made to the same function from within bar.sys will be unaffected. This type of limitation can actually be a good thing, depending on the underlying motivations for a given backdoor.

Category: Type I; may legitimately be modified, but should point to expected values.

Origin: The origin of the first IAT hook is unclear. In January, 2000, Silvio described hooking via the ELF PLT which is, in some aspects, functionally equivalent to the IAT in PE images[35].

Capabilities: Kernel-mode code execution

Considerations: Assuming the calling restrictions of an IAT hook are acceptable for a given backdoor, there are no additional considerations that need to be made.

Covertness: It is possible for modern tools to detect IAT hooks by analyzing the contents of the IAT of each PE image loaded in kernel-mode. To detect discrepancies, a tool need only check to see if the virtual address associated with each function in the IAT is indeed the same virtual address as exported by the PE image that contains a dependent function.
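The comparison described above can be sketched in plain C. This is a simplified model under stated assumptions, not a parser for real PE images: each image is reduced to a single import slot, and the names struct image, iat_entry_hooked, count_hooked, real_alloc and rogue_alloc are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*alloc_fn)(size_t);

/* Stand-ins for the exporting image's routine and a replacement. */
static char real_pool[16], rogue_pool[16];
static void *real_alloc(size_t n)
{
    return n <= sizeof real_pool ? real_pool : NULL;
}
static void *rogue_alloc(size_t n)
{
    return n <= sizeof rogue_pool ? rogue_pool : NULL;
}

/* Each image carries its own import slot for the same function. */
struct image {
    const char *name;
    alloc_fn    iat_alloc;  /* IAT entry for the imported function */
};

/* Return 1 if the IAT entry no longer matches the exported address. */
static int iat_entry_hooked(const struct image *img, alloc_fn exported)
{
    return img->iat_alloc != exported;
}

/* Count the images whose entry deviates from the exported address. */
static size_t count_hooked(const struct image *imgs, size_t n,
                           alloc_fn exported)
{
    size_t hooked = 0;
    for (size_t i = 0; i < n; i++)
        hooked += (size_t)iat_entry_hooked(&imgs[i], exported);
    return hooked;
}
```

If two images (say foo.sys and bar.sys) both start with iat_alloc set to real_alloc and only the first is pointed at rogue_alloc, count_hooked reports one deviation and the second image's entry still compares equal to the export, which mirrors the per-image scope of an IAT hook.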

2.5.2 KiDebugRoutine

The Windows kernel provides an extensive debugging interface to allow the kernel itself (and third party drivers) to be debugged in a live, interactive environment (as opposed to after-the-fact, post-mortem crash dump debugging). This debugging interface is used by a kernel debugger program (kd.exe, or WinDbg.exe) in order to perform tasks such as inspecting the running state (including memory, registers, kernel state such as processes and threads, and the like) of the kernel on-demand. The debugging interface also provides facilities for the kernel to report various events of interest to a kernel debugger, such as exceptions, module load events, debug print output, and a handful of other state transitions. As a result, the kernel debugger interface has hooks built-in to various parts of the kernel for the purpose of notifying the kernel debugger of these events.

The far-reaching capabilities of the kernel debugger in combination with the fact that the kernel debugger interface is (in general) present in a compatible fashion across all OS builds provides an attractive mechanism that can be used to gain control of a system. By subverting KiDebugRoutine to instead point to a custom callback function, it becomes possible to surreptitiously gain control at key moments (debug prints, exception dispatching, kernel module loading are the primary candidates).

The architecture of the kernel debugger event notification interface can be summed up in terms of a global function pointer (KiDebugRoutine) in the kernel. A number of distinct pieces of code, such as the exception dispatcher, module loader, and so on are designed to call through KiDebugRoutine in order to notify the kernel debugger of events. In order to minimize overhead in scenarios where the kernel debugger is inactive, KiDebugRoutine is typically set to point to a dummy function, KdpStub, which performs almost no actions and, for the most part, simply returns immediately to the caller. However, when the system is booted with the kernel debugger enabled, KiDebugRoutine may be set to an alternate function, KdpTrap, which passes the information supplied by the caller to the remote debugger.

Although enabling or disabling the kernel debugger has traditionally been a boot-time-only decision, newer OS builds such as Windows Server 2003 and beyond have some support for transitioning a system from a "kernel debugger inactive" state to a "kernel debugger active" state. As a result, there is some additional logic now baked into the dummy routine (KdpStub) which can under some circumstances result in the debugger being activated on-demand. This results in control being passed to the actual debugger communication routine (KdpTrap) after an on-demand kernel debugger initialization. Thus, in some circumstances, KdpStub will pass control through to KdpTrap.

Additionally, in Windows Server 2003 and later, it is possible to disable the kernel debugger on the fly. This may result in KiDebugRoutine being changed to refer to KdpStub instead of the boot-time-assigned KdpTrap. This behavior, combined with the previous points, is meant to show that, provided a system is booted with the kernel debugger enabled, it may not be enough to just enforce a policy that KiDebugRoutine must not change throughout the lifetime of the system.

Aside from exception dispatching notifications, most debug events find their way to KiDebugRoutine via interrupt 0x2d, otherwise known as DebugService. This includes user-mode debug print events as well as kernel mode originated events (such as kernel module load events). The trap handler for interrupt 0x2d packages the information supplied to the debug service interrupt into the format of a special exception that is then dispatched via KiExceptionDispatch (the normal exception dispatcher path for interrupt-generated exceptions). This in turn leads to KiDebugRoutine being called as a normal part of the exception dispatcher's operation.

Category: Type IIa, varies. Although on previous OS versions KiDebugRoutine was essentially write-once, recent versions allow limited changes of this value on the fly while the system is booted.

Origin: At the time of this writing, the authors are not aware of existing malware using KiDebugRoutine.

Capabilities: Redirecting KiDebugRoutine to point to a caller-controlled location allows control to be gained during exception dispatching (a very common occurrence), as well as certain other circumstances (such as module loading and debug print output). As an added bonus, because KiDebugRoutine is integral to the operation of the kernel debugger facility as a whole, it should be possible to filter the events received by the kernel debugger by manipulation of which events are actually passed on to KdpTrap, if a kernel debugger is enabled. However, it should be noted that other steps would need to be taken to prevent a kernel debugger from detecting the presence of code, such as the interception of the kernel debugger read-memory facilities.

Considerations: Depending on how the system global flags (NtGlobalFlag) are configured, and whether the system was booted in such a way as to suppress notification of user mode exceptions to the kernel debugger, exception events may not always be delivered to KiDebugRoutine. Also, as KiDebugRoutine is not exported, it would be necessary to locate it in order to intercept it. Furthermore, many of the debugger events occur in an arbitrary context, such that pointing KiDebugRoutine to user mode (except within ntdll space) may be considered dangerous. Even while pointing KiDebugRoutine to ntdll, there is the risk that the system may be brought down as some debugger events may be reported while the system cannot tolerate paging (e.g. debug prints). From a thread-safety perspective, an interlocked exchange on KiDebugRoutine should be a relatively synchronization-safe operation (however the new callback routine may never be unmapped from the address space without some means of ensuring that no callbacks are active).

Covertness: As KiDebugRoutine is a non-exported, writable kernel global, it has some inherent defenses against simple detection techniques. However, in legitimate system operation, there are only two legal values for KiDebugRoutine: KdpStub, and KdpTrap. Though neither of these routines is exported, a combination of detection techniques (such as verifying the integrity of read only kernel code, and a verification that KiDebugRoutine refers to a location within an expected code region of the kernel memory image) may make it easier to locate blatant attacks on KiDebugRoutine. For example, simply setting KiDebugRoutine to point to an out-of-kernel location could be detected with such an approach, as could pointing it elsewhere in the kernel and then writing to it (either the target location would need to be outside the normal code region, easily detectable, or normally read-only code would have to be overwritten, also relatively easily detectable). Also, all versions of PatchGuard protect KiDebugRoutine in x64 versions of Windows. This means that effective exploitation of KiDebugRoutine in the long term on such systems would require an attacker to deal with PatchGuard. This is considered a relatively minor difficulty by the authors.
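The two levels of validation mentioned above can be expressed as simple predicates. The following C sketch is an assumption-laden model: struct code_region, is_known_value and plausible are hypothetical names, and the caller is assumed to have already obtained the pointer value, the region bounds and (for the strict check) the two symbol addresses by some other means.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds of the kernel image's code region, supplied by the caller. */
struct code_region {
    uintptr_t start;  /* first byte of the region */
    uintptr_t end;    /* one past the last byte of the region */
};

/* Return 1 if addr falls inside [start, end). */
static int in_region(const struct code_region *r, uintptr_t addr)
{
    return addr >= r->start && addr < r->end;
}

/* Strict check, usable when symbols are available: only the
 * addresses of KdpStub and KdpTrap are legitimate values. */
static int is_known_value(uintptr_t ptr, uintptr_t kdp_stub,
                          uintptr_t kdp_trap)
{
    return ptr == kdp_stub || ptr == kdp_trap;
}

/* Weaker check for use without symbols: the pointer must at least
 * refer to a location inside the expected code region. */
static int plausible(const struct code_region *text, uintptr_t ptr)
{
    return in_region(text, ptr);
}
```

The weaker check only catches blatant redirection to out-of-image memory; as the text notes, it would need to be paired with an integrity check of the read-only code itself to be meaningful.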

2.5.3 KTHREAD's SuspendApc

In order to support thread suspension, the Windows kernel includes a KAPC field named SuspendApc in the KTHREAD structure that is associated with each thread running on a system. When thread suspension is requested, the kernel takes steps to queue the SuspendApc structure to the thread's APC queue. When the APC queue is processed, the kernel invokes the APC's NormalRoutine, which is typically initialized to nt!KiSuspendThread, from the SuspendApc structure in the context of the thread that is being suspended. Once nt!KiSuspendThread completes, the thread is suspended. The following shows what values the SuspendApc is typically initialized to:

kd> dt -r1 _KTHREAD 80558c20
...
+0x16c SuspendApc : _KAPC
   +0x000 Type : 18
   +0x002 Size : 48
   +0x004 Spare0 : 0
   +0x008 Thread : 0x80558c20 _KTHREAD
   +0x00c ApcListEntry : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x014 KernelRoutine : 0x804fa8a1 nt!KiSuspendNop
   +0x018 RundownRoutine : 0x805139ed nt!PopAttribNop
   +0x01c NormalRoutine : 0x804fa881 nt!KiSuspendThread
   +0x020 NormalContext : (null)
   +0x024 SystemArgument1: (null)
   +0x028 SystemArgument2: (null)
   +0x02c ApcStateIndex : 0
   +0x02d ApcMode : 0
   +0x02e Inserted : 0

Since the SuspendApc structure is specific to a given KTHREAD, any modification made to a thread's SuspendApc.NormalRoutine will affect only that specific thread. By modifying the NormalRoutine of the SuspendApc associated with a given thread, a backdoor can gain arbitrary code execution in kernel-mode by simply attempting to suspend the thread. It is trivial for a user-mode application to trigger the backdoor. The following sample code illustrates how a thread might execute arbitrary code in kernel-mode if its SuspendApc has been modified:

SuspendThread(GetCurrentThread());

The following code gives an example of assembly that implements the technique described above taking into account the InitialStack insight described in the considerations below:

public _RkSetSuspendApcNormalRoutine@4
_RkSetSuspendApcNormalRoutine@4 proc
    assume fs:nothing
    push edi
    push esi
    ; Grab the current thread pointer (ecx = 100h, so this reads fs:[124h])
    xor ecx, ecx
    inc ch
    mov esi, fs:[ecx+24h]
    ; Grab KTHREAD.InitialStack
    lea esi, [esi+18h]
    lodsd
    xchg esi, edi
    ; Find StackBase (scan at most ecx = 100h dwords for a match)
    repne scasd
    ; Set KTHREAD->SuspendApc.NormalRoutine
    mov eax, [esp+0ch]
    xchg eax, [edi+1ch]
    pop esi
    pop edi
    ; stdcall (@4): the callee pops its single argument
    ret 4
_RkSetSuspendApcNormalRoutine@4 endp


Category: Type IIa

Origin: The authors believe this to be the first public description of this technique. Skywing is credited with the idea. Greg Hoglund mentions abusing APC queues to execute code, but he does not explicitly call out SuspendApc[18].


Considerations: This technique is extremely effective. It provides a simple way of executing arbitrary code in kernel-mode by simply hijacking the mechanism used to suspend a specific thread. There are also some interesting side effects that are worth mentioning. Overwriting the SuspendApc's NormalRoutine makes it so that the thread can no longer be suspended. Even better, if the hook function that replaces the NormalRoutine never returns, it becomes impossible for the thread, and thus the owning process, to be killed because of the fact that the NormalRoutine is invoked at APC level. Both of these side effects are valuable in the context of a rootkit.

One consideration that must be made from the perspective of a backdoor is that it will be necessary to devise a technique that can be used to locate the SuspendApc field in the KTHREAD structure across multiple versions of Windows. Fortunately, there are heuristics that can be used to accomplish this. In all versions of Windows analyzed thus far, the SuspendApc field is preceded by the StackBase field. It has been confirmed on multiple operating systems that the StackBase field is equal to the InitialStack field. The InitialStack field is located at a reliable offset (0x18) on all versions of Windows checked by the authors. Using this knowledge, it is trivial to write some code that scans the KTHREAD structure on pointer aligned offsets until it encounters a value that is equal to the InitialStack. Once a match is found, it is possible to assume that the SuspendApc immediately follows it.
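The scan heuristic can be restated in C against a copy of the structure rather than live kernel memory. This is a minimal sketch under stated assumptions: the KTHREAD is viewed as an array of 32-bit words (x86 only), the function name find_suspend_apc_offset is hypothetical, and the 0x100-dword bound simply mirrors the count used by the assembly above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INITIAL_STACK_OFFSET 0x18u   /* per the text, stable across versions */
#define MAX_SCAN_DWORDS      0x100u  /* same bound the assembly keeps in ecx */

/* Given a copy of a KTHREAD viewed as 32-bit words, return the byte
 * offset of the field that follows the first later word equal to
 * InitialStack (assumed to be StackBase). Returns 0 if no match is
 * found within the buffer or the scan bound. */
static size_t find_suspend_apc_offset(const uint32_t *kthread,
                                      size_t ndwords)
{
    const size_t first = INITIAL_STACK_OFFSET / sizeof(uint32_t);
    size_t i, scanned;

    if (ndwords <= first)
        return 0;

    const uint32_t initial_stack = kthread[first];
    for (i = first + 1, scanned = 0;
         i < ndwords && scanned < MAX_SCAN_DWORDS;
         i++, scanned++) {
        if (kthread[i] == initial_stack)
            return (i + 1) * sizeof(uint32_t);
    }
    return 0;
}
```

With a buffer in which the word at offset 0x168 equals the word at offset 0x18, the function returns 0x16c, which matches the SuspendApc offset in the debugger output shown earlier. A detection tool could use the same offset to read NormalRoutine (at +0x1c within the KAPC) and compare it to the expected value.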

Covertness: This technique involves overwriting a function pointer in a dynamically allocated region of memory that is associated with a specific thread. This makes the technique fairly covert, but not impossible to detect. One method of detecting this technique would be to enumerate the threads in each process to see if the NormalRoutine of the SuspendApc is set to the expected value of nt!KiSuspendThread. It would be challenging for someone other than Microsoft to implement this safely. The authors are not aware of any tool that currently does this.

2.5.4 Create Thread Notify Routine

The Windows kernel provides drivers with the ability to register a callback that will be notified when threads are created and terminated. This ability is provided through the Windows Driver Model (WDM) export nt!PsSetCreateThreadNotifyRoutine. When a thread is created or terminated, the kernel enumerates the list of registered callbacks and notifies them of the event.

Category: Type II

Origin: The ability to register a callback that is notified when threads are created and terminated has been included since the first release of the WDM.


Considerations: This technique is useful because a user-mode process can control the invocation of the callback by simply creating or terminating a thread. Additionally, the callback will be notified in the context of the process that is creating or terminating the thread. This makes it possible to set the callback routine to an address that resides within ntdll.dll.

Covertness: This technique is covert in that it is possible for a backdoor to blend in with any other registered callbacks. Without having a known-good state to compare against, it would be challenging to conclusively state that a registered callback is associated with a backdoor. There are some indicators that could be used to tell that something is odd, such as if the callback routine resides in ntdll.dll or if it resides in either the paged or non-paged pool.


2.5.5 Object Type Initializers

The Windows NT kernel uses an object-oriented approach to representing resources such as files, drivers, devices, processes, threads, and so on. Each object is categorized by an object type. This object type categorization provides a way for the kernel to support common actions that should be applied to objects of the same type, among other things. Under this design, each object is associated with only one object type. For example, process objects are associated with the nt!PsProcessType object type. The structure used to represent an object type is the OBJECT_TYPE structure which contains a nested structure named OBJECT_TYPE_INITIALIZER. It's this second structure that provides some particularly interesting fields that can be used in a backdoor.

As one might expect, the fields of most interest are function pointers. These function pointers, if non-null, are called by the kernel at certain points during the lifetime of an object that is associated with a particular object type. The following debugger output shows the function pointer fields:

kd> dt nt!_OBJECT_TYPE_INITIALIZER
...
+0x02c DumpProcedure : Ptr32
+0x030 OpenProcedure : Ptr32
+0x034 CloseProcedure : Ptr32
+0x038 DeleteProcedure : Ptr32
+0x03c ParseProcedure : Ptr32
+0x040 SecurityProcedure : Ptr32
+0x044 QueryNameProcedure : Ptr32
+0x048 OkayToCloseProcedure : Ptr32

Two fairly easy to understand procedures are OpenProcedure and CloseProcedure. These function pointers are called when an object of a given type is opened and closed, respectively. This gives the object type initializer a chance to perform some common operation on an instance of an object type. In the case of a backdoor, this exposes a mechanism through which arbitrary code could be executed in kernel-mode whenever an object of a given type is opened or closed.

Category: Type IIa

Origin: Matt Conover gave an excellent presentation on how object type initializers can be used to detect rootkits at XCon 2005[8]. Conversely, they can also be used to backdoor the system. The authors are not aware of public examples prior to Conover's presentation. Greg Hoglund also mentions this type of approach[18] in June, 2006.


Considerations: There are no unique considerations involved in the use of this technique.

Covertness: This technique can be detected by tools designed to validate the state of object type initializers against a known-good state. Currently, the authors are not aware of any tools that perform this type of check.
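A known-good comparison of this kind reduces to a slot-by-slot pointer diff. The C sketch below is a simplified model rather than the real structure layout: struct type_initializer keeps only the eight procedure slots from the debugger output in an array, and changed_slots, sample_open and rogue_open are hypothetical names. How the known-good snapshot would be obtained is outside the scope of the sketch.

```c
#include <assert.h>

typedef void (*proc_fn)(void);

/* One index per procedure field listed in the debugger output. */
enum proc_slot {
    SLOT_DUMP, SLOT_OPEN, SLOT_CLOSE, SLOT_DELETE,
    SLOT_PARSE, SLOT_SECURITY, SLOT_QUERY_NAME, SLOT_OKAY_TO_CLOSE,
    SLOT_COUNT
};

/* Simplified stand-in that holds only the procedure pointers. */
struct type_initializer {
    proc_fn procs[SLOT_COUNT];
};

/* Return a bitmask with bit i set when slot i differs from the
 * known-good snapshot. A result of 0 means no slot has changed. */
static unsigned changed_slots(const struct type_initializer *current,
                              const struct type_initializer *known_good)
{
    unsigned mask = 0;
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (current->procs[i] != known_good->procs[i])
            mask |= 1u << i;
    }
    return mask;
}

/* Two placeholder routines so the sketch can be exercised. */
static int open_count, rogue_count;
static void sample_open(void) { open_count++; }
static void rogue_open(void)  { rogue_count++; }
```

Replacing the SLOT_OPEN entry of a copy with rogue_open and calling changed_slots against the original yields a mask with only that bit set, which tells the tool which procedure to examine further.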

2.5.6 PsInvertedFunctionTable

With the introduction of Windows for x64, significant changes were made to how exceptions are processed with respect to how exceptions operate in x86 versions of Windows. On x86 versions of Windows, exception handlers were essentially demand-registered at runtime by routines with exception handlers (more of a code-based exception registration mechanism). On x64 versions of Windows, the exception registration path is accomplished using a more data-driven model. Specifically, exception handling (and especially unwind handling) is now driven by metadata attached to each PE image (known as the exception directory), which describes the relationship between routines and their exception handlers, what the exception handler function pointer(s) for each region of a routine are, and how to unwind each routine's machine state in a completely data-driven fashion.

While there are significant advantages to having exception and unwind dispatching accomplished using a data-driven model, there is a potential performance penalty over the x86 method (which consisted of a linked list of exception and unwind handlers registered at a known location, on a per-thread basis). A specific example of this can be seen when noting that all of the information needed for the operating system to locate and call the exception handler for purposes of exception or unwind processing, which was in one location (the linked list in the NT_TIB) on Windows for x86, is now scattered across all loaded modules in Windows for x64. In order to locate an exception handler for a particular routine, it is necessary to search the loaded module list for the module that contains the instruction pointer corresponding to the function in question. After the module is located, it is then necessary to process the PE header of the module to locate the module's exception directory. Finally, it is then necessary to search the exception directory of that module for the metadata corresponding to a location encompassing the requested instruction pointer. This process must be repeated for every function through which an exception may traverse.
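The final step of this process, searching a module's exception directory for the entry that encompasses an instruction pointer, can be illustrated with a short user-mode sketch. The FUNCTION_ENTRY structure mirrors the documented three-field x64 RUNTIME_FUNCTION layout; the function name lookup_function_entry is hypothetical, and the sketch assumes that the directory is sorted by BeginAddress with non-overlapping entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the x64 RUNTIME_FUNCTION layout: three 32-bit RVAs. */
typedef struct FUNCTION_ENTRY {
    uint32_t BeginAddress;  /* RVA of the first byte of the routine   */
    uint32_t EndAddress;    /* RVA one past the last byte             */
    uint32_t UnwindData;    /* RVA of the unwind/handler description  */
} FUNCTION_ENTRY;

/* Binary search for the entry covering `ip`. Returns NULL when the
 * address lies outside the image or is not covered by any entry. */
const FUNCTION_ENTRY *lookup_function_entry(const FUNCTION_ENTRY *dir,
                                            size_t count,
                                            uint64_t image_base,
                                            uint64_t ip)
{
    assert(dir != NULL || count == 0);

    /* Entries are expressed relative to the image base. */
    if (ip < image_base || ip - image_base > UINT32_MAX)
        return NULL;
    uint32_t rva = (uint32_t)(ip - image_base);

    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < dir[mid].BeginAddress)
            hi = mid;
        else if (rva >= dir[mid].EndAddress)
            lo = mid + 1;
        else
            return &dir[mid];
    }
    return NULL;
}
```

The search itself is cheap; the expense discussed above comes from the preceding steps of walking the loaded module list and parsing the PE header before this search can begin.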

In an effort to improve the performance of exception dispatching on Windows for x64, Microsoft developed a multi-tier cache system that speeds the resolution of exception dispatching information that is used by the routine responsible for looking up metadata associated with a function. The routine responsible for this is named RtlLookupFunctionTable. When searching for unwind information (a pointer to a RUNTIME_FUNCTION entry structure), depending on the reason for the search request, an internal first-level cache (RtlpUnwindHistoryTable) of unwind information for commonly occurring functions may be searched. At the time of this writing, this table consists of RtlUnwindEx, __C_specific_handler, RtlpExecuteHandlerForException, RtlDispatchException, RtlRaiseStatus, KiDispatchException, and KiExceptionDispatch. Due to how exception dispatching operates on x64[39], many of these functions will commonly appear in any exception call stack. Because of this, it is beneficial to performance to have a first-level, quick reference for them.

After RtlpUnwindHistoryTable is searched, a second cache, known as PsInvertedFunctionTable (in kernel-mode) or LdrpInvertedFunctionTable (in user-mode), is scanned. This second-level cache contains a list of the first 0x200 (Windows Server 2008, Windows Vista) or 0xA0 (Windows Server 2003) loaded modules. The loaded module list contained within PsInvertedFunctionTable / LdrpInvertedFunctionTable is presented as a quickly searchable, unsorted linear array that maps the memory occupied by an entire loaded image to a given module's exception directory. The lookup through the inverted function table thus eliminates the costly linked list (loaded module list) and executable header parsing steps necessary to locate the exception directory for a module. For modules which are referenced by PsInvertedFunctionTable / LdrpInvertedFunctionTable, the exception directory pointer and size information in the PE header of the module in question are unused after the module is loaded and the inverted function table is populated. Because the inverted function table has a fixed size, if enough modules are loaded simultaneously, it is possible that some modules will need to be scanned via loaded module list lookup if all entries in the inverted function table are in use when that module is loaded. However, this is a rare occurrence, and most of the interesting system modules (such as the HAL and the kernel memory image itself) are at a fixed-at-boot position within PsInvertedFunctionTable[37].
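The behavior of this second-level cache can be modeled in user-mode as a small fixed-size array. The sketch below is illustrative only: the INVERTED_ENTRY and INVERTED_TABLE layouts, the field names, and the capacity of four entries are assumptions made for the example and do not reflect the real, undocumented structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_CAPACITY 4  /* illustrative; the text cites 0x200 or 0xA0 */

/* Hypothetical cache entry: maps an image's address range directly to
 * its exception directory, so no PE header parsing is needed. */
typedef struct INVERTED_ENTRY {
    uint64_t    ImageBase;
    uint32_t    SizeOfImage;
    uint32_t    ExceptionDirectorySize;
    const void *ExceptionDirectory;
} INVERTED_ENTRY;

typedef struct INVERTED_TABLE {
    size_t         Count;
    int            Overflow;  /* set once a module could not be cached */
    INVERTED_ENTRY Entries[TABLE_CAPACITY];
} INVERTED_TABLE;

/* Caches a newly loaded module. Returns 0 when the table is full, in
 * which case lookups for that module must use the slow path. */
int table_insert(INVERTED_TABLE *t, INVERTED_ENTRY e)
{
    assert(t != NULL);
    if (t->Count == TABLE_CAPACITY) {
        t->Overflow = 1;
        return 0;
    }
    t->Entries[t->Count++] = e;
    return 1;
}

/* Linear scan for the image containing `ip`. A NULL result means the
 * caller falls back to the loaded module list and PE header parsing. */
const INVERTED_ENTRY *table_lookup(const INVERTED_TABLE *t, uint64_t ip)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->Count; i++) {
        const INVERTED_ENTRY *e = &t->Entries[i];
        if (ip >= e->ImageBase && ip - e->ImageBase < e->SizeOfImage)
            return e;
    }
    return NULL;
}
```

Note that once an entry is cached, the lookup never consults the PE header again, which is the property that the remainder of this section depends on.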

By redirecting the exception directory pointer in PsInvertedFunctionTable to refer to a shadow exception directory in caller-supplied memory (outside of the PE header of the actual module), it is possible to change the exception (or unwind) handling behavior of all code points within a module. For instance, it is possible to create an exception handler spanning every code byte within a module through manipulation of the exception directory information. By changing the inverted function table cache for a module, multiple benefits are realized with respect to this goal. First, an arbitrarily large amount of space may be devoted to unwind metadata, as the patched unwind metadata need not fit within the confines of a particular image's exception directory (this is particularly important if one wishes to gift all functions within a module with an exception handler). Second, the memory image of the module in question need not be modified, improving the resiliency of the technique against naive detection systems.


Category: Type IIa, varies. Although the entries for always-loaded modules such as the HAL and the kernel in-memory image itself are essentially considered write-once, the array as a whole may be modified as the system is running when kernel modules are either loaded or unloaded. As a result, while the first few entries of PsInvertedFunctionTable are comparatively easy to verify, the dynamic entries corresponding to demand-loaded (and possibly demand-unloaded) kernel modules may frequently change during the legitimate operation of the system, and as such interception of the exception directory pointers of individual drivers may be much less simple to detect than the interception of the kernel's exception directory.

Origin: At the time of this writing, the authors are not aware of existing malware using PsInvertedFunctionTable. Hijacking of PsInvertedFunctionTable was proposed as a possible bypass avenue for PatchGuard version 2 by Skywing[37]. Its applicability as a possible attack vector with respect to hiding kernel-mode code was also briefly described in the same article.

Capabilities: The principal capability afforded by this technique is to establish an exception handler at arbitrary locations within a target module (even every code byte within a module if so desired). By virtue of creating such exception handlers, it is possible to gain control at any location within a module that may be traversed by an exception, even if the exception would normally be handled in a safe fashion by the module or a caller of the module.

Considerations: As PsInvertedFunctionTable is not exported, one must first locate it in order to patch it (this is considered possible as many exported routines, such as RtlLookupFunctionEntry, reference it in an obvious, patterned way). Also, although the structure is guarded by a non-exported synchronization mechanism (PsLoadedModuleSpinLock in Windows Server 2008), the first few entries corresponding to the HAL and the kernel in-memory image itself should be static and safely accessible without synchronization (after all, neither the HAL nor the kernel in-memory image may be unloaded after the system has booted). It should be possible to perform an interlocked exchange to swap the exception directory pointer, provided that the exception directory shall not be modified in a fashion that would require synchronization (e.g. only appended to) after the exchange is made. The size of the exception directory is supplied as a separate value in the inverted function table entry array and would need to be modified separately, which may pose a synchronization problem if alterations to the exception directory are not carefully planned to be safe in all possible contingencies with respect to concurrent access as the alterations are made. Additionally, due to the 32-bit RVA based format of the unwind metadata, all exception handlers for a module must be within 4GB of that module's loaded base address. This means that custom exception handlers need to be located within a window of memory that is relatively near to a module. Allocating memory at a specific base address involves additional work as the memory cannot be at an arbitrary point in the address space, but must be within 4GB of the target. If a caller can query the address space and request allocations based at a particular region, however, this is not seen as a particularly insurmountable problem.

Covertness: The principal advantage of this approach is that it allows a caller to gain control at any point within a module's execution where an exception is generated without modifying any code or data within the module in question (provided the module is cached within PsInvertedFunctionTable). Because the exception directory information for a module is unused after the cache is populated, integrity checks against the PE header are useless for detecting the alteration of exception handling behavior for a cached module. Additionally, PsInvertedFunctionTable is a non-exported, writable kernel-mode global, which affords it some intrinsic protection against simple detection techniques. A scan of the loaded module list and comparison of exception directory pointers to those contained within PsInvertedFunctionTable could reveal most attacks of this nature, however, provided that the loaded module list retains integrity. Additionally, PatchGuard version 3 appears to guard key portions of PsInvertedFunctionTable (e.g. to block redirection of the kernel's exception directory), resulting in a need to bypass PatchGuard for long-term exploitation on Windows x64 based systems. This is considered a relatively minor difficulty by the authors.
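The comparison suggested above, between the exception directory recorded for each module in its PE header and the one held in the cache, might look like the following user-mode sketch. The EXCEPTION_DIR_RECORD structure and both function names are hypothetical; the sketch assumes that the two lists have already been collected and that the loaded module list itself can be trusted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of where a module's exception directory is believed
 * to live. One list is derived from the loaded module list by parsing each
 * PE header; the other is read from the inverted function table. */
typedef struct EXCEPTION_DIR_RECORD {
    uint64_t ImageBase;
    uint64_t ExceptionDirectory;
    uint32_t ExceptionDirectorySize;
} EXCEPTION_DIR_RECORD;

static const EXCEPTION_DIR_RECORD *find_by_base(
    const EXCEPTION_DIR_RECORD *list, size_t count, uint64_t base)
{
    for (size_t i = 0; i < count; i++)
        if (list[i].ImageBase == base)
            return &list[i];
    return NULL;
}

/* Counts cache entries that have no matching module, or whose directory
 * pointer or size disagrees with the value taken from the PE header. */
size_t count_redirected_entries(const EXCEPTION_DIR_RECORD *from_headers,
                                size_t header_count,
                                const EXCEPTION_DIR_RECORD *from_cache,
                                size_t cache_count)
{
    assert(from_headers != NULL || header_count == 0);
    assert(from_cache != NULL || cache_count == 0);

    size_t suspicious = 0;
    for (size_t i = 0; i < cache_count; i++) {
        const EXCEPTION_DIR_RECORD *ref =
            find_by_base(from_headers, header_count, from_cache[i].ImageBase);
        if (ref == NULL ||
            ref->ExceptionDirectory != from_cache[i].ExceptionDirectory ||
            ref->ExceptionDirectorySize != from_cache[i].ExceptionDirectorySize)
            suspicious++;
    }
    return suspicious;
}
```

Such a check would have to tolerate entries that change legitimately as drivers load and unload, which is the same difficulty noted for the dynamic entries in the category discussion.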

2.5.7 Delayed Procedures

There are a number of features offered by the Windows kernel that allow device drivers to asynchronously execute code. Some examples of these features include asynchronous procedure calls (APCs), deferred procedure calls (DPCs), work items, threading, and so on. A backdoor can simply use the APIs exposed by the kernel for any number of these features to schedule a task that will run arbitrary code in kernel-mode. For example, a backdoor might queue a kernel-mode APC using the ntdll.dll trick described at the beginning of this section. When the APC executes, it runs code that has been altered in ntdll.dll in a kernel-mode context. This same basic concept would work for all other delayed procedures.

Category: Type II

Origin: This technique makes implicit use of operating system exposed features and therefore falls into the category of obvious. Greg Hoglund mentions these in particular in June, 2006[18].


Considerations: The important consideration here is that some of the methods that support running delayed procedures have restrictions about where the code pages reside. For example, a DPC is invoked at dispatch level and must therefore execute code that resides in non-paged memory.

Covertness: This technique is covert in the sense that the backdoor is always in a transient state of execution and therefore could be considered largely dormant. Since the backdoor state is stored alongside other transient state in the operating system, this technique should prove more difficult to detect when compared to some of the other approaches described in this paper.

2.6 Asynchronous Read Loop

It's not always necessary to hook some portion of the kernel when attempting to implement a local kernel-mode backdoor. In some cases, it's easiest to just make use of features included in the target operating system to blend in with normal behavior. One particularly good candidate for this involves abusing some of the features offered by the Windows I/O (input/output) manager.

The I/O model used by Windows has many facets to it. For the purposes of this paper, it's only necessary to have an understanding of how it operates when reading data from a file. To support this, the kernel constructs an I/O Request Packet (IRP) with its MajorFunction set to IRP_MJ_READ. The kernel then passes the populated IRP down to the device object that is related to the file that is being read from. The target device object takes the steps needed to read data from the underlying device and then stores the acquired data in a buffer associated with the IRP. Once the read operation has completed, the kernel will call the IRP's completion routine if one has been set. This gives the original caller an opportunity to make forward progress with the data that has been read.

This very basic behavior can be effectively harnessed in the context of a backdoor in a fairly covert fashion. One interesting approach involves a user-mode process hosting a named pipe server and a blob of kernel-mode code reading data from the server and then executing it in the kernel-mode context. This general behavior would make it possible to run additional code in the kernel-mode context by simply shuttling it across a named pipe. The specifics of how this can be made to work are almost as simple as the steps described in the previous paragraph.

The user-mode part is simple: create a named pipe server using CreateNamedPipe and then wait for a connection. The kernel-mode part is more interesting. One basic idea might involve having a kernel-mode routine that builds an asynchronous read IRP where the IRP's completion routine is defined as the kernel-mode routine itself. In this way, when data arrives from the user-mode process, the routine is notified and given an opportunity to execute the code that was supplied. After the code has been executed, it can simply re-use the code that was needed to pass the IRP to the underlying device associated with the named pipe that it's interacting with. The following pseudo-code illustrates how this could be accomplished:

    KernelRoutine(DeviceObject, ReadIrp, Context)
    {
        // First time called, ReadIrp == NULL
        if (ReadIrp == NULL)
        {
            FileObject = OpenNamedPipe(...)
        }
        // Otherwise, called during IRP completion
        else
        {
            FileObject = GetFileObjectFromIrp(ReadIrp)
            RunCodeFromIrpBuffer(ReadIrp)
        }

        DeviceObject = IoGetRelatedDeviceObject(FileObject)
        ReadIrp = IoBuildAsynchronousFsdRequest(...)
        IoSetCompletionRoutine(ReadIrp, KernelRoutine)
        IoCallDriver(DeviceObject, ReadIrp)
    }

Category: Type II

Origin: The authors believe this to be the first public description of this technique.


Covertness: The authors believe this technique to be fairly covert due to the fact that the kernel-mode code profile is extremely minimal. The only code that must be present at all times is the code needed to execute the read buffer and then post the next read IRP to the target device object. There are two main strategies that might be taken to detect this technique. The first could include identifying malicious instances of the target device, such as a malicious named pipe server. The second might involve attempting to perform an in-memory fingerprint of the completion routine code, though this would be far from foolproof, especially if the kernel-mode code is encoded until invoked.

2.7 Leaking CS

With the introduction of protected mode into the x86 architecture, the concept of separate privilege levels, or rings, was born. Lesser privileged rings (such as ring 3) were designed to be restricted from accessing resources associated with more privileged rings (such as ring 0). To support this concept, segment descriptors are able to define access restrictions based on which rings should be allowed to access a given region of memory. The processor derives the Current Privilege Level (CPL) by looking at the low order two bits of the CS segment selector when it is loaded. If both bits are clear, the processor is running at ring 0, the most privileged ring. If both bits are set, then the processor is running at ring 3, the least privileged ring.
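The relationship between a selector value and the privilege level it encodes is simple bit arithmetic, as the following sketch shows. The function names are hypothetical. The selector values 0x1B and 0x08 used in the usage note are the values commonly observed for the user-mode and kernel-mode code selectors on 32-bit Windows and are given only as examples.

```c
#include <assert.h>
#include <stdint.h>

#define SELECTOR_RPL_MASK 0x3u  /* low order two bits hold the ring */

/* Extracts the privilege level (0 through 3) encoded in a selector. */
unsigned selector_privilege_level(uint16_t selector)
{
    return selector & SELECTOR_RPL_MASK;
}

/* Returns the same selector with its low order two bits replaced by
 * `ring`. Passing 0 clears both bits, yielding a ring 0 selector. */
uint16_t selector_with_privilege_level(uint16_t selector, unsigned ring)
{
    assert(ring <= 3u);
    return (uint16_t)((selector & ~SELECTOR_RPL_MASK) | ring);
}
```

For example, selector_privilege_level(0x1B) evaluates to 3 and selector_privilege_level(0x08) evaluates to 0, since 0x1B has both low order bits set and 0x08 has both clear.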

When certain events occur that require the operating system's kernel to take control, such as an interrupt, the processor automatically transitions from whatever ring it is currently executing at to ring 0 so that the request may be serviced by the kernel. As part of this transition, the processor saves the values of a number of different registers, including the previous value of CS, to the stack in order to make it possible to pick up execution where it left off after the request has been serviced. The following structure describes the order in which these registers are saved on the stack:

    typedef struct _SAVED_STATE
    {
        ULONG_PTR Eip;
        ULONG_PTR CodeSelector;
        ULONG Eflags;
        ULONG_PTR Esp;
        ULONG_PTR StackSelector;
    } SAVED_STATE, *PSAVED_STATE;


Potential security implications may arise if there is a condition where some code can alter the saved execution state in such a way that the saved CS is modified from a lesser privileged CS to a more privileged CS by clearing the low order bits. When the saved execution state is used to restore the active processor state, such as through an iret, the original caller immediately obtains ring 0 privileges.

Category: Undefined; this approach does not fit into any of the defined categories as it simply takes advantage of hardware behavior relating to how CS is used to determine the CPL of a processor. If code patching is used to be able to modify the saved CS, then the implementation is Type I.

Origin: Leaking CS to user-mode has been known to be dangerous since the introduction of protected mode (and thus rings) into the x86 architecture with the 80286 in 1982[22]. This approach therefore falls into the category of obvious due to the documented hardware implications of leaking a kernel-mode CS when transitioning back to user-mode.


Considerations: Leaking the kernel-mode CS to user-mode may have undesired consequences. Whatever code is to be called in user-mode must take into account that it will be running in a kernel-mode context. Furthermore, the kernel attempts to be as rigorous as possible about checking to ensure that a thread executing in user-mode is not allowed a kernel-mode CS.

Covertness: Depending on the method used to intercept and alter the saved execution state, this method has the potential to be fairly covert. If the method involves secondary hooking in order to modify the state, then it may be detected through some of the same techniques as described in the section on image patching.

3 Prevention & Mitigation

The primary purpose of this paper is not to explicitly identify approaches that could be taken to prevent or mitigate the different types of attacks described herein. However, it is worth taking some time to describe the virtues of certain approaches that could be extremely beneficial if one were to attempt to do so. The subject of preventing backdoors from being installed and persisted is discussed in more detail in section 4 and therefore won't be considered in this section.

One of the more interesting ideas that could be applied to prevent a number of different types of backdoors would be immutable memory. Memory is immutable when it is not allowed to be modified. There are a few key regions of memory used by the Windows kernel that would benefit greatly from immutable memory, such as executable code segments and regions that are effectively write-once, such as the SSDT. While immutable memory may work in principle, there is currently no x86 or x64 hardware (that the authors are aware of) that permits this level of control.

Even though there appears to be no hardware support for this, it is still possible to implement immutable memory in a virtualized environment. This is especially true in hardware-assisted virtualization implementations that make use of a hypervisor in some form. In this model, a hypervisor can easily expose a hypercall (similar to a system call, but one that traps into the hypervisor) that would allow an enlightened guest to mark a set of pages as being immutable. From that point forward, the hypervisor would restrict all writes to the pages associated with the immutable region.

As mentioned previously, particularly good candidates for immutable memory are things like the SSDT, Windows' ALMOSTRO write-once segment, as well as other single-modification data elements that exist within the kernel. Enforcing immutable memory on these regions would effectively prevent backdoors from being able to establish certain types of hooks. The downside to it would be that the kernel would lose the ability to hot-patch itself (see footnote 9). Still, the security upside would seem to outweigh the potential downside. On x64, the use of immutable memory would improve the resilience of PatchGuard by allowing it to actively prevent hot-patching rather than relying on detecting it with the use of a polling cycle.

4 Running Code in Kernel-Mode

There are many who might argue that it's not even necessary to write code that prevents or detects specific types of kernel-mode backdoors. This argument can be made on the grounds of two very specific points. The first point is that in order for one to backdoor the kernel, one must have some way of executing code in kernel-mode. Based on this line of reasoning, one might argue that the focus should instead be given to preventing untrusted code from running in kernel-mode. The second point in this argument is that in order for one to truly compromise the host, some form of data must be persisted. If this is assumed to be the case, then an obvious solution would be to identify ways of preventing or detecting the persistent data. While there may also be additional points, these two represent the common themes observed by the authors. Unfortunately, the fact is that both of these points are, at the time of this writing, flawed.

It is currently not possible with present day operating systems and x86/x64 hardware to guarantee that only specific code will run in the context of an operating system's kernel. Though Microsoft wishes it were possible, which is clearly illustrated by their efforts in Code Integrity and Trusted Boot, there is no real way to guarantee th
