    Unprivileged Black-Box Detection of User-Space Keyloggers

    Stefano Ortolani, Cristiano Giuffrida, and Bruno Crispo, Senior Member, IEEE

    Abstract: Software keyloggers are a fast growing class of invasive software often used to harvest confidential information. One of the main reasons for this rapid growth is the possibility for unprivileged programs running in user space to eavesdrop and record all the keystrokes typed by the users of a system. The ability to run in unprivileged mode facilitates their implementation and distribution, but, at the same time, allows one to understand and model their behavior in detail. Leveraging this characteristic, we propose a new detection technique that simulates carefully crafted keystroke sequences in input and observes the behavior of the keylogger in output to unambiguously identify it among all the running processes. We have prototyped our technique as an unprivileged application, hence matching the same ease of deployment of a keylogger executing in unprivileged mode. We have successfully evaluated the underlying technique against the most common free keyloggers. This confirms the viability of our approach in practical scenarios. We have also devised potential evasion techniques that may be adopted to circumvent our approach and proposed a heuristic to strengthen the effectiveness of our solution against more elaborate attacks. Extensive experimental results confirm that our technique is robust to both false positives and false negatives in realistic settings.

    Index Terms: Invasive software, keylogger, security, black-box, PCC

    1 INTRODUCTION

    KEYLOGGERS are implanted on a machine to intentionally monitor the user activity by logging keystrokes and eventually delivering them to a third party [1]. While they are seldom used for legitimate purposes (e.g., surveillance/parental monitoring infrastructures), keyloggers are often maliciously exploited by attackers to steal confidential information. Many credit card numbers and passwords have been stolen using keyloggers [2], [3], which makes them one of the most dangerous types of spyware known to date.

    Keyloggers can be implemented as tiny hardware devices or, more conveniently, in software. Software-based keyloggers can be further classified based on the privileges they require to execute. Keyloggers implemented by a kernel module run with full privileges in kernel space. Conversely, a fully unprivileged keylogger can be implemented by a simple user-space process. It is important to notice that a user-space keylogger can easily rely on documented sets of unprivileged APIs commonly available on modern operating systems (OSs). This is not the case for a keylogger implemented as a kernel module. In kernel space, the programmer must rely on kernel-level facilities to intercept all the messages dispatched by the keyboard driver, which undoubtedly requires considerable effort and knowledge for an effective and bug-free implementation. Furthermore, a keylogger implemented as a user-space process is much easier to deploy since no special permission is required. A user can erroneously regard the keylogger as a harmless piece of software and be deceived into executing it. On the contrary, kernel-space keyloggers require a user with superuser privileges to consciously install and execute unsigned code within the kernel, a practice often forbidden by modern operating systems such as Windows Vista or Windows 7.

    In light of these observations, it is no surprise that 95 percent of the existing keyloggers run in user space [4]. Despite the rapid growth of keylogger-based frauds (e.g., identity theft and password leakage), not many effective and efficient solutions have been proposed to address this problem. Traditional defense mechanisms use fingerprinting strategies similar to those used to detect viruses and worms. Unfortunately, this strategy is hardly effective against the vast number of new keylogger variants surfacing every day in the wild.

    In this paper, we propose a new approach to detect keyloggers running as unprivileged user-space processes. To match the same deployment model, our technique is entirely implemented in an unprivileged process. As a result, our solution is portable, unintrusive, easy to install, and yet very effective. In addition, the proposed detection technique is completely black-box, i.e., based on behavioral characteristics common to all keyloggers. In other words, our technique does not rely on the internal structure of the keylogger or the particular set of APIs used. For this reason, our solution is of general applicability. We have prototyped our approach and evaluated it against the most common free keyloggers [5]. Our approach has proven effective in all the cases. We have also evaluated the impact of false positives in practical scenarios. In the final part of this paper, we further validate our approach with a homegrown keylogger that attempts to thwart our detection technique. Although our technique is already robust against the large majority of evasive behaviors, we also present and evaluate a heuristic against elaborate evasion strategies.

    40 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 10, NO. 1, JANUARY/FEBRUARY 2013

    • S. Ortolani and C. Giuffrida are with Vrije Universiteit, De Boelelaan 1081A, Amsterdam 1081HV, The Netherlands. E-mail: {ortolani, giuffrida}@cs.vu.nl.

    • B. Crispo is with the University of Trento, Via Sommarive 14, Povo, Trento I-38123, Italy. E-mail: [email protected].

    Manuscript received 27 Feb. 2012; revised 12 June 2012; accepted 5 Sept. 2012; published online 7 Sept. 2012. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TDSC-2012-02-0039. Digital Object Identifier no. 10.1109/TDSC.2012.76.

    1545-5971/13/$31.00 © 2013 IEEE. Published by the IEEE Computer Society

    The structure of the paper is as follows: we start with an in-depth analysis of modern keyloggers in Section 2. We then introduce our approach in Section 3, detail its architecture in Section 4, and evaluate the resulting prototype in Section 5. Section 6 discusses the robustness against evasion techniques. We conclude with related work in Section 7 and final remarks in Section 8.

    2 INTERNALS OF MODERN KEYLOGGERS

    Breaching the privacy of an individual by logging his keystrokes can be perpetrated at many different levels. For example, an attacker with physical access to the machine might wiretap the hardware of the keyboard. A dishonest owner of an Internet cafe, in turn, may find it more convenient to purchase a software solution, install it on all the terminals, and have the logs dropped on his own machine. Depending on the setting, a keylogger can be implemented in many different ways. For instance, external keyloggers rely on some physical property, either the acoustic emanations produced by the user typing [6], or the electromagnetic emanations of a wireless keyboard [7]. Hardware keyloggers are still external devices, but are implemented as dongles placed in between keyboard and motherboard. All these strategies, however, require physical access to the target machine.

    To overcome this limitation, software approaches are more commonly used. Hypervisor-based keyloggers (e.g., BluePill [8]) are the straightforward software evolution of hardware-based keyloggers, literally performing a man-in-the-middle attack between the hardware and the operating system. Kernel keyloggers come second in the chain and are often implemented as part of more complex rootkits. In contrast to hypervisor-based approaches, hooks are directly used to intercept buffer-processing events or other kernel messages.

    Albeit effective, all these approaches require privileged access to the machine. Moreover, writing a kernel driver (hypervisor-based approaches pose even more challenges) requires considerable effort and knowledge for an effective and bug-free implementation (even a single bug may lead to a kernel panic). User-space keyloggers, on the other hand, do not require any special privilege to be deployed. They can be installed and executed regardless of the privileges granted. This is a feat impossible for kernel keyloggers, since they require either superuser privileges or a vulnerability that allows arbitrary kernel code execution. Furthermore, user-space keylogger writers can safely rely on well-documented sets of APIs commonly available on modern operating systems, with no special programming skills required.

    User-space keyloggers can be further classified based on the scope of the hooked message/data structures. Since a system hosts multiple applications, keystrokes can be intercepted either globally (i.e., for all the applications) or locally (i.e., within the application). We term these two classes of user-space keyloggers type I and type II. Fig. 1 shows the proposed classification: the left pane shows the process of delivering a keystroke to the intended application, whereas the right pane highlights the particular component subverted by each type of keylogger. Both types can be easily implemented in Windows, while the facilities available in Unix-like OSes (X11 and GTK required) allow for a straightforward implementation of the more invasive type I keyloggers.

    Table 1 presents a list of all the APIs that can be used to implement a user-space keylogger. In brief, the SetWindowsHookEx() and gdk_window_add_filter() APIs are used to interpose the keylogging procedure before a

    ORTOLANI ET AL.: UNPRIVILEGED BLACK-BOX DETECTION OF USER-SPACE KEYLOGGERS 41

    Fig. 1. The delivery phases of a keystroke, and the components potentially subverted (we omit hypervisor-based approaches for the sake of clarity).

    TABLE 1. If the Scope of the API is Local, the Keylogger Must Inject Portions of Its Code in Each Application, e.g., Using a Library

    keystroke is actually delivered to the target process. For SetWindowsHookEx(), this is possible by setting the last parameter (thread_id) to 0 (which subscribes to any keyboard event). For gdk_window_add_filter(), it is sufficient to set the handler of the monitored window to NULL. The class of functions Get*State(), XQueryKeymap(), and inb(0x60) query the state of the keyboard and return a vector with the state of all the keys (of a single key in the case of GetKeyState()). When using these functions, the keylogger must continuously poll the keyboard in order to intercept all the keystrokes. The functions of the last class apply only to Windows and are typically used to overwrite the default address of keystroke-related functions in all the Win32 graphical applications. We have not found any example of this particular class of keyloggers in Unix-like OSes.

    Since some of the APIs have just local scope, type II keyloggers need to inject part of their code in a shared portion of the address space to have all the processes execute the provided callback. The only exception is a type II keylogger that uses either GetKeyState() or GetKeyboardState(). In these cases, the keylogging process can attach its input queue (i.e., the queue of events used to control a graphical user application) to other threads by using the procedure AttachThreadInput(). As a tentative countermeasure, Windows Vista recently eliminated the ability to share the same input queue for processes running in two different integrity levels. Unfortunately, since higher integrity levels are assigned only to known processes (e.g., Internet Explorer), common applications are still vulnerable to these interception strategies.

    We can draw three important conclusions from our analysis. First, all user-space keyloggers are implemented by either hook-based or polling mechanisms. Second, all APIs are legitimate and well-documented. Third, all modern operating systems offer (a flavor of) these APIs. In particular, they always provide the ability to intercept keystrokes regardless of the application in focus. This design choice is dictated by the necessity to support such functionalities for legitimate applications. The following are three simple scenarios in which the ability to intercept arbitrary keystrokes is a functional requirement: 1) keyboards with additional special-purpose keys; 2) window managers with system-defined shortcuts; 3) background user applications whose execution is triggered by user-defined shortcuts (for instance, an application handling multiple virtual workspaces requires hot keys that must not be overridden by other applications). All these functionalities can be implemented with all the APIs we presented so far (with the exception of inb(0x60), which is reserved to the superuser and tailored to low-level tasks). As shown earlier, the interception facilities can be easily subverted, allowing the keyloggers to benefit from all the features normally reserved to legitimate applications:

    • Ease of implementation. A minimal yet functional keylogger can be implemented in less than 100 lines of C# code. Due to the low complexity, it is also easy to enforce polymorphic or metamorphic behavior to thwart signature-based countermeasures.

    • Cross-version. By relying on documented and stable APIs, a particular keylogger can be easily deployed on multiple versions of the same operating system.

    • Unprivileged installation. No privilege is required to install a keylogger. There is no need to look for rather specific exploits to execute arbitrary privileged code.

    • Unprivileged execution. The keylogger is hardly noticeable at all during normal execution. The executable does not need to acquire privileged rights.

    3 OUR APPROACH

    Our approach is explicitly focused on designing a detection technique for unprivileged user-space keyloggers. Unlike other classes of keyloggers, a user-space keylogger is a background process which registers operating-system-supported hooks to surreptitiously eavesdrop (and log) every keystroke issued by the user into the current foreground application. Our goal is to prevent user-space keyloggers from stealing confidential data originally intended for a (trusted) legitimate foreground application. Malicious foreground applications surreptitiously logging user-issued keystrokes (e.g., a keylogger spoofing a trusted word processor application) and application-specific keyloggers (e.g., browser plugins surreptitiously performing keylogging activities) are outside our threat model and cannot be identified using our detection technique. Also note that a background keylogger cannot spawn a foreground application and steal the current application focus on demand without the user immediately noticing.

    Our model is based on these observations and explores the possibility of isolating the keylogger in a controlled environment, where its behavior is directly exposed to the detection system. Our technique involves controlling the keystroke events that the keylogger receives in input, and constantly monitoring the I/O activity generated by the keylogger in output. To assert detection, we leverage the intuition that the relationship between the input and output of the controlled environment can be modeled for most keyloggers with very good approximation. Regardless of the transformations the keylogger performs, a characteristic pattern observed in the keystroke events in input shall somehow be reproduced in the I/O activity in output. When the input and the output are controlled, we can identify common I/O patterns and flag detection. Moreover, preselecting the input pattern helps avoid spurious detections and evasion attempts. To detect background keylogging behavior, our technique comprises a preprocessing step to forcefully move the focus to the background. This strategy is also necessary to avoid flagging foreground applications that legitimately react to user-issued keystrokes (e.g., word processors) as keyloggers.

    The key advantage of our approach is that it is centered around a black-box model that completely ignores the keylogger internals. Also, I/O monitoring is a nonintrusive procedure and can be performed on multiple processes simultaneously. As a result, our technique can deal with a large number of keyloggers transparently and enables a fully unprivileged detection system able to vet all the processes running on a particular system in a single run.

    Our approach completely ignores the content of the input and the output data, and focuses exclusively on their distribution. Limiting the approach to a quantitative analysis makes it possible to implement the detection technique with only unprivileged mechanisms, as we will better illustrate later. The underlying model adopted, however, presents additional challenges. First, we must carefully deal with possible data transformations that may introduce quantitative differences between the input and the output patterns. Second, the technique should be robust with respect to quantitative similarities identified in the output patterns of other legitimate system processes. In the following, we discuss how our approach deals with these challenges.

    4 ARCHITECTURE

    Our design is based on five different components as depicted in Fig. 2: injector, monitor, pattern translator, detector, and pattern generator. The operating system at the bottom deals with the details of I/O and event handling. The OS Domain does not expose all the details to the upper levels without using privileged API calls. As a result, the injector and the monitor operate at another level of abstraction, the Stream Domain. At this level, keystroke events and the bytes output by a process appear as a stream emitted at a particular rate.

    The task of the injector is to inject a keystroke stream to simulate the behavior of a user typing at the keyboard. Similarly, the monitor records a stream of bytes to constantly capture the output behavior of a particular process. A stream representation is only concerned with the distribution of keystrokes or bytes emitted over a given window of observation, without entailing any additional qualitative information. The injector receives the input stream from the pattern translator, which acts as bridge between the Stream Domain and the Pattern Domain. Similarly, the monitor delivers the output stream recorded to the pattern translator for further analysis. In the Pattern Domain, the input stream and the output stream are both represented in a more abstract form, termed Abstract Keystroke Pattern (AKP). A pattern in the AKP form is a discretized and normalized representation of a stream. Adopting a compact and uniform representation is advantageous for several reasons. First, this allows the pattern generator to exclusively focus on generating an input pattern that follows a desired distribution of values. Details on how to inject a particular distribution of keystrokes into the system are offloaded to the pattern translator and the injector. Second, the same input pattern can be reused to produce and inject several input streams with different properties but following the same underlying distribution. Finally, the ability to reason over abstract representations simplifies the role of the detector that only receives an input pattern and an output pattern and makes the final decision on whether detection should or should not be triggered.

    4.1 Injector

    The role of the injector is to inject the input stream into the system, simulating the behavior of a user at the keyboard. By design, the injector must satisfy several requirements. First, it should only rely on unprivileged API calls. Second, it should be capable of injecting keystrokes at variable rates to match the distribution of the input stream. Finally, the resulting series of keystroke events produced should be no different than those generated by a real user. In other words, no user-space keylogger should be able to distinguish the two types of events. To address all these issues, we leverage the same technique employed in automated testing. On Windows-based operating systems, this functionality is provided by the API call keybd_event. In all Unix-like OSes supporting X11, the same functionality is available via the API call XTestFakeKeyEvent.
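The variable-rate requirement can be illustrated without touching any platform API. The following is a minimal sketch, not the authors' implementation: the helper name `schedule_keystrokes` and the even spacing of keystrokes inside each interval are assumptions made for illustration. A real injector would issue a call such as keybd_event or XTestFakeKeyEvent at each computed time.

```python
def schedule_keystrokes(counts, interval):
    """Return the emission time (in seconds) of every keystroke.

    counts[i] is the number of keystrokes to inject during interval i;
    each interval lasts `interval` seconds. Keystrokes are spaced evenly
    inside their interval (an assumption made for this sketch).
    """
    times = []
    for i, count in enumerate(counts):
        start = i * interval
        for k in range(count):
            times.append(start + k * interval / count)
    return times


if __name__ == "__main__":
    # Three 1-second intervals carrying 2, 0, and 4 keystrokes.
    print(schedule_keystrokes([2, 0, 4], 1.0))
```

Separating the schedule from the platform call keeps the rate logic testable on any machine; only the final dispatch step is OS-specific.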

    4.2 Monitor

    The monitor is responsible for recording the output stream of all the running processes. As done for the injector, we allow only unprivileged API calls. In addition, we favor strategies to perform real-time monitoring with minimal overhead and the best level of resolution possible. Finally, we are interested in application-level statistics of I/O activities, to avoid dealing with filesystem-level caching or other potential nuisances. Fortunately, most modern operating systems provide unprivileged API calls to access performance counters on a per-process basis. On all the versions of Windows since Windows NT 4.0, this functionality is provided by the Windows Management Instrumentation (WMI). In particular, the performance counters of each process are made available via the class Win32_Process, which supports an efficient query-based interface. The counter WriteTransferCount contains the total number of bytes written by the process since its creation. Note that monitoring the network activity is also possible, although it requires a more recent version of Windows, i.e., at least Vista. To construct the output stream of a given process, the monitor queries this piece of information at regular time intervals, and records the number of bytes written since the last query every time. The proposed technique is obviously tailored to Windows-based operating systems. Nonetheless, we point out that similar strategies can be realized in other OSes; both Linux and OS X, in fact, support analogous performance counters which can be accessed in an unprivileged manner; the reader may refer to the iotop utility for usage examples.
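The step from a cumulative counter to an output stream can be sketched as follows. This is only an illustration under stated assumptions: the function name `output_stream` and the sample readings are hypothetical, and how the readings are obtained (for instance through a WMI query) is deliberately left out.

```python
def output_stream(counter_readings):
    """Turn cumulative byte counters into bytes written per interval.

    counter_readings holds successive values of a monotonically increasing
    counter (such as WriteTransferCount) sampled at regular intervals.
    """
    return [
        later - earlier
        for earlier, later in zip(counter_readings, counter_readings[1:])
    ]


if __name__ == "__main__":
    # Hypothetical readings: one baseline plus one reading per interval.
    readings = [1000, 1000, 1120, 1180, 1500]
    print(output_stream(readings))
```

Note that N intervals require N + 1 readings, since the first reading only serves as a baseline.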

    Fig. 2. The different components of our architecture.

    4.3 Pattern Translator

    The role of the pattern translator is to transform an AKP into a stream and vice versa, given a set of configuration parameters. A pattern in the AKP form can be modeled as a sequence of samples originated from a stream sampled with a uniform time interval. A sample $P_i$ of a pattern $P$ is an abstract representation of the number of keystrokes emitted during the time interval $i$. Each sample is stored in a normalized form in the interval $[0, 1]$, where 0 and 1 reflect the predefined minimum and maximum number of keystrokes in a given time interval. To transform an input pattern into a keystroke stream, the pattern translator considers the following configuration parameters: $N$, the number of samples in the pattern; $T$, the constant time interval between any two successive samples; $K_{\min}$, the minimum number of keystrokes per sample allowed; and $K_{\max}$, the maximum number of keystrokes per sample allowed.

    When transforming an input pattern in the AKP form into an input stream, the pattern translator generates, for each time interval $i$, a keystroke stream with an average keystroke rate

    $$R_i = \frac{P_i \cdot (K_{\max} - K_{\min}) + K_{\min}}{T}.$$

    The iteration is repeated $N$ times to cover all the samples in the original pattern. A similar strategy is adopted when transforming an output byte stream into a pattern in the AKP form. The pattern translator reuses the same parameters employed in the generation phase and similarly assigns

    $$P_i = \frac{R_i \cdot T - K_{\min}}{K_{\max} - K_{\min}},$$

    where $R_i$ is the average keystroke rate measured in the time interval $i$. The translator assumes a correspondence between keystrokes and bytes and treats them equally as base units of the input and output stream, respectively. This assumption does not always hold in practice and the detection algorithm has to consider any possible scale transformation between the input and the output pattern. We discuss this and other potential transformations in Section 4.4.
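The two translation formulas are inverses of each other, which a few lines of code make concrete. The sketch below is illustrative only: the function names and the parameter values (T = 2 s, between 10 and 50 keystrokes per sample) are assumptions, not values taken from the paper.

```python
def pattern_to_rates(pattern, t, k_min, k_max):
    """Map each normalized sample P_i in [0, 1] to a keystroke rate R_i."""
    return [(p * (k_max - k_min) + k_min) / t for p in pattern]


def rates_to_pattern(rates, t, k_min, k_max):
    """Inverse mapping: per-interval rates R_i back to AKP samples P_i."""
    return [(r * t - k_min) / (k_max - k_min) for r in rates]


if __name__ == "__main__":
    pattern = [0.0, 0.5, 1.0]
    rates = pattern_to_rates(pattern, t=2.0, k_min=10, k_max=50)
    print(rates)  # keystrokes per second for each interval
    print(rates_to_pattern(rates, t=2.0, k_min=10, k_max=50))
```

When the second function is applied to a byte stream rather than a keystroke stream, the resulting samples may fall outside [0, 1]; as the text notes, the detector has to tolerate such scale differences.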

    4.4 Detector

    The success of our detection algorithm lies in the ability to infer a cause-effect relationship between the keystroke stream injected in the system and the I/O behavior of a keylogger process, or, more specifically, between the respective patterns in AKP form. While one must examine every candidate process in the system, the detection algorithm operates on a single process at a time, identifying whether there is a strong similarity between the input pattern and the output pattern obtained from the analysis of the I/O behavior of the target process. Specifically, given a predefined input pattern and an output pattern of a particular process, the goal of the detection algorithm is to determine whether there is a match in the patterns and the target process can be identified as a keylogger with good probability.

    The first step in the construction of a detection algorithm comes down to the adoption of a suitable metric to measure the similarity between two given patterns. In principle, the AKP representation allows for several possible measures of dependence that compare two discrete sequences and quantify their relationship. In practice, we rely on a single correlation measure motivated by the properties of the two patterns. The proposed detection algorithm is based on the Pearson product-moment correlation coefficient (PCC), one of the most widely used correlation measures [9]. Given two discrete sequences described by two patterns $P$ and $Q$ with $N$ samples, the PCC is defined as [9]:

    $$r = \frac{\mathrm{cov}(P, Q)}{\sigma_P \cdot \sigma_Q} = \frac{\sum_{i=1}^{N} (P_i - \bar{P})(Q_i - \bar{Q})}{\sqrt{\sum_{i=1}^{N} (P_i - \bar{P})^2}\,\sqrt{\sum_{i=1}^{N} (Q_i - \bar{Q})^2}}, \qquad (1)$$

    where $\mathrm{cov}(P, Q)$ is the sample covariance, $\sigma_P$ and $\sigma_Q$ are sample standard deviations, and $\bar{P}$ and $\bar{Q}$ are sample means. The PCC has been widely used as an index to measure bivariate association for different distributions in several applications including pattern recognition, data analysis, and signal processing [10]. The values given by the PCC are always symmetric and range between $-1$ and $1$, with 0 indicating no correlation and $1$ or $-1$ indicating complete direct (or inverse) correlation. To measure the degree of association between two given patterns we are here only interested in positive values of correlation. Hereafter, we will always refer to its absolute value. Our interest in the PCC lies in its appealing mathematical properties. In contrast to other correlation metrics, the PCC measures the strength of a linear relationship between two series of samples, ignoring any nonlinear association. In our setting, a linear dependence well approximates the relationship between the input pattern and an output pattern produced by a keylogger. The intuition is that a keylogger can only make local decisions on a per-keystroke basis with no knowledge about the global distribution. Thus, in principle, the resulting behavior will linearly approximate the original input stream injected into the system.
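For readers who prefer code to notation, the definition of the PCC can be written out directly. This is a plain-Python sketch of the standard formula, not the authors' code; the function name `pcc` is an assumption, and the sample sequences are made up for illustration.

```python
import math


def pcc(p, q):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("patterns must have the same number of samples")
    mean_p = sum(p) / len(p)
    mean_q = sum(q) / len(q)
    dev_p = [x - mean_p for x in p]
    dev_q = [y - mean_q for y in q]
    numerator = sum(a * b for a, b in zip(dev_p, dev_q))
    # sqrt(A) * sqrt(B) from the definition is computed as sqrt(A * B).
    denominator = math.sqrt(sum(a * a for a in dev_p) * sum(b * b for b in dev_q))
    # Raises ZeroDivisionError when either sequence is constant,
    # i.e., exactly when the coefficient is undefined.
    return numerator / denominator


if __name__ == "__main__":
    print(pcc([1, 2, 3], [2, 4, 6]))  # direct linear relationship
    print(pcc([1, 2, 3], [6, 4, 2]))  # inverse linear relationship
```

Since the detector refers to the absolute value of the coefficient, a caller would typically wrap the result in `abs()`.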

    In detail, the PCC is resilient to any change in location and scale, namely no difference can be observed in the correlation coefficient if every sample $P_i$ of any of the two patterns is transformed into $a \cdot P_i + b$, where $a$ and $b$ are arbitrary constants. This is important for a number of reasons. Ideally, the input pattern and an output pattern will be an exact copy of each other if every keystroke injected is replicated as it is in the output of a keylogger process. In practice, different data transformations performed by the keylogger can alter the original structure in several ways. First, a keylogger may encode each keystroke in a sequence of one or more bytes. Consider, for example, a keylogger encoding each keystroke using 8-bit ASCII codes. The output pattern will be generated examining a stream of raw bytes produced by the keylogger as it stores keystrokes one byte at a time. Now consider the exact same case but with keystrokes stored using a different encoding, e.g., 2 bytes per keystroke. In the latter case, the pattern will have the same shape as the former one, but its scale will be twice as much. As explained earlier, the transformation in scale will not affect the correlation coefficient and the PCC will report the same value in both cases. Similar arguments are valid for keyloggers using a variable-length representation to store keystrokes or encrypting keystrokes with a variable number of bytes.
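The invariance to location and scale follows directly from the definition of the PCC. A short derivation, writing $P'$ for the transformed pattern with $P'_i = a \cdot P_i + b$ and assuming $a \neq 0$ (the symbol $P'$ is introduced here only for illustration):

```latex
\begin{align*}
\bar{P'} &= a\,\bar{P} + b, \qquad P'_i - \bar{P'} = a\,(P_i - \bar{P}),\\[4pt]
r(P', Q) &= \frac{a \sum_{i=1}^{N} (P_i - \bar{P})(Q_i - \bar{Q})}
                {|a|\,\sqrt{\sum_{i=1}^{N} (P_i - \bar{P})^2}\,
                 \sqrt{\sum_{i=1}^{N} (Q_i - \bar{Q})^2}}
          = \operatorname{sgn}(a)\; r(P, Q).
\end{align*}
```

The offset $b$ cancels in every deviation from the mean, and the factor $a$ cancels up to its sign. Because only the absolute value of the coefficient is used, the sign is irrelevant. In the 2-bytes-per-keystroke example, $a = 2$ and $b = 0$.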

    An interesting application of the location invariance property is the ability to mitigate the effect of buffering. When the keylogger uses a fixed-size buffer whose size is comparable to the number of keystrokes injected at each time interval, it is easy to show that the PCC is not

    significantly affected. Consider, for example, the case when the buffer size is smaller than the minimum number of keystrokes $K_{\min}$. Under this assumption, the buffer is flushed out at least once per time interval. The number of keystrokes left in the buffer at each time interval determines the number of keystrokes missing in the output pattern. Depending on the distribution of samples in the input pattern, this number would be centered around a particular value $z$. The statistical meaning of the value $z$ is the average number of keystrokes dropped per time interval. This transformation can be again approximated by a location transformation of the original pattern by a factor of $-z$, which again does not affect the value of the PCC. The last example shows the importance of choosing an appropriate $K_{\min}$ when the effect of fixed-size buffers must also be taken into account. As evident from the examples discussed, the PCC is robust to several possible data transformations.
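The buffering argument can be checked with a small simulation. Everything in this sketch is hypothetical: the logger model (it writes only whole buffers and carries the remainder over), the buffer size of 8, and the injected counts, which all stay above an assumed minimum of 10 keystrokes per interval.

```python
import math


def pcc(p, q):
    """Pearson correlation coefficient (both sequences must vary)."""
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    dev_p = [x - mean_p for x in p]
    dev_q = [y - mean_q for y in q]
    num = sum(a * b for a, b in zip(dev_p, dev_q))
    return num / math.sqrt(sum(a * a for a in dev_p) * sum(b * b for b in dev_q))


def buffered_output(keystrokes_per_interval, buffer_size):
    """Bytes written per interval by a logger that flushes only full buffers."""
    pending = 0
    flushed = []
    for count in keystrokes_per_interval:
        pending += count
        full_buffers = pending // buffer_size
        flushed.append(full_buffers * buffer_size)
        pending -= full_buffers * buffer_size  # remainder waits for the next flush
    return flushed


if __name__ == "__main__":
    injected = [12, 47, 23, 38, 15, 50, 29, 41, 18, 34]
    logged = buffered_output(injected, buffer_size=8)
    print(logged)
    print(pcc(injected, logged))
```

Because the buffer is smaller than every injected count, at least one flush happens per interval and the output tracks the input closely, so the coefficient stays near 1 despite the keystrokes held back in the buffer.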

    A fundamental factor to consider is, however, the number of samples collected. While we would like to shorten the duration of the detection algorithm, there is a clear tension between the length of the patterns and the reliability of the resulting value of the PCC. A very small number of samples can lead to unstable results. A larger number of samples is beneficial especially in case of disturbing factors. As reported in [11], selecting a larger number of samples could, for example, reduce the adverse effect of outliers or measurement errors. The detection algorithm we have implemented in our detector relies entirely on the PCC to estimate the correlation between an input and an output pattern. To determine whether a given PCC value should trigger a detection, a thresholding mechanism is used. We discuss how to select a suitable threshold in Section 5. Our detection algorithm is conceived to infer a causal relationship between two patterns by analyzing their correlation. Admittedly, experience shows that correlation cannot be used to imply causation in the general case, unless valid assumptions are made on the context under investigation [12]. In other words, to avoid false positives, strong evidence shall be collected to infer with good probability that a given process is a keylogger. The next section discusses in detail how to select a robust input pattern and minimize the probability of false detections.

    4.5 Pattern Generator

    Our pattern generator is designed to support several pattern generation algorithms. More specifically, the pattern generator can leverage any algorithm producing a valid pattern in AKP form. We now present a number of pattern generation algorithms and discuss their properties.

    The first important issue to consider is the effect of variability in the input pattern. Experience shows that correlations tend to be stronger when samples are distributed over a wider range of values [11]. In other words, the more the variability in the given distributions, the more stable and accurate the resulting PCC computed. This suggests that a robust input pattern should contain samples spanning the entire target interval [0, 1]. The level of variability in the resulting input stream is also similarly influenced by the range of keystroke rates used in the pattern translation process. The higher the range delimited by the minimum keystroke rate and maximum keystroke rate, the more reliable the results.

    The adverse effect of low variability in the input pattern can be best understood when analyzing the mathematical properties of the PCC. The correlation coefficient reports high values when the two patterns tend to grow apart from their respective means. As a consequence, the more closely to their respective means the patterns are distributed, the less accurate the resulting PCC. In the extreme case of no variability, the standard deviation is 0 and the PCC is not even defined. This suggests that a robust pattern generation algorithm should never consider constant or low-variability patterns. Moreover, when a constant pattern is generated from the output stream, our detection algorithm assigns an arbitrary correlation score of 0. This is coherent under the assumption that the selected input pattern presents a reasonable level of variability, and poor correlation should naturally be expected against other low-variability patterns.
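    The handling of constant patterns described above can be sketched as follows. This is an illustrative Python fragment under our own naming (the function correlation_score is hypothetical and does not reproduce the C# prototype); it returns the PCC when both patterns vary and the arbitrary score of 0 when either one is constant.

```python
import math

def correlation_score(input_pattern, output_pattern):
    """PCC of two equal-length patterns, or 0.0 if either one is constant."""
    n = len(input_pattern)
    if n != len(output_pattern) or n < 2:
        raise ValueError("patterns must have the same length (at least 2)")
    # A constant pattern has zero standard deviation, so the PCC is
    # undefined; assign the arbitrary score of 0 in that case.
    if min(input_pattern) == max(input_pattern):
        return 0.0
    if min(output_pattern) == max(output_pattern):
        return 0.0
    mean_in = sum(input_pattern) / n
    mean_out = sum(output_pattern) / n
    dev_in = [x - mean_in for x in input_pattern]
    dev_out = [y - mean_out for y in output_pattern]
    norm_in = math.sqrt(sum(d * d for d in dev_in))
    norm_out = math.sqrt(sum(d * d for d in dev_out))
    return sum(a * b for a, b in zip(dev_in, dev_out)) / (norm_in * norm_out)

# A process that writes nothing in any interval yields a constant pattern.
print(correlation_score([0.0, 0.5, 1.0, 0.25], [0, 0, 0, 0]))
```

    Checking for equal minimum and maximum, rather than for a computed standard deviation of exactly zero, avoids depending on floating-point rounding of the mean.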

    A robust pattern generation algorithm should allow for a minimum number of false positives. When the chosen input pattern happens to closely resemble the I/O behavior of some benign process, the PCC may report a high value of correlation for that process and trigger a false detection. For this reason, it is important to focus on input patterns that have little chance of being confused with output patterns generated by legitimate processes. Fortunately, studies show that the correlation between realistic I/O workloads for PC users is generally considerably low over small time intervals. The results presented in [13] are derived from 14 traces collected over a number of months in realistic environments used by different categories of users. The authors show that the value of correlation given by the PCC over 1 minute of I/O activity is only 0.046 on average and never exceeds 0.070 for any two given traces. This suggests that the I/O behavior of one or more processes is in general very poorly correlated with other I/O distributions.

    The problem of designing a pattern generation algorithm that minimizes false positives under a given known workload can be modeled as follows: we assume that traces for the target workload can be collected and converted into a series of patterns (one for each process running on the system) of the same length N. All the patterns are generated to build a valid training set for the algorithm. Under the assumption that the traces collected are representative of the real workload available at detection time, our goal is to design an algorithm that learns the characteristics of the training data and generates a maximally uncorrelated input pattern. Concretely, the goal of our algorithm is to produce an input pattern of length N that minimizes the PCC measured against all the patterns in the training set. Without any further constraints on the samples of the target input pattern, it can be shown that this problem is a nontrivial nonlinear optimization problem. In practice, we can relax the original problem by leveraging some of the assumptions discussed earlier. As motivated before, a robust input pattern should present samples distributed over a wide range of values. To assume the widest range possible, we can arbitrarily constrain the series of samples to be uniformly distributed over the target interval [0, 1].

    This is equivalent to considering a set of N samples of the form

    S = \left\{ 0, \frac{1}{N-1}, \frac{2}{N-1}, \ldots, \frac{N-2}{N-1}, 1 \right\}.    (2)


    When the N samples are constrained to assume all the values from the set S, the optimization problem comes down to finding the particular permutation of values that minimizes the PCC considering all the patterns in the training set. This problem is a variant of the standard assignment problem, where each particular pairwise assignment yields a known cost and the ultimate goal is to minimize the sum of all the costs involved [14].

    In our scenario, the objects are the samples in the target set S, and the tasks reflect the N slots available in the input pattern. In addition, the cost of assigning a sample Si from the set S to a particular slot j is

    c(i, j) = \sum_{t} \frac{(S_i - \bar{S})(P_t(j) - \bar{P_t})}{\sigma_S \sigma_{P_t}},    (3)

    where Pt are the patterns in the training set, and \bar{S} and \sigma_S are the constant mean and standard deviation of the samples in S, respectively. The cost value c(i, j) reflects the value of a single addend in the expression of the overall PCC we want to minimize. Note that the cost value is calculated against all the patterns in the training set. The formulation of the cost value has been simplified assuming a constant number of samples N and a constant number of patterns in the training set. Unfortunately, this problem cannot be addressed by leveraging well-known algorithms that solve the assignment problem in polynomial time [14]. In contrast to the standard formulation, we are not interested in the global minimum of the sum of the cost values. Such an approach would attempt to find a pattern with a PCC maximally close to -1. In contrast, our goal is to produce a maximally uncorrelated pattern, thereby aiming at a PCC as close to 0 as possible. This problem can be modeled as an assignment problem with side constraints.

    Prior research has shown how to transform this particular problem into an equivalent Quadratic Assignment Problem (QAP) that can be very efficiently solved with a standard QAP solver when the global minimum is known in advance [15]. In our solution, we have implemented a similar approach, capped at a maximum number of iterations to guarantee convergence since the minimum value of the PCC is not known in advance. In practice, for a reasonable number of samples N and a modest training set, we found that this is rarely a concern. The algorithm can usually identify the optimal pattern in a bearable amount of time.
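    To make the objective concrete, the fragment below searches for a permutation of S whose correlation with every training pattern is as close to 0 as possible. It is only a sketch: it swaps in a naive random-swap local search with an iteration cap in place of the QAP-based formulation, and it assumes the largest absolute PCC over the training set as the quantity to minimize. The names worst_case and search_pattern and the training values are hypothetical.

```python
import math
import random

def pcc(x, y):
    """Pearson correlation coefficient; 0.0 if either sequence is constant."""
    if min(x) == max(x) or min(y) == max(y):
        return 0.0
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def worst_case(candidate, training):
    """Assumed objective: largest |PCC| against any training pattern."""
    return max(abs(pcc(candidate, p)) for p in training)

def search_pattern(training, iterations=2000, seed=0):
    """Random-swap local search over permutations of the set S."""
    rng = random.Random(seed)
    n = len(training[0])
    best = [i / (n - 1) for i in range(n)]   # the set S of equation (2)
    rng.shuffle(best)
    best_score = worst_case(best, training)
    for _ in range(iterations):              # iteration cap bounds the search
        i, j = rng.sample(range(n), 2)
        best[i], best[j] = best[j], best[i]
        score = worst_case(best, training)
        if score < best_score:
            best_score = score               # keep the improving swap
        else:
            best[i], best[j] = best[j], best[i]  # undo the swap
    return list(best), best_score

# Hypothetical training set: two output patterns of length N = 8.
training = [[1, 5, 2, 8, 3, 9, 4, 7], [10, 9, 8, 7, 6, 5, 4, 3]]
pattern, score = search_pattern(training, iterations=300, seed=1)
print(pattern, score)
```

    A local search of this kind offers no optimality guarantee; it only illustrates that the search space is the set of permutations of S and that the target is a correlation near 0 rather than near -1.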

    To conclude, we now more formally propose two classes of pattern generation algorithms for our generator. First, we are interested in workload-aware generation algorithms. For this class, we focus on the optimization algorithm we have just introduced (we refer to this pattern generation algorithm with the term WLD), assuming a number of representative traces have been made available for the target workload. Moreover, we are interested in workload-agnostic pattern generation algorithms. With no assumption made on the nature of the workload, they are more generic and easier to implement. In this class, we propose the following algorithms:

    • Random (RND). Each sample is generated at random.

    • Random with fixed range (RFR). The pattern is a random permutation of a series of samples uniformly distributed over the interval [0, 1]. This is to maximize the amount of variability in the input pattern.

    • Impulse (IMP). Every sample 2i is assigned the value of 0 and every sample 2i + 1 is assigned the value of 1. This algorithm attempts to produce an input pattern with maximum variance and idle periods at minimum.

    • Sine wave (SIN). The pattern generated is a discrete sine wave distribution oscillating between 0 and 1. The sine wave grows or drops with a fixed step of 0.1. This algorithm explores the effect of constant increments and decrements in the input pattern.
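    Three of the workload-agnostic algorithms listed above can be sketched in a few lines of Python. The function names are hypothetical, and drawing RND samples uniformly from [0, 1) is an assumption. SIN is left out because its exact discretization is not specified in enough detail here to reproduce it faithfully.

```python
import random

def rnd_pattern(n, rng):
    """RND: every sample drawn at random (assumed uniform in [0, 1))."""
    return [rng.random() for _ in range(n)]

def rfr_pattern(n, rng):
    """RFR: random permutation of n samples evenly spread over [0, 1]."""
    samples = [i / (n - 1) for i in range(n)]  # requires n >= 2
    rng.shuffle(samples)
    return samples

def imp_pattern(n):
    """IMP: even-indexed samples are 0, odd-indexed samples are 1."""
    return [float(i % 2) for i in range(n)]

rng = random.Random(42)  # fixed seed only to make the run repeatable
print(rnd_pattern(5, rng))
print(rfr_pattern(5, rng))
print(imp_pattern(6))
```

    Passing the random generator explicitly keeps the functions deterministic under a fixed seed, which is convenient when the same injection has to be replayed.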

    5 EVALUATION

    To evaluate the proposed detection technique, we implemented a prototype based on the ideas described in the paper. Written in C# in 7,000 LoC, it runs as an unprivileged application for the Windows OS. It also simultaneously collects the I/O patterns of all the processes, thus allowing us to analyze the whole system in a single run. Although the proposed design can easily be extended to other OSes, we explicitly focus on Windows for the significant number of keyloggers available. In the following, we present several experiments to evaluate our approach. The ultimate goal is to understand the effectiveness of our technique and its applicability to realistic settings. For this purpose, we evaluated our prototype against many publicly available keyloggers. We also developed our own keylogger to evaluate the effect of particular conditions more thoroughly. Finally, we collected traces for different realistic PC workloads to evaluate the effectiveness of our approach in real-life scenarios. We ran all of our experiments on PCs with a 2.53 GHz Core 2 Duo processor, 4 GB memory, and 7,200 rpm SATA II hard drives. Every test was performed under Windows 7 Professional SP1, while the workload traces were gathered from a number of PCs running several different versions of Windows. Since the performance counters are part of the default accounting infrastructure, monitoring the processes' I/O came at negligible cost: for reasonable values of T, i.e., > 100 ms, the load imposed on the CPU by the monitoring phase was less than 2 percent.

    On the other hand, injecting high keystroke rates introduced additional processing overhead throughout the system. Experimental results showed that the overhead grows approximately linearly with the number of keystrokes injected per sample. In particular, the CPU load imposed by our prototype reaches 25 percent around 15,000 keystrokes per sample and 75 percent around 47,000. Note that these values only refer to detection-time overhead. No runtime overhead is imposed by our technique when no detection is in progress.

    5.1 Keylogger Detection

    To evaluate the ability to detect real-world keyloggers, we experimented with all the keyloggers from the top monitoring free software list [5], an online repository continuously updated with reviews and latest developments in the area. To carry out the experiments, we

    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 10, NO. 1, JANUARY/FEBRUARY 2013


    manually installed each keylogger, launched our detection system for N × T ms, and recorded the results; we asserted successful detection for PCC ≥ 0.7. In the experiments, we found that arbitrary choices of N, T, Kmin, and Kmax were possible; the reason is that we observed the same results for several reasonable combinations of the parameters. We also selected the RFR algorithm as the pattern generation algorithm for the experiments. More details on how to tune the parameters in the general case are given in Sections 5.2 and 5.3.

    Table 2 shows the keyloggers used in the evaluation and summarizes the detection results. All the keyloggers were detected within a few seconds without generating any false positives; in particular, no legitimate process scored PCC values ≥ 0.3. Virtuoza Free Keylogger required a longer window of observation to be detected; this sample was indeed the only keylogger to store keystrokes in memory and flush out to disk at regular time intervals. Nevertheless, we were still able to collect consistent samples from flush events and report high PCC values.

    In a few other cases, keystrokes were kept in memory but flushed out to disk as soon as the keylogger detected a change of focus. This was the case for Actual Keylogger, Revealer Keylogger Free, and Refog Keylogger Free. To deal with this common strategy, our detection system enforces a change of focus every time a sample is injected. In addition, some of the keyloggers examined included support for encryption and most of them used variable-length encoding to store special keys. As Section 5.2 demonstrates, our approach deals with these nuisances transparently with no effect on the resulting PCC.

    Another potential issue arises from keyloggers dumping a fixed-format header on the disk every time a change of focus is detected. The header typically contains the date and the name of the target application. Nonetheless, as we designed our detection system to change focus at every sample, the header is flushed out to disk at each time interval along with all the keystrokes injected. As a result, the output pattern monitored is simply a location transformation of the original, with the shift given by the size of the header itself. Thanks to the location invariance property, our detection algorithm is naturally resilient to this transformation.

    5.2 False Negatives

    In our approach, false negatives may occur when the output pattern of a keylogger scores an unexpectedly low PCC value. To test the robustness of our approach against false negatives, we made several experiments with our own artificial keylogger. Our evaluation starts by analyzing the impact of the number of samples N and the time interval T on the final PCC value. For each pattern generation algorithm, we plot the PCC measured with our prototype keylogger, which we configured so that no buffering or data transformation was taking place. Figs. 3a and 3b depict our findings with Kmin = 1 and Kmax = 1,000. We observe that when the keylogger logs each keystroke without introducing delay or additional noise, the number of samples N does not affect the PCC value. This behavior should not suggest that N has no effect on the production of false negatives. When noise in the output stream is to be expected, higher values of N are indeed desirable to produce more stable PCC values and avoid false negatives.

    In contrast, Fig. 3b shows that the PCC is sensitive to low values of the time interval T. The effect observed is due to the inability of the system to absorb all the injected keystrokes for time intervals shorter than 450 ms. Fig. 3c, in turn, shows the impact of Kmin on the PCC (with Kmax still constant). The results confirm our observations in Section 4.4, i.e., that patterns characterized by a low variance hinder the PCC, and thus a high variability in the injection pattern is desirable.

    We now analyze the impact of the maximum number of keystrokes per time interval Kmax. High Kmax values are expected to increase the level of variability, reduce the amount of noise, and induce a more distinct distribution in the output stream of the keylogger. The keystroke rate, however, is clearly bound by the length of the time interval T. Fig. 4 depicts the PCC measured with our prototype keylogger for N = 30, Kmin = 1, and RND pattern generation algorithm. The figure reports very high PCC values for Kmax < 20,480 and T = 1,000 ms. This behavior reflects the inability of the system to absorb more than Kmax = 20,480 in the given time interval. Increasing T is, however, sufficient to allow higher Kmax values without significantly impacting the PCC. For example, with T = 3,500 ms we can double Kmax without noticeably degrading the final PCC value.


    TABLE 2. Detection Results

    Fig. 3. Impact of N, T, and Kmin on the PCC.


    In a more advanced version of our keylogger, we also simulated the effect of several possible input-output transformations. First, we experimented with a keylogger using a nontrivial fixed-length encoding for keystrokes. Fig. 5a depicts the results for different values of padding p with N = 30, Kmin = 1, and Kmax = 1,024. A value of p = 1,024 simulates a keylogger writing 1,024 bytes on the disk for each eavesdropped keystroke. As discussed in Section 4.4, the PCC should be unaffected in this case and presumably exhibit a constant behavior. The figure confirms this intuition, but shows the PCC decreasing linearly after p = 10,000 bytes. This behavior is due to the limited I/O throughput that can be achieved within a single time interval. We previously encountered similar problems when choosing suitable values for Kmax. Note that in this scenario both Kmin and Kmax are affected by the padding introduced, thus yielding a more significant impact on the PCC.

    Let us now consider the case of a keylogger logging an additional random number of characters r ∈ [0, rmax] each time a keystroke is eavesdropped. This evaluates the impact of several conditions. First, the experiment simulates a keylogger randomly dropping keystrokes with a certain probability. Second, it simulates a keylogger encoding a number of keystrokes with special sequences, e.g., CTRL logged as [Ctrl]. Finally, this is useful to investigate the impact of a keylogger performing variable-length encryption or other variable-length transformations such as data compression. In the latter scenario, different keystroke scancodes may be encoded with strings of different length. This source of nonlinearity has the potential to break the correlation and thus hinder detection. However, since we control the injection pattern, we can make each keystroke scancode equiprobable, thus forcing any content-dependent transformation to encode each keystroke with a data string of comparable size. The result is that each of these transformations can always be approximated by a linear transformation with constant scaling.

    To generate r we considered different probability distributions: uniform, Poisson, Gaussian, and exponential. For each distribution we repeated the experiment increasing the value of a characteristic parameter (reported on the x-axis). Results are depicted in Fig. 5b. As observed in Fig. 5a, the PCC only drops at saturation, i.e., when the average number of keystrokes written to the disk is around 10,000 bytes. The figure also shows that choosing either a uniform or Gaussian distribution results in more stable PCC values. These distributions, unlike the Poisson and exponential, do not preclude low-valued samples, and are thus less likely to saturate the system in a particular configuration. Again, as Fig. 4 suggested, obtaining more stable values for the PCC is still possible if we increase the time interval T. If the number of samples N is however kept constant, the user shall expect a proportionally longer detection time.
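    The effect of per-keystroke random padding can be reproduced in a small simulation. The sketch below is an idealized model and makes several assumptions: one byte per keystroke, r drawn uniformly from the integers 0 to rmax, no I/O saturation, and arbitrary values for the seed, N = 30, and rmax = 8. It does not reproduce the measurements reported in Fig. 5b.

```python
import math
import random

def pcc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def noisy_keylogger_output(keystrokes, r_max, rng):
    """Bytes written per interval: 1 byte per keystroke plus r extra bytes,
    with r drawn uniformly from 0..r_max for every keystroke."""
    return [sum(1 + rng.randint(0, r_max) for _ in range(k))
            for k in keystrokes]

rng = random.Random(7)
# RND-style injection: between Kmin = 1 and Kmax = 1,000 keystrokes.
keystrokes = [rng.randint(1, 1000) for _ in range(30)]
written = noisy_keylogger_output(keystrokes, r_max=8, rng=rng)
score = pcc(keystrokes, written)
print(score)
```

    Because the padding is summed over many keystrokes in each interval, it averages out to an approximately constant scaling factor, which is why the correlation stays high in this model.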

    We conclude our analysis by verifying the impact of a keylogger buffering the eavesdropped data before leaking it to the disk. Although we have not found many real-world examples of this behavior in our evaluation, our technique can still handle this class of keyloggers correctly for reasonable buffer sizes. Fig. 6 depicts our detection results against a keylogger buffering its output through a fixed-size buffer. The figure shows the impact of several possible choices of the buffer size on the final PCC value. We can observe the pivotal role of Kmax in successfully asserting detection. For example, increasing Kmax to 10,240 is necessary to achieve sufficiently high PCC values for the largest buffer size proposed. This experiment demonstrates once again that the key to detection is inducing the pattern to distinctly emerge in the output distribution, a feat that can be easily obtained by choosing a highly variable injection pattern with low values for Kmin and high values for Kmax. We believe these results are encouraging to acknowledge the robustness of our detection technique against false negatives, even in presence of complex data transformations.
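    The role of Kmax against a buffering keylogger can be illustrated with the following idealized simulation. It is a sketch under stated assumptions: the keylogger stores one byte per keystroke and writes only full buffers, the buffer size of 1,024 bytes is hypothetical, and samples are translated to keystroke counts linearly between Kmin and Kmax (the helper to_keystrokes is our own naming).

```python
import math
import random

def pcc(x, y):
    """Pearson correlation coefficient; 0.0 if either sequence is constant."""
    if min(x) == max(x) or min(y) == max(y):
        return 0.0
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def to_keystrokes(pattern, k_min, k_max):
    """Assumed linear translation of samples in [0, 1] to keystroke counts."""
    return [round(k_min + s * (k_max - k_min)) for s in pattern]

def buffered_output(keystrokes, buffer_size):
    """Bytes reaching the disk per interval when only full buffers are
    flushed; leftover bytes stay pending until the next interval."""
    pending, written = 0, []
    for k in keystrokes:
        pending += k
        flushed = (pending // buffer_size) * buffer_size
        pending -= flushed
        written.append(flushed)
    return written

n = 30
pattern = [i / (n - 1) for i in range(n)]  # RFR-style pattern
random.Random(3).shuffle(pattern)

for k_max in (64, 10240):
    keys = to_keystrokes(pattern, 1, k_max)
    print(k_max, pcc(keys, buffered_output(keys, 1024)))
```

    With Kmax = 64 the buffer never fills during the run, so the output pattern is constant and scores 0; with Kmax = 10,240 every interval triggers several flushes and the injected pattern emerges clearly in the output.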

    5.3 False Positives

    In our approach, false positives may occur when the output pattern of some benign process accidentally scores a significant PCC value. If the value happens to be greater than the selected threshold, a false detection is flagged. This section evaluates our prototype keylogger to investigate the likelihood of this scenario in practice.


    Fig. 4. Impact of Kmax and T on the PCC.

    Fig. 5. Impact of different classes of noise on the PCC.

    Fig. 6. Detection of a keylogger buffering its output.


    To generate representative synthetic workloads for the PC user, we adopted the widely used SYSmark 2004 SE suite [16]. The suite leverages common Windows interactive applications to generate realistic workloads that mimic common user scenarios with input and think time. In its 2004 SE version, SYSmark supports two workload scenarios: Internet Content Creation (Internet workload from now on), and Office Productivity (Office workload from now on). In addition to the workload scenarios supported by SYSmark, we also experimented with another workload simulating an idle Windows system with common user applications (see footnote 1) running in the background, and no input allowed by the user.

    For each scenario, we repeatedly reproduced the synthetic workloads on a number of different machines and collected I/O traces of all the running processes for several possible sampling intervals T. Each trace was stored as a set of output patterns and broken down into k consecutive chunks of N samples. Every experiment was repeated over k/2 rounds, once for each pair of consecutive chunks. At each round, the output patterns from the first chunk were used to train our workload-aware pattern generation algorithm, while the second chunk was used for testing. In the testing phase, we measured the maximum PCC between every generated input pattern of length N and every output pattern in the testing set. At the end of each experiment, we averaged all the results. We also tested all the workload-agnostic pattern generation algorithms introduced earlier, in which case we just relied on an instrumented version of our prototype to measure the maximum PCC in all the depicted scenarios for all the k chunks.

    We start with an analysis of the pattern length N, evaluating its effect with T = 1,000 ms. Similar results can be obtained with other values of T. Fig. 7 (top row) depicts the results of the experiments for the Idle, Internet, and Office workload. The behavior observed is very similar in all the workload scenarios examined. The only noticeable difference is that the Office workload presents a slightly more unstable PCC distribution. This is probably due to the more irregular I/O workload monitored. As shown in the figures, the maximum PCC value decreases exponentially as N increases. This confirms the intuition that for small N, the PCC may yield unstable and inaccurate results, possibly assigning very high correlation values to regular system processes. Fortunately, the maximum PCC decreases very rapidly and, for example, for N > 30, its value is constantly below 0.35. As far as the pattern generation algorithms are concerned, they all behave very similarly. Notably, RFR yields the most stable PCC distribution. This is especially evident for the Office workload. In addition, our workload-aware algorithm WLD does not perform significantly better than any other workload-agnostic pattern generation algorithm. This strongly suggests that, independently of the value of N, the output pattern of a process at any given time is not in general a good predictor of the output pattern that will be monitored next. This observation reflects the low level of predictability in the I/O behavior of a process.

    From the same figures we can observe the effect of the parameter T on input patterns generated by the IMP algorithm (with N = 50). For small values of T, IMP outperforms all the other algorithms by producing extremely anomalous I/O patterns in any workload scenario. As T increases, the irregularity becomes less evident and IMP matches the behavior of the other algorithms more closely. In general, for reasonable values of T, all the pattern generation algorithms reveal a constant PCC distribution. This confirms the property of self-similarity of the I/O traffic [13]. Notably, RFR and WLD reveal a more steady distribution of the PCC. This is due to the use of a fixed range of values in both cases, and confirms the intuition that more variability in the input pattern leads to more accurate results.

    For very small values of T, we note that WLD performs significantly better than the average. This is a hint that predicting the I/O behavior of a generic process in a fairly accurate way is only realistic for small windows of observation. In all the other cases, we believe that the complexity of implementing a workload-aware algorithm largely outweighs its benefits. In our analysis, we found that similar PCC distributions can be obtained with very different types of workload, suggesting that it is possible to select the same threshold for many different settings. For reasonable values of N and T, we found that a threshold of 0.5 is usually sufficient to rule out the possibility of false positives, while being able to detect most keyloggers effectively. In addition, the use of a stable pattern generation algorithm like RFR could also help minimize the level of unpredictability across many different settings.


    Fig. 7. Impact of N and T on the PCC measured with our prototype keylogger against different workloads.

    1. Skype 4, Pidgin 2.6.3, Dropbox 0.6, Firefox 3.5.7, Google Chrome 5, Avira Antivir Personal 9, Comodo Firewall 3.13, and VideoLAN 1.0.5.


    6 EVASION AND COUNTERMEASURES

    In this section, we speculate on the possible evasion techniques a keylogger may employ once our detection strategy is deployed on real systems.

    6.1 Aggressive Buffering

    A keylogger may rely on some forms of aggressive buffering, for example flushing a very large buffer every time interval t, with t being possibly hours. While our model can potentially address this scenario, the extremely large window of observation required to collect a sufficient number of samples would make the resulting detection technique impractical. It is important to point out that such a limitation stems from the implementation of the technique and not from a design flaw in our detection model. For example, our model could be applied to memory access patterns instead of I/O patterns to make the resulting detection technique immune to aggressive buffering. This strategy, however, would require a heavyweight infrastructure (e.g., virtualized environment) to monitor the memory accesses, thus hindering the benefits of a fully unprivileged solution.

    6.2 Trigger-Based Behavior

    A keylogger may trigger the keylogging activity only in face of particular events, for example when the user launches a particular application. Unfortunately, this trigger-based behavior may successfully evade our detection technique. This is not, however, a shortcoming specific to our approach, but rather a more fundamental limitation common to all the existing detection techniques based on dynamic analysis [17]. While we believe that the problem of triggering a specific behavior is orthogonal to our work and is already the focus of much ongoing research, we point out that the user can still mitigate this threat by periodically reissuing detection runs when necessary (e.g., every time a new particularly sensitive context is accessed). Since our technique can vet all the processes in a single detection run, we believe this strategy can be realistically used in real-world scenarios.

    6.3 Discrimination Attacks

    Mimicking the user's behavior may expose our approach to keyloggers able to tell artificial and real keystrokes apart. A keylogger may, for instance, ignore any input failing to display known statistical properties, e.g., not akin to the English language. However, since we control the input pattern, we can carefully generate keystroke scancode sequences displaying the same statistical properties (e.g., English text) expected by the keylogger, and therewith perform a separate detection run thwarting this evasion technique. A keylogger may also ignore keystrokes when it detects a high (nonhuman) injection rate. This strategy, however, would make the keylogger prone to denial of service: a system persistently generating and exfiltrating bogus keystrokes would induce this type of keylogger to permanently disable the keylogging activity. Recent work demonstrates that building such a system is feasible in practice (with reasonable overhead) using standard two facilities [18].

    6.4 Decorrelation Attacks

    Decorrelation attacks attempt to break the correlation metric our approach relies on. Since, of all the attacks, this one is specifically tailored to thwarting our technique, we hereby propose a heuristic intended to vet the system in case of negative detection results. This is the case, for instance, of a keylogger trying to generate I/O noise in the background and lowering the correlation that is bound to exist between the pattern of keystrokes injected I and its own output pattern O. In the attacker's ideal case, this translates to PCC(I, O) ≈ 0. To approximate this result in the general case, however, the attacker must adapt its disguise strategy to the pattern generation algorithm in use, i.e., when switching to a new injection I′ ≠ I, the output pattern should reflect a new distribution O′ ≠ O. The attacker could, for example, enforce this property by adapting the noise generation to some input distribution-specific variable (e.g., the current keystroke rate). Failure to do so will result in random noise uncorrelated with the injection, a scenario which is already handled by our PCC-based detection technique. At the same time, we expect any legitimate process to maintain a sufficiently stable I/O behavior regardless of the particular injection chosen.

    Leveraging this intuition, we now introduce a two-step heuristic which flags a process as legitimate only when a change in the input pattern generation algorithm does not translate to a change in the I/O behavior of the process. Detection is flagged otherwise. In the first step, we adopt a nonrandom pattern generation algorithm (e.g., SIN) to monitor all the running processes for N × T seconds. This allows us to collect a number of characteristic output patterns Oi. In the second step, we adopt the RND pattern generation algorithm and monitor the system again for N × T seconds. Each output pattern O′i obtained is tested for similarity against the corresponding pattern Oi monitored in the first step. At the end of this phase, a process i is flagged as detected only when the similarity computed fails to exceed a certain threshold. To compare the output patterns we adopt the Dynamic Time Warping (DTW) algorithm as a distance metric [19]. This technique, often used to compare time series, warps sequences in the time dimension to determine a measure of similarity independent of nonlinear variations in the time dimension.
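    The comparison step of the heuristic can be sketched with the textbook dynamic-programming formulation of DTW. The fragment below is illustrative only: the absolute difference as local cost, the function names, and the threshold are assumptions, and no claim is made that the prototype uses this exact variant. Since DTW is used as a distance, a large value means low similarity, so a process is flagged when the distance exceeds the threshold.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def flag_process(output_step1, output_step2, threshold):
    """Flag a process whose I/O pattern changed between the two steps."""
    return dtw_distance(output_step1, output_step2) > threshold

# Hypothetical output patterns (bytes written per interval) of one process.
first_run = [40, 42, 41, 40, 43, 41]
second_run = [41, 40, 42, 41, 40, 42]
print(dtw_distance(first_run, second_run))
print(flag_process(first_run, second_run, threshold=100.0))
```

    In practice the threshold would have to be calibrated against the DTW values that legitimate processes score on the target system.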

    To evaluate our heuristic, we implemented two different keyloggers attempting to evade our detection technique. The first one, K-EXP, uses a parallel thread to write a random amount of bytes which increases exponentially with the number of keystrokes already logged to the disk. Since the transformation is nonlinear, we expect heavily perturbed PCC values. The second one, K-PERF, uses a parallel thread to simulate a single fixed-rate byte stream being written to the disk. In this scenario, the amount of random bytes written to the disk is dynamically adjusted based on the keystroke rate eavesdropped. This is arguably one of the most effective countermeasures a keylogger may attempt to employ.

    Fig. 8 depicts the DTW computed by our two-step heuristic for different processes and increasing values of N. We can observe that our artificial keyloggers both score very high DTW values with the pattern generation algorithms adopted in the two steps (i.e., SIN and RND). The reason why K-PERF is also easily detected is that even small

    50 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTI NG, VOL. 10, NO. 1, JANUARY/FEBRU ARY 2013

  • 8/10/2019 06297976

    12/13

    variations produced by continuously adjusting the outputpattern introduce some amount of variability which iscorrelated with the input pattern. This behavior immedi-ately translates to non-negligible DTW values. Note that theattacker may attempt to decrease the amount variability by

    using a periodically flushed buffer to shape the observedoutput distribution. A possible way to address this type ofattack is to apply our detection model to memory accesspatterns, a strategy that we are investigating as part of ourongoing work [20]. The intuition is that memory accesspatterns can be used to infer the keylogging behaviordirectly from the memory activity, making the resultingdetection technique independent of the particular flushingstrategy adopted by the keylogger. In the figure, we can alsoobserve that all the legitimate processes analyzed scorevery low DTW values. This result confirms that their I/Obehavior is completely uncorrelated with the input patternchosen for injection. We observed similar results for other

    settings and applications; we omit results for brevity.Finally, Fig. 8 shows also that our artificial keyloggers bothscore increasingly higher DTW values for larger number ofsamples N. We previously observed similar behavior forthe PCC, for which more stable results could be obtained forincreasing values ofN. The conclusion is that analyzing asufficiently large number of samples is crucial to obtainaccurate results when estimating the similarity betweendifferent distributions.

    7 RELATED WORK

    While ours is the first technique to rely solely on unprivileged mechanisms, several approaches have been recently proposed to detect privacy-breaching malware, including keyloggers. Behavior-based spyware detection was first introduced by Kirda et al. in [21]. Their approach is tailored to malicious Internet Explorer loadable modules. In particular, modules monitoring the user's activity and disclosing private data to third parties are flagged as malware. Their analysis models malicious behavior in terms of API calls invoked in response to browser events. Those used by keyloggers, however, are also commonly used by legitimate programs. Their approach is therefore prone to false positives, which can only be mitigated with continuously updated whitelists.

    Other keylogger-specific approaches have suggested detecting the use of well-known keystroke interception APIs. Aslam et al. [22] propose binary static analysis to locate the intended API calls. Unfortunately, all these calls are also used by legitimate applications (e.g., shortcut managers) and this approach is again prone to false positives. Xu et al. [23] push this technique further, specifically targeting Windows-based operating systems. They rely on the very same hooks used by keyloggers to alter the message type from WM_KEYDOWN to WM_CHAR. A keylogger aware of this countermeasure, however, can easily evade detection by also switching to a new message type or periodically registering a new hook to obtain the highest priority in the hook chain.

    Closer to our approach is the solution proposed by Al-Hammadi et al. [24]. Their strategy is to model the keylogging behavior in terms of the number of API calls issued in the window of observation. To be more precise, they observe the frequency of API calls invoked to 1) intercept keystrokes, 2) write to a file, and 3) send bytes over the network. A keylogger is detected when two of these frequencies are found to be highly correlated. Since no bogus events are issued to the system (no injection of crafted input), the correlation may not be as strong as expected. The resulting value would be even more impaired in case of any delay introduced by the keylogger. Moreover, since their analysis is solely focused on a specific bot, it lacks a proper discussion on both false positives and false negatives. In contrast to their approach, our quantitative analysis is performed at the byte granularity and our correlation metric (PCC) is rigorously linear. As shown earlier, linearity makes our technique completely resilient to several common data transformations performed by keyloggers.
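The linearity property mentioned above can be checked numerically. The sketch below uses the standard sample formula for the Pearson correlation coefficient (PCC) and shows that a linear transformation of the keystroke pattern (for example, a fixed number of bytes per keystroke plus a constant header) leaves the coefficient at 1, while a nonlinear transformation lowers it. The function name and the sample values are hypothetical, and returning 0.0 for a constant series is a convention chosen for this sketch only.

```python
import math
from typing import Sequence


def pcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # PCC is undefined for a constant series (sketch convention)
    return cov / (sx * sy)


keys = [3, 9, 1, 7, 5]                  # injected keystrokes per interval
linear = [2 * k + 16 for k in keys]     # 2 bytes per key + 16-byte header
nonlinear = [2 ** k for k in keys]      # exponential growth in output size

print(round(pcc(keys, linear), 6))      # 1.0
print(pcc(keys, nonlinear) < 0.9)       # True
```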

    A similar quantitative and privileged technique is sketched by Han et al. [25]. Unlike the solution presented in [24], their technique does include an injection phase. Their detection strategy, however, still models the keylogging behavior in terms of API calls. In practice, the assumption that a certain number of keystrokes results in a predictable number of API calls is fragile and heavily implementation-dependent. In contrast, our byte-level analysis relies on finer grained measurements and can identify all the information required for the detection in a fully unprivileged way.

    Complementary to our work, recent approaches have proposed automatic identification of trigger-based behavior, which can potentially thwart any detection technique based on dynamic analysis. In particular, in [17], [26] the authors propose a combination of concrete and symbolic execution for the task. Their strategy aims to explore all the possible execution paths that a malware sample can exhibit during execution. As the authors in [17] admit, however, automating the detection of trigger-based behavior is an extremely challenging task which requires advanced privileged tools. The problem is also undecidable in the general case.

    8 CONCLUSIONS

    In this paper, we presented an unprivileged black-box approach for accurate detection of the most common keyloggers, i.e., user-space keyloggers. We modeled the behavior of a keylogger by surgically correlating the input (i.e., the keystrokes) with the output (i.e., the I/O patterns produced by the keylogger). In addition, we augmented our model with the ability to artificially inject carefully crafted keystroke patterns. We then discussed the problem of choosing the best input pattern to improve our detection rate. Subsequently, we presented an implementation of our detection technique on Windows, arguably the OS most vulnerable to the threat of keyloggers. To establish an OS-independent architecture, we also gave implementation details for other operating systems. We successfully evaluated our prototype system against the most common free keyloggers [5], with no false positives and no false negatives reported. Other experimental results with a homegrown keylogger demonstrated the effectiveness of our technique in the general case. While attacks on our detection technique are possible and have been discussed at length in Section 6, we believe our approach considerably raises the bar for protecting the user against the threat of keyloggers.

    Fig. 8. Impact of N on the DTW.

    REFERENCES

    [1] T. Holz, M. Engelberth, and F. Freiling, "Learning More About the Underground Economy: A Case-Study of Keyloggers and Dropzones," Proc. 14th European Symp. Research in Computer Security, pp. 1-18, 2009.

    [2] San Jose Mercury News, "Kinko's Spyware Case Highlights Risk of Public Internet Terminals," http://www.siliconvalley.com/mld/siliconvalley/news/6359407.htm, 2012.

    [3] N. Strahija, "Student Charged After College Computers Hacked," http://www.xatrix.org/article2641.html, 2012.

    [4] N. Grebennikov, "Keyloggers: How They Work and How to Detect Them," http://www.viruslist.com/en/analysis?pubid=204791931, 2012.

    [5] Security Technology Ltd., "Testing and Reviews of Keyloggers, Monitoring Products and Spyware," http://www.keylogger.org, 2012.

    [6] L. Zhuang, F. Zhou, and J.D. Tygar, "Keyboard Acoustic Emanations Revisited," ACM Trans. Information and System Security, vol. 13, no. 1, pp. 1-26, 2009.

    [7] M. Vuagnoux and S. Pasini, "Compromising Electromagnetic Emanations of Wired and Wireless Keyboards," Proc. 18th USENIX Security Symp., pp. 1-16, 2009.

    [8] J. Rutkowska, "Subverting Vista Kernel for Fun and Profit," BlackHat Briefings, vol. 5, 2007.

    [9] J.L. Rodgers and W.A. Nicewander, "Thirteen Ways to Look at the Correlation Coefficient," The Am. Statistician, vol. 42, no. 1, pp. 59-66, Feb. 1988.

    [10] J. Benesty, J. Chen, and Y. Huang, "On the Importance of the Pearson Correlation Coefficient in Noise Reduction," IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 4, pp. 757-765, May 2008.

    [11] L. Goodwin and N. Leech, "Understanding Correlation: Factors that Affect the Size of r," Experimental Education, vol. 74, no. 3, pp. 249-266, 2006.

    [12] J. Aldrich, "Correlations Genuine and Spurious in Pearson and Yule," Statistical Science, vol. 10, no. 4, pp. 364-376, 1995.

    [13] W. Hsu and A. Smith, "Characteristics of I/O Traffic in Personal Computer and Server Workloads," IBM System J., vol. 42, no. 2, pp. 347-372, 2003.

    [14] H.W. Kuhn, "The Hungarian Method for the Assignment Problem," Naval Research Logistics Quarterly, vol. 2, pp. 83-97, 1955.

    [15] G. Kochenberger, F. Glover, and B. Alidaee, "An Effective Approach for Solving the Binary Assignment Problem with Side Constraints," Information Technology and Decision Making, vol. 1, pp. 121-129, May 2002.

    [16] BAPCO, "SYSmark 2004 SE," http://www.bapco.com, 2012.

    [17] A. Moser, C. Kruegel, and E. Kirda, "Exploring Multiple Execution Paths for Malware Analysis," Proc. IEEE 28th Symp. Security and Privacy, pp. 231-245, May 2007.

    [18] S. Ortolani and B. Crispo, "Noisykey: Tolerating Keyloggers via Keystrokes Hiding," Proc. Seventh USENIX Workshop Hot Topics in Security, 2012.

    [19] H. Sakoe and S. Chiba, Readings in Speech Recognition, A. Waibel and K.-F. Lee, eds. Morgan Kaufmann Publishers, Inc., 1990.

    [20] S. Ortolani, C. Giuffrida, and B. Crispo, "Klimax: Profiling Memory Write Patterns to Detect Keystroke-Harvesting Malware," Proc. 14th Int'l Symp. Recent Advances in Intrusion Detection, pp. 81-100, 2011.

    [21] E. Kirda, C. Kruegel, G. Banks, G. Vigna, and R. Kemmerer, "Behavior-Based Spyware Detection," Proc. 15th USENIX Security Symp., pp. 273-288, 2006.

    [22] M. Aslam, R. Idrees, M. Baig, and M. Arshad, "Anti-Hook Shield against the Software Key Loggers," Proc. Nat'l Conf. Emerging Technologies, pp. 189-191, 2004.

    [23] M. Xu, B. Salami, and C. Obimbo, "How to Protect Personal Information Against Keyloggers," Proc. Ninth Int'l Conf. Internet and Multimedia Systems and Applications, 2005.

    [24] Y. Al-Hammadi and U. Aickelin, "Detecting Bots Based on Keylogging Activities," Proc. Third Int'l Conf. Availability, Reliability and Security, pp. 896-902, 2008.

    [25] J. Han, J. Kwon, and H. Lee, "Honeyid: Unveiling Hidden Spywares by Generating Bogus Events," Proc. IFIP 23rd Int'l Information Security Conf., pp. 669-673, 2008.

    [26] D. Brumley, C. Hartwig, Z. Liang, J. Newsome, D. Song, and H. Yin, "Automatically Identifying Trigger-Based Behavior in Malware," Advances in Information Security, vol. 36, pp. 65-88, 2008.

    Stefano Ortolani received the MSc degree in computer science from the Ca' Foscari University, Venice, Italy. He is working toward the PhD degree in the computer systems section of the Department of Computer Science at the Vrije Universiteit, Amsterdam. His research covers security in computers and networks, privacy-enhancing technologies, and intrusion detection.

    Cristiano Giuffrida received the MEng degree in computer engineering from the University of Rome Tor Vergata, Italy. He is working toward the PhD degree in the computer systems section of the Department of Computer Science at the Vrije Universiteit, Amsterdam. His research focuses on the design and implementation of secure and reliable systems.

    Bruno Crispo received the PhD degree in computer science from the University of Cambridge, United Kingdom. He is an associate professor in the Department of Computer Science at Vrije Universiteit and at the University of Trento, Italy. His research interests include networks, distributed systems, cryptography, and security protocols. He is a senior member of the IEEE.


    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 10, NO. 1, JANUARY/FEBRUARY 2013