cryptology eprint archive 1 tap-wave-rub: lightweight

13
CRYPTOLOGY EPRINT ARCHIVE 1 Tap-Wave-Rub: Lightweight Malware Prevention for Smartphones using Intuitive Human Gestures Haoyu Li, Di Ma, Member, IEEE, Nitesh Saxena, Member, IEEE , Babins Shrestha, and Yan Zhu Abstract—Malware is a burgeoning threat for smartphones. It can surreptitiously access sensitive services on a phone without the user’s consent, thus compromising the security and privacy of the user. The problem is exacerbated especially in the context of emerging payment applications, such as NFC services. Traditional defenses to malware, however, are not suitable for smartphones due to their resource intensive nature. This necessitates the design of novel mechanisms that can take into account the specifics of the smartphone malware and smartphones themselves. In this paper, we introduce a lightweight permission enforcement approach – Tap-Wave-Rub (TWR) – for smartphone malware prevention. TWR is based on simple human gestures that are very quick and intuitive but less likely to be exhibited in users’ daily activities. Presence or absence of such gestures, prior to accessing an application, can effectively inform the OS whether the access request is benign or malicious. Specifically, we present the design of two mechanisms: (1) accelerometer-based phone tapping detection; and (2) proximity sensor based finger tapping, rubbing or hand waving detection. The first mechanism is geared for NFC applications, which usually require the user to tap her phone with another device. The second mechanism involves very simple gestures, i.e., tapping or rubbing a finger near the top of phone’s screen or waving a hand close to the phone, and broadly appeals to many applications (e.g., SMS). In addition, we present the TWR-enhanced Android permission model, the prototypes implementing the underlying gesture recognition mechanisms, and a variety of novel experiments to evaluate these mechanisms. Our results suggest the proposed approach could be very effective for malware detection / prevention, with quite low false positives and false negatives, while imposing little to no additional burden on the users. Index Terms—malware; mobile devices; NFC; context recognition; sensors 1 I NTRODUCTION Smartphones are undoubtedly becoming ubiquitous. They are not only used as (traditional) mobile phones for phone calling and SMS messaging, but also for many of the same purposes as desktop computers, such as web browsing, social networking, online shopping and banking. Also, smartphones are incorporating more and more sensors and communication interfaces. Such new capabilities enable smartphones with many new unique functionalities that desktop computers lack. For example, many smartphones are beginning to incorporate Near Field Communication (NFC) chips [28], which allows short, paired transactions with other NFC-enabled devices in close proximity. The use of NFC-equipped smartphones as payment tokens (such as Google Wallet) is considered to be the next generation payment system and the latest buzz in the financial industry [10]. Tap, Wave and Rub is a magical trick [2]. In this paper, these “magical” gestures are shown to provide an effective defense to the prevailing problem of mobile phone malware. Haoyu Li, Di Ma, and Yan Zhu are with the College of Engineering and Computer Science, University of Michigan-Dearborn, Dearborn, MI 48128. E-mail: {haoyul,dmadma,yanzhu}@umd.umich.edu Nitesh Saxena and Babins Shrestha are with University of Alabama at Birmingham, Birmingham, AL 35294. E-mail: [email protected] The work of Haoyu Li, Di Ma, and Yan Zhu was partially supported by the grant from US National Science Foundation (NSF CNS-1153573). Due to their popularity, smartphones are becoming a burgeoning target for malicious activities. There has been a rapid increase in mobile phone malware targeting different smartphone platforms [19], [21], [20], [35], [9], [33], [27], [31]. After infecting a phone, malware can gain access to sensitive resources / services on the phone for various purposes, such as stealing user data, making premium phone calls or SMS, damaging the device, tracking user’s activities or location, or simply annoying the user. Newer functionalities of smartphones only make them more attractive to malware writers. For example, the incor- poration of NFC chips on smartphones provides malware authors another (possibly much easier) way to deploy their attacks through the NFC interface [32]. Especially, due to the ease with which financial transactions can take place using NFC, it is predicted that NFC will become a popular target for malware aiming at credential and credit card theft [22]. Indeed, a proof-of-concept Trojan Horse electronic pickpocket program under the cover of a tic-tac-toe game has already been developed by Identity Stronghold [5]. In this attack, the game containing the malware is downloaded and installed on a NFC-enabled smartphone. Once activated (when the game is played), the malware accesses the NFC chip and enables the RFID (Radio Frequency ID) reading functionality. This reader then surreptitiously scans tags (e.g. RF tagged credit card) around it and reports the acquired information to the malware owner through e-mail once a victim tag is found in proximity. arXiv:1302.4010v1 [cs.CR] 16 Feb 2013

Upload: others

Post on 26-Mar-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

CRYPTOLOGY EPRINT ARCHIVE 1

Tap-Wave-Rub: Lightweight MalwarePrevention for Smartphones using Intuitive

Human GesturesHaoyu Li, Di Ma, Member, IEEE, Nitesh Saxena, Member, IEEE , Babins Shrestha, and Yan Zhu

Abstract—Malware is a burgeoning threat for smartphones. It can surreptitiously access sensitive services on a phone withoutthe user’s consent, thus compromising the security and privacy of the user. The problem is exacerbated especially in thecontext of emerging payment applications, such as NFC services. Traditional defenses to malware, however, are not suitablefor smartphones due to their resource intensive nature. This necessitates the design of novel mechanisms that can take intoaccount the specifics of the smartphone malware and smartphones themselves.In this paper, we introduce a lightweight permission enforcement approach – Tap-Wave-Rub (TWR) – for smartphone malwareprevention. TWR is based on simple human gestures that are very quick and intuitive but less likely to be exhibited in users’daily activities. Presence or absence of such gestures, prior to accessing an application, can effectively inform the OS whetherthe access request is benign or malicious. Specifically, we present the design of two mechanisms: (1) accelerometer-basedphone tapping detection; and (2) proximity sensor based finger tapping, rubbing or hand waving detection. The first mechanismis geared for NFC applications, which usually require the user to tap her phone with another device. The second mechanisminvolves very simple gestures, i.e., tapping or rubbing a finger near the top of phone’s screen or waving a hand close to the phone,and broadly appeals to many applications (e.g., SMS). In addition, we present the TWR-enhanced Android permission model,the prototypes implementing the underlying gesture recognition mechanisms, and a variety of novel experiments to evaluatethese mechanisms. Our results suggest the proposed approach could be very effective for malware detection / prevention, withquite low false positives and false negatives, while imposing little to no additional burden on the users.

Index Terms—malware; mobile devices; NFC; context recognition; sensors

F

1 INTRODUCTION

Smartphones are undoubtedly becoming ubiquitous. Theyare not only used as (traditional) mobile phones for phonecalling and SMS messaging, but also for many of the samepurposes as desktop computers, such as web browsing,social networking, online shopping and banking. Also,smartphones are incorporating more and more sensors andcommunication interfaces. Such new capabilities enablesmartphones with many new unique functionalities thatdesktop computers lack. For example, many smartphonesare beginning to incorporate Near Field Communication(NFC) chips [28], which allows short, paired transactionswith other NFC-enabled devices in close proximity. Theuse of NFC-equipped smartphones as payment tokens (suchas Google Wallet) is considered to be the next generationpayment system and the latest buzz in the financial industry[10].

Tap, Wave and Rub is a magical trick [2]. In this paper, these “magical”gestures are shown to provide an effective defense to the prevailingproblem of mobile phone malware.

• Haoyu Li, Di Ma, and Yan Zhu are with the College of Engineeringand Computer Science, University of Michigan-Dearborn, Dearborn,MI 48128.E-mail: {haoyul,dmadma,yanzhu}@umd.umich.edu

• Nitesh Saxena and Babins Shrestha are with University of Alabama atBirmingham, Birmingham, AL 35294.E-mail: [email protected]

The work of Haoyu Li, Di Ma, and Yan Zhu was partially supported bythe grant from US National Science Foundation (NSF CNS-1153573).

Due to their popularity, smartphones are becoming aburgeoning target for malicious activities. There has been arapid increase in mobile phone malware targeting differentsmartphone platforms [19], [21], [20], [35], [9], [33], [27],[31]. After infecting a phone, malware can gain accessto sensitive resources / services on the phone for variouspurposes, such as stealing user data, making premiumphone calls or SMS, damaging the device, tracking user’sactivities or location, or simply annoying the user.

Newer functionalities of smartphones only make themmore attractive to malware writers. For example, the incor-poration of NFC chips on smartphones provides malwareauthors another (possibly much easier) way to deploy theirattacks through the NFC interface [32]. Especially, due tothe ease with which financial transactions can take placeusing NFC, it is predicted that NFC will become a populartarget for malware aiming at credential and credit card theft[22]. Indeed, a proof-of-concept Trojan Horse electronicpickpocket program under the cover of a tic-tac-toe gamehas already been developed by Identity Stronghold [5]. Inthis attack, the game containing the malware is downloadedand installed on a NFC-enabled smartphone. Once activated(when the game is played), the malware accesses the NFCchip and enables the RFID (Radio Frequency ID) readingfunctionality. This reader then surreptitiously scans tags(e.g. RF tagged credit card) around it and reports theacquired information to the malware owner through e-mailonce a victim tag is found in proximity.

arX

iv:1

302.

4010

v1 [

cs.C

R]

16

Feb

2013

CRYPTOLOGY EPRINT ARCHIVE 2

While the security community has been battling with PCmalware for many years, malware detection on smartphonesturns out to be an even more challenging problem [8]. Thisis partially due to the resource constraints of smartphones(especially limited battery power). Thus, existing malwaredefenses for desktop computers cannot be applied directlyon the smartphone platform. Much of the existing researchfocuses on optimizing desktop based defenses for mobilephones [39], [36], [38], [13], [8]. These approaches eithertry to speed-up the detection process using advanced datastructures and algorithms, or use the network or remoteserver to reduce computation overhead on individual de-vices.

In practice, to protect mobile phones from malware at-tacks, major mobile phone manufacturers, such as Google,Apple, and Nokia, employ permission models to preventmalware from being installed at the first place. However,this approach relies upon user diligence and awareness,while most computer users lack these traits in practice. In-stead of relying on user permissions, smartphone manufac-turers also rely upon application review before releasing topeople for download. However, application review processcan be cumbersome and prone to human error [8].

1.1 Motivation and Rationale

We argue that existing malware defenses, without consider-ing the special characteristics of smartphone malware andthat of smartphones themselves, might not be sufficientto detect sophisticated malware, such as the pickpocketmalware targeting NFC mentioned previously.1 First, thepickpocket malware [5], under the cover of tic-tac-toe,is quite stealthy. Its surreptitious scanning may not causesubstantial changes (such as sharp increase in the number ofemails sent or in power consumption) to the normal behav-ioral profile and therefore behavioral detection schemes willnot be effective. Moreover, most existing malware detectionschemes employ a posteriori approach. That is, maliciousattacks are detected after they take place as traces needto be collected and trained before they can be comparedwith profiles to find abnormalities. Because of the sensitive(financial) nature of the NFC service, it is quite riskyto adopt such a posteriori detection approach. Instead, itis important to develop a preventive approach which canconstantly monitor, identify, and then stop such potentiallymalicious activity before it is launched so as to minimizedamage or loss.

This motivates us to design a novel approach for malwareprevention through contextual awareness. Our rationale isas follows. Smartphones are personal devices. That is, theend user is a human being. Thus, (legitimate) access tosensitive/valuable services such as premium calls, SMS orNFC usually involves different types of human activitiessuch as dialing a phone number, typing a message, or

1. Throughout the paper, we will center our malware mitigation designbased on properties observed from the pickpocket malware [5]; however,our approaches, being fundamental in nature, will be applicable to a broadrange of future malware.

clicking an application icon on the screen (to execute theapplication). In contrast, one common pattern followedby malware found on mobile phones is that it attemptsto access sensitive services without the user’s awarenessand authorization (thus user activity is very unlikely to beinvolved). Therefore, one way to detect such unauthorized,thus potentially malicious behavior, is to validate whetheran action is initiated by pure software or purposefully bya human user.

Since legitimate access to sensitive services usuallyinvolves different types of hand movements, we explorethe use of gestures to differentiate between pure softwareand human-initiated activities. In particular, in this paper,we propose Tap-Wave-Rub (TWR), a lightweight malwaredetection mechanism for smartphones based on intuitivehuman gesture recognition, using sensors already availableon current smartphones with little or no additional userinvolvement.

The proposed gesture-based detection mechanism servesas an extension to the currently adopted permission modelused by major smartphone OSs. That is, whenever a sen-sitive service is requested, a particular gesture needs to bedetected (to make sure it is a human generated activity)before the request can be granted. Otherwise, the activityis very likely generated by malware. As gesture detectionis enforced every time a sensitive request is received, theproposed mechanism provides continuous monitoring ofsensitive resources and services from unauthorized accessattempts by malware. We note, the latest Android JellyBean 4.2 has an added security feature, Premium SMS Con-firmation, that includes a giant list of premium shortcodesfor each country and alerts a user anytime an app tries tosend a message to a shortcode [3]. Our TWR permissionmodel follows a similar approach but the security decisionis based on presence or absence of gestures.

1.2 Our Contributions

The main contributions of this paper are summarized asfollows.

1) We propose TWR, a novel approach for malwareprevention with an exclusive focus on the smartphoneplatform based on intuitive gesture recognition. Aspart of this system, we propose two novel light-weight gesture recognition schemes that can be usedin different contexts with little to no additional userinvolvement. The first mechanism, phone tappingdetection based on accelerometer data, is geared forNFC applications, which usually require the user totap her phone with another device. The second mech-anism, hand waving and finger tapping / rubbing de-tection based on proximity sensor data, involves verysimple gestures from the user, i.e., tapping or rubbinga finger near the top of phone’s screen or waving ahand close to the phone, and broadly appeal to manyapplications (e.g., SMS). These gestures are as easyfor users to perform as many commonly deployedgestures, such as the iPhone’s finger swiping gesture.

CRYPTOLOGY EPRINT ARCHIVE 3

2) We outline how Tap-Wave-Rub can reside within thekernel-level middle layer between sensitive servicesand applications trying to access the services, andbe integrated specifically with the existing Androidpermission model. This TWR-enhanced permissionmodel provides continuous enforcement of accesscontrol to sensitive resources and services even afteran application is installed on the platform.

3) We report on the implementation of our prototypesfor the TWR gesture recognition schemes on theAndroid platform.

4) To evaluate our approach, we conduct experimentsto simulate the behavior of malware and normal userusage activity. Our experiment results show that theproposed mechanism can successfully detect mali-cious attempts to access sensitive services with highdetection rates, while imposing minimal usabilityburden.

The remainder of this paper is organized as follows.Section 2 overviews related research. Section 3 presents thethreat model and design goals, and overviews our system.Section 4 introduces the TWR enhanced permission model.Section 5 presents two light-weight human gesture recogni-tion schemes. Section 6 shows the promising feasibility ofour work through novel experiments. Section 7 discussesvarious features of our system and Section 8 concludes thepaper.

2 RELATED WORK2.1 Malware Detection and PreventionMalware has threatened PCs for many years and a largenumber of malware defenses has accumulated in the lit-erature. Generally speaking, the two most commonly usedapproaches for malware detection and analysis are staticanalysis [14], [39], [37] and dynamic analysis [18], [40],[7], [41]. Static analysis, also known as signature-baseddetection, is based on source or binary code inspection tofind suspicious patterns (malware) inside the code. Thisapproach has been used by many antivirus companies.However, it can be evaded by malware authors throughsimple obfuscation, polymorphism and packing techniques.Also it cannot detect zero day attacks. Dynamic analysis,also known as behavior-based detection, monitors andcompares the running behavior of an application (e.g.,system calls, file accesses, API calls) against maliciousand/or normal behavior profiles through the use of machinelearning techniques. It is more resilient to polymorphicworms and code obfuscation and has the potential to defeatzero-day worms.

Whether static or dynamic, current malware detectiontechniques for desktop computers are still considered tootime consuming for resource-constrained mobile devicesoperated on battery. Most existing research focus on op-timizing desktop solutions to fit on mobile devices. Thework of [39] tries to speed-up the signature lookup processin static analysis by using hashes. Several collaborativeanalysis techniques have been proposed to distribute thework of analysis by a network of devices [36], [38]. Remote

server assisted analysis techniques have also been proposedto reduce the overhead of computation on individual devices[13], [8].

Nevertheless, implementation of malware detection tech-niques on mobile phones is still an emerging area ofresearch and a challenging job [8]. In practice, to protectmobile phones from malware attacks, major mobile phonecompanies, such as Google, Apple, and Nokia, use applica-tion review and permission model to prevent malware frombeing installed at the first place.

In the permission model, a user is asked to checkpermission requests of an application before installing it.The user either needs to grant all the permissions or choosenot to proceed with the installation. So, it is the user’ssole responsibility to decide whether the set of permissionsgranted to the application is potentially harmful or not. Itrequires users be well-educated to identify suspicious per-mission requests. This practice of granting permissions all-at-once and only at installation time provides only coarse-grained access control to sensitive resources / services assubsequent permission check by the OS is transparent tothe user. This leaves doors to malware writers as they canalways expect users who are simply ignorant. Indeed, thewidespread presence of mobile phone malware suggeststhat users actually do not care about giving permission tosoftware they download. So the user is not aware how andwhen the permissions are exercised after the installation.Also permission model can be circumvented through per-mission escalation attacks [17], [23]. Moreover, applicationdevelopers do not always follow the least privilege rule withtheir permission requests. In a recent study, the authorspointed out one-third of application investigated are overprivileged [23]. So far, Android has the most extensivepermission system and it is considered more vulnerable topotential malware attacks.

The application review model adapts another approach.Instead of relying on user permission, smartphone compa-nies review applications themselves before releasing themto people for download. There are two types of applicationreviews: manual and automatic. Apple reviews all applica-tions in the App Store while Symbian automatically testsSymbian signed applications. The manual process can becumbersome and prone to human error while the automaticreview is not considered to be an effective deterrent [8].

The most closely related work to ours is the one proposedin [12]. It shares similar philosophy as ours. It utilizeswhether there are hardware interruptions to differentiatepure software initiated action and human initiated action[12]. It aims at detecting malware specifically targetingSMS and audio services. These services usually start withuser’s pressing or touching the keypad or touchscreen whichgenerate hardware interruptions for each key/screen pressevent. A purely software generated activity (or malwaregenerated activity), on the other hand, will not explicitlygenerate a hardware interrupt. Although this approach isbelieved to be effective for malware detection, it cannothelp detect a more sophisticated malware such as the pick-pocket malware. This is because the pickpocket malware

CRYPTOLOGY EPRINT ARCHIVE 4

gets activated by user’s playing the tic-tac-toe game, whichalready involves touch screen activity that can generatehardware interrupts. The difference between [12] and ourwork can be summarized as follows. [12] attempts to checkwhether there is (any) user activity whereas our goal is tocheck whether there is a special user-aware activity. Soour approach provides more fine-grained access control tosensitive services and thus can detect even sophisticatedmalware.

Another work that parallels to ours was recently pre-sented in [34]. It proposes an approach of user-driven accesscontrol by granting permission to the application whenuser’s permission granting intent is captured. It introducesaccess control gadgets (ACGs) which are UI elements ex-posed by each user-owned resource for applications to em-bed. The user’s authentic UI interaction with correspondingACG grant the permission to an application to access thecorresponding resource. A fundamental difference between[34] and our work is that the design proposed by [34] grantsthe permission to an application when user’s authentic UIinteraction with corresponding ACG is captured whereasour design grants permission to an application when a spe-cific user gesture (tapping/waving) is captured. The designproposed by [34] not only requires kernel level changesbut also necessitates application level modifications. Italso requires Resource Monitor (RM) to be incorporatedfor each resource such as the device drivers. Moreover,it requires additional composition ACG (C-ACG) alongwith composition RM if an application requires differentresources to be accessed/used. Our work, in contrast to [34],has an advantage in that it neither requires application levelchanges nor requires resource monitor to be added for eachresource. Note that if there are many resources that can beused by an application, then the number of C-ACG andC-RM will become extremely large. Another advantage ofour work over [34] is that our design supports “services”(such as NFC) which do not have any specific UI elementsor ACGs associated with them. For the approach of [34]to work with services like NFC, additional ACG for UIinteraction will need to be added, which will significantlyhamper the usability of such services. In contrast, in ourcase, implicit permission granting intent is acquired bycapturing the tapping gesture.

2.2 Human Activity Detection and Security

Gesture recognition has been extensively studied to supportspontaneous interactions with consumer electronics andmobile devices in the context of pervasive computing [6],[11], [29]. Due to the uniqueness of gestures to differentusers, personalized gestures have been used for varioussecurity purposes.

Gesture recognition has been used for user authenticationto address the problem of illegal use of stolen devices[24], [15]. In [24], a mobile device gets unlocked to usewhen it detects the gait (walking pattern) of the legitimateowner. In [15], a smartphone gets locked when it doesnot detect the “picking-up phone” gesture which the owner

naturally performs to answer a phone. Both works providetransparent user authentication and do not require explicituser involvement. [30] reports a series of user studies thatevaluate the feasibility and usability of light-weight userauthentication based on gesture recognition using a singletri-axis accelerometer.

Gesture recognition is also used to defend against unau-thorized reading and Ghost-and-Leech relay attacks inRFID systems [16], [26]. The secret handshake schemeproposed in [16] allows an RFID tag to respond to readerquery selectively when the tag owner moves the tag in acertain pattern (i.e., secret handshake). The work of [26]uses posture as a valid context to unlock a tag in someRFID applications without changing the underlying userusage model.

The use of unique key press gestures or secure attentionsequences (SASs), such as CTRL-ALT-DEL, may alsoserve as a means to defend against malware. However,we are not aware of SASs being currently used on mobilephones. SASs need to be unique and usually require mul-tiple key presses simultaneously (e.g., CTRL-ALT-DEL).Such sequences will be very hard for the user to performon phones. Some of the gestures proposed in our paper (i.e.,hand waving, finger tapping or rubbing) can be viewed asa form of novel and user-friendly SAS suitable for phones.To distinguish a malware activy from a human user activity,[25] rely upon detecting keyboard or mouse activity and aTrusted Platform Module (TPM). However, this approachis limited to devices equipped with a TPM, and thus isnot suitable for most (if not all) current mobile phones andsmartphones.

3 BACKGROUND AND MODELS

3.1 Threat ModelIn our model, we assume that the mobile phone is alreadyinfected with malware. As in the pickpocketing attack of[5], the malware could reside within a benign looking appli-cation (e.g., a game) which the user may have downloadedfrom an untrusted source. Our model covers a broad rangeof malware and does not impose any restriction on malwarebehavior except that an action from the malware is nothuman-triggered. For example, the malware may want toaccess a service or resource (such as NFC, SMS or GPS)available on the phone itself, or to communicate with anexternal entity, such as an attacker-controlled remote server(botmaster).

We assume that the OS kernel is healthy and immune tomalware infection. In particular, the malware is not able tomaliciously alter the kernel control flow. Also, the phonehardware is assumed to be malware-free. Specifically, weassume that the malware can not manipulate the input to,and output from, the phone’s on-board sensors.

We do not impose any restriction as to how frequentlythe malware attempts to access a given service. However,in order to remain stealthy, constantly attempting accesswould not be feasible for the malware, and rather randomor periodic sampling is expected.

CRYPTOLOGY EPRINT ARCHIVE 5

In addition to the user space level control of the phone,the malware may collude and synchronize with an entityin close physical proximity of the phone (and its user).This external entity may attempt to manipulate the physicalenvironment in which the phone is present or interfere withthe user per se. We do not, however, allow this attacker tohave physical access to the phone. That is, if the attackerhas physical access to the phone, then he can lock/unlocka resource just like the phone’s user. In other words, ourmechanisms are not meant for user authentication and donot provide protection in the face of loss or theft of phone.

3.2 Design Goals

For our malware prevention approach to be useful inpractice, it must satisfy the following properties:

• Lightweight-ness: The approach should be lightweightin terms of the various required resources available onthe phone, such as memory, computation and batterypower.

• Efficiency: The approach should incur little delay.Otherwise, it can affect the overall usability of thesystem. We believe that no more than a few secondsshould be spent executing the approach.

• Robustness: The approach should be tolerant to er-rors. Both the False Negative Rate (FNR) and FalsePositive Rate (FPR) should be quite low. A low FNRmeans that a user would, with high probability, beable to execute an application (which accesses somesensitive services) without being rejected. Low FNRalso implies a better usability. On the other hand, LowFPR means that there should be little probability togrant access to a sensitive service when a user doesnot intend to do so. Low FPR clearly implies a littlechance for malware to evade detection.

• Usage Model Consistency: The solution should re-quire little, or no change, to the usage model ofexisting smartphone applications. Ideally, if the use ofa particular phone service can be commonly associatedwith a particular (unique) gesture (e.g., phone tappingfor NFC), this gesture may be used to specially protectthe said service. In this case, no changes to the adoptedusage model will be necessary. It is also possiblethat there is no unique gesture pattern that can befound to use a certain service (e.g., Bluetooth). Insuch a situation, an intuitive gesture template can beassociated with that service and a user will be requiredto explicitly perform the hand movements defined bythat gesture. In this case, only minor changes to theadopted usage model will be imposed.

3.3 Why Tap-Wave-Rub?

At a higher level, to distinguish between malware andhuman-initiated activity, we envision two different cat-egories of methods: implicit and explicit. The implicitapproaches automatically infer whether or not a humanactivity is taking place based on contextual information.The explicit approach, on the other hand, determines the

presence of human user by means of explicit human in-volvement.

A number of context-aware malware detection mecha-nisms are possible. The implicit category of mechanismscould be applied in scenarios whereby the access to aresource is pre-empted by a specific human gesture. Itwould certainly be quick and user-friendly since no ad-ditional work is needed by the user. The use of keypad ortouchscreen activity, as previously studied in [12], mightbe a good fit in some scenarios, such as when accessingSMS and audio services. These services usually start withuser’s pressing or touching the keypad or touchscreen whichgenerate hardware interruptions for each key/screen pressevent. A purely software generated activity (or malwaregenerated activity), on the other hand, will not explicitlygenerate a hardware interrupt. Although this approachmight be effective to prevent malwares targeting SMS andaudio services, it can not help prevent the activation of amore sophisticated malware such as the NFC pickpocketmalware [5]. It is because this malware gets activated byuser’s playing the tic-tac-toe game, which already involvestouch screen activity that can generate hardware interrupts.In other words, the use of keypad or touchscreen activity,as a context for malware detection, is expected to result inhigh false unlocking likelihood, and thus poor security.

Fortunately, in the context of NFC, there already exists anatural gesture, “phone tapping” using which NFC servicesare accessed. Tapping involves touching the phone againstan RFID tag, or another NFC device, and is a gesture userscommonly need to perform to use the NFC functional-ity. In this paper, we develop a phone tapping detectionmechanism based on accelerometer data. As we woulddemonstrate in the subsequent sections, this mechanismsatisfies all of our design goals, being lightweight, efficient,robust and user-friendly.

Unlike NFC, most other services/resources on the phonemay not be associated with a unique implicit gesture. Insuch situations, an explicit human involvement would benecessary. At first glance, Biometrics as well as human-interactive proofs, e.g., CAPTCHAs, appear to be naturalcandidates for such an explicit detection. Both will provethat a human is the one initiating an access request.Biometrics, in addition, also prove that this human is theowner of the phone, and thus provide protection in the eventof loss/theft of device (this level of protection is beyond ourthreat model though). A variety of biometrics (such as faceor voice recognition) are already available on commodityphones. Similarly, CAPTCHAs (such as textual/aural/visualones) are ubiquitous on the Internet and can be easilyadopted on the phones. However, both of these mechanismsdo not satisfy many of our design goals. Biometrics are notlightweight, can be time consuming, have been known to beerror prone, and not very user-friendly (and rather invasive).CAPTCHAs, on the other hand, are reasonably lightweight(especially those based on text), but long known to be asource of user frustration on the Internet, perhaps more soon the phones due to their small form factors complicatingboth display of the challenge and entry of the response.

CRYPTOLOGY EPRINT ARCHIVE 6

Since CAPTCHAs and biometrics do not seem to be sat-isfactory means of explicit malware detection, we turn ourattention to lightweight and natural gestures. In particular,we explore hand waving, finger tapping or rubbing, all ofwhich could be identified by proximity sensors availableon most modern phones. A wave involves waving a handin close proximity of the phone. A tap/rub involves tappingor rubbing a finger near the top of phone’s screen, whichis where the proximity sensor is usually located. All thesegestures cause a sudden fluctuation in the readings of theproximity sensor, and can be very easily identified, and arerobust to errors, as we will demonstrate in the followingsections. It is important to note that the proximity sensoris located off of the touch screen or display, and thereforeregular touchscreen inputs are not likely to trigger a tapor rub. Wave is also a unique gesture quite unlikely to beexhibited in daily activities of users.

4 TWR-ENHANCED PERMISSION MODEL

Permission models have become very common on smart-phone operating systems to provide access control to sen-sitive services for installed third party application. TheAndroid platform has the most extensive permission systemand poses to become a market leader. Thus, we base thedesign of our TWR system on the Android platform.

The idea of the TWR system is to add another layer ofpermission check before the original Android permissioncheck. As stated in Section 3.1, we assume the adversaryis not able to maliciously alter the kernel control flow. Sogesture detection forms a trusted path with the OS.

Intercepted permission requests are handled by the fivecomponents in the TWR’s architecture: TWR Permis-sionChecker, TWR GestureManager, TWR GestureExtrac-tor, TWR TemplateCreator, and TWR GestureDatabase.The architecture of TWR is depicted in Figure 1. In thefollowing paragraphs, we present the role of each TWRcomponent by describing the possible interaction betweenthem and the outside world.

The TWR PermissionChecker stands in front of theoriginal Android Permission check. When an applicationinitiates a request to access a sensitive service, the requestis intercepted by TWR PermissionChecker. This componentinteracts with TWR GestureManager to check whether therequested service is protected by a certain gesture. If not,the request is forwarded to the Android Permission Checkas usual. Otherwise, TWR GestureManager interacts withthe TWR GestureExtractor to begin collecting gesture data(tapping, rubbing, or waving in this paper). The captureddata is then sent to the TWR GestureManager for furtherprocess.

Here we distinguish between two types of gesturerecognition: user-dependent and user-independent. As theirnames suggest, a gesture is user-dependent if there issignificant variation among gesture data from multipleparticipants for the same predefined gesture; while a gestureis user-independent if either there is no apparent differenceor the recognition process does not differentiate among

TWR PermissionChecker

Android Permission Check

Application

TWR GestureManager

TWRGestureExtractor

TWRTemplateCreator

TWRGestureDatabase

User

TWR System

Android Middleware

Kernel

Applications

Fig. 1. The TWR Architecture

user data from multiple participants. Our phone tap gesturerecognition is user-dependent as it captures the user’s ownfeatures of the tapping movement. Given that users canhold a phone in different ways and use different forcesto tap, the phone tap gesture is thus user-dependent. Ourhand wave or finger tap/rubbing recognition scheme is user-independent as it infers user activity by checking whethera special location (in our context, the place where theproximity sensor is located) is touched or not (instead ofthe potentially biometric feature of human movement).

For user-dependent gesture recognition, we usually needto create a gesture template which is used as a reference inthe actual recognition stage. The user can interact with theTWR TemplateCreator to register a new gesture template, toupdate and delete exiting gesture templates. TWR Template-Creator is an Android application which allows interactionbetween TWR and the user. When the user creates, deletes,or modifies the gesture information, it needs to retrieveand store the information to TWR GestureDatabase viaTWR GestureManager. TWR GestureManager is the onlycomponent that has access to TWR GestureDatabase.

So depending on the type of gesture recognition scheme(user-dependent or user-independent), the TWR Gesture-Manager processes the gesture data from TWR Gesture-Extractor differently. If the gesture is user-dependent, itcompares the similarity between the newly captured datawith the corresponding gesture template stored in theTWR GestureDatabase. If the gesture is user-independent,the TWR GestureManager determines directly whether agesture is performed or not by utilizing information in thecaptured data without the help of a template. In either case,if a required gesture is detected, the request is forwardedto Android Permission Check for further check. Otherwise,the request for service access is rejected.

CRYPTOLOGY EPRINT ARCHIVE 7

5 TAP-WAVE-RUB GESTURE DETECTION

5.1 Phone Tap Detection

As we mentioned in Section 2, the use of hardware interrup-tion to differentiate between pure software initiated activityand human initiated activity is not effective to preventmalware hidden under the cover of a victim app, sinceactivating the victim app already involves keypad click orscreen touch which can generate hardware interrupts. Thismotivates us to use app-specific user events to distinguishbetween hidden malware and an app initiated by a humanbeing. That is, instead of simply using general key/screenpress events to infer human activity, we try to recognizewhether it involves the right activity a user needs to do toaccess a sensitive service, such as the access to NFC.

A smartphone is a personal hand-held device installedwith a lot of apps. Most of the time, these apps are activatedby specific phone/hand movements. For example, when auser wants to place a call, she needs to unlock the screen,activates the phone app, inputs the number (or clicks ona name in the contact list), and then puts the phone nearthe ear to start the call. Also, to use the NFC to scan asmart poster, a user needs to unlock the screen, activatesthe NFC reader app, and taps the phone on the smart posterto read information. Since “tapping” (touching the phoneagainst an RFID tag, or another NFC device) is a gesturewhich users commonly need to perform to use the NFCfunctionality, as an illustrative example, we can use tappingto determine whether an NFC access is human-initiatedor not. Intuitively, tapping on a smart-poster should bedifferent from other user phone activities (such as keyboardclick or screen touch) and user physical activities (such aswalking or running).

One advantage of this tapping approach is that it does notrequire any additional user activity besides what is beingused commonly, and thus transparently recognizes user ac-tivity when a user taps a smart poster to obtain information.However, it may exhibit false positive rates and not fullyprevent the pickpocket malware activation since normaluser activities (such as playing the game) may generatemotions similar to tapping. To achieve higher preventionrate, we can try other intuitive user-aware gestures similarto tapping, such as “tapping twice” or “tapping thrice” insuccession.

To recognize tapping, we utilize the on-board accelerom-eter data. An accelerometer sensor measures the forcesapplied to the phone (minus the force of gravity) on thethree axes: x, y, and z. Let (ax, ay, az) denote the valuescorresponding to the 3 axes from the accelerometer.

Our detection algorithm consists of two phases: trainingphase and recognition phase. In the training phase, a userperforms the target action (tapping) multiple times, andaccelerometer data of the action is recorded and processedto generate a tapping template. The template serves as areference to be compared later with real-time user move-ment data: a match indicates the recognition of user tappingactivity; otherwise, it is inferred that either there is no useractivity or the activity is not the “valid” user activity to

grant NFC access.After the training phase, the system compares a newly

observed movement with the template. The system recordsthe accelerometer data, from the moment the user activatesthe NFC reader app, until she taps on a smart poster.To recognize tapping, the system computes the cross-correlation C of the acceleration data A against the templateT , both of size n data points as shown in Equation 1.

The cross-correlation C computed from Equation 1 whencomparing two time series is a real value, representinga similarity measure. The higher the value of C, thehigher the similarity between the two series. The maximumvalue is obtained when the two series under comparisonare identical. A movement is considered a valid tappingactivity when the computed cross-correlation C exceeds acertain cross-correlation threshold which is usually obtainedthrough empirical study.

C(A, T ) =

∑ni=1(axi

− ax)(Txi− Tx)√∑n

i=1(axi− ax)2

√∑ni=1(Txi

− Tx)2

=

∑ni=1(ayi − ay)(Tyi − Ty)√∑n

i=1(ayi− ay)2

√∑ni=1(Tyi

− Ty)2

=

∑ni=1(azi − az)(Tzi − Tz)√∑n

i=1(azi − az)2√∑n

i=1(Tzi − Tz)2(1)

where ax denotes the means of time series axifor i ∈ [1, n]

(others follow the same notation).Here we describe one way to determine the cross-

correlation threshold. Suppose we have m traces of tappingmovements T1, ..., Tm. The threshold CT can be estimatedas the minimum cross-correlation between any two seriesTi and Tj (i, j ∈ [1,m] and i 6= j). That is:

CT = minmi,j=1(C(Ti, Tj)) (2)

5.2 Hand Wave, Finger Tap or Rub Detection

The goal of a proximity sensor is to detect the presence of anearby object, not necessarily in physical contact with thesensor. This sensor works by sending an electromagneticsignal and analyzing the change in the electromagneticfield or the returned signal itself. Usually, the range ofa proximity sensor is very small (few cms at most), anddifferent forms of proximity sensors have been invented[1]. Such sensors have seen an ubiquitous deployment oncurrent mobile phones. For mobile phone manufacturers,the primary use of a proximity sensor is to help preservebattery power. This is achieved by turning off the phone’sdisplay especially in cases when the user is on a phonecall where user’s ear is close to the proximity sensor. Thisis the reason the proximity sensor is located near the earpiece of a mobile phone, off of its display (see Figure 2).This is true for most, if not all, smartphones, including theAndroids and iPhones. We note that this is a property thatwe carefully leverage in developing our tapping or rubbinggesture mechanisms for malware defense. Such gestures

CRYPTOLOGY EPRINT ARCHIVE 8

do not interfere with the gestures made by the user whileinteracting with the phones’ (touch screen) display, andsignificantly reduce the False Positive Rate (FPR).

In order to utilize the proximity sensor for our purpose,we needed a human gesture that can “trigger” the sensor insome way and not likely to be exhibited in daily activities.To this end, we resort to gestures that can fluctuate thereadings of the sensor quickly for a short duration of time.This could be easily achieved by the user bringing in,and moving out, her hand/finger in close proximity of thesensor. This inspired us to develop three gestures: handwaving (waving one’s hand close to the phone), fingertapping (tapping a finger near the sensor, i.e., close to thetop of the screen) and rubbing (rubbing a finger near thesensor).

The algorithm to detect quick fluctuations in the readingof the proximity sensor is very simple and straightforward(which makes it lightweight, satisfying one of our designgoals). Basically, when there is a proximity change, thetime is recorded and the difference in time reading withan n-th previous reading (we use n = 6) is compared. Ifthe difference is less than a pre-specified time limit (theupper bound of the duration of the gesture; in our case1.5 seconds), the device will be unlocked. Otherwise, itwill remain locked. A detailed pseudocode for this simpleprocedure is outlined in Algorithm 1.

Algorithm 1 Tap-Wave-Rub Detection using ProximitySensor Data

1: Set proxIndex = 0, WIND SZ = 6,WAV E TIME LIMIT = 1500 ms, andUNLOCK TIME FRAME = 1000 ms

2: Record the time whenever proximity sensor detects a change1) An array ProxChangeT ime with size equal to

WIND SZ was used to record the time of proximitysensor change in cyclic order.

3: Calculate the time difference between the current time withthe time recorded in previous six sensor change value.

1) T imeDiff = ProxChangeT ime[proxIndex] −ProxChangeT ime[(proxIndex+1)%WIND SZ]

4: If TimeDiff is less than WAV E TIME LIMIT , Unlockfor UNLOCK TIME FRAME

5: Increase ProxIndex by 1, i.e., proxIndex =(proxIndex+ 1)%WIND SZ

6: Repeat Step 2.

6 IMPLEMENTATION AND EVALUATION6.1 Prototypes and Test DevicesTo evaluate the feasibility of the TWR gesture detectionmechanisms, we first developed prototype applications onthe Android platform. The phone tapping detection schemewas implemented and installed on a Google Nexus S An-droid phone, while the hand waving, finger tapping/rubbingdetection scheme was implemented and installed on botha Droid X2 phone and a Samsung Galaxy Nexus (I9250)phone. The two Nexus phones come equipped with NFCchips, and are therefore a good target devices for aneventual deployment of our approach.

6.2 Phone Tap Experiment

In this section, we report on evaluation of tapping baseduser activity recognition scheme outlined in Section 5.1.This scheme is specially designed to protect against mal-ware targeting NFC reading services, since tapping is anatural hand movement which a user needs to perform touse the reader function of NFC.

Since “tapping” (touching the phone against an RFID tag,or another NFC device) is a very simple hand movement,we hypothesize it might be confused with other user move-ments such as those users perform when they play games,and thus have higher false positive rate, FPR (or lowerprevention rate). In a hope to achieve higher preventionrate, we also experiment with two other intuitive user-aware gestures similar to tapping: tapping twice and tappingthrice in succession. We call these three tapping gesturesas “tapping once”, “tapping twice”, and “tapping thrice”,respectively.

First, to determine the cross-correlation detection thresh-old, we collected 30 traces of accelerometer data for eachtapping gesture. Each of our trace contains 100 data pointsand is recorded over a 2-second time period (we wantedour schemes to be efficient). Each trace is then used as atemplate, which is compared with all the other 29 tracesto calculate a serial of C values. The smallest C valueis chosen as the threshold value. This threshold value isthen stored with the corresponding tapping template and amatched posture needs to yield a C value larger than thisthreshold. These traces were collected by the experimenterwhile performing NFC tapping gesture 30 times. Such datacollection and testing methodology is in-line with relatedprior security work, e.g., Secret Handshakes [16]. Ourmethodology captures a realistic usage scenario wherebyeach user can be trained “once by their phone” and cancreate their template, e.g., when they purchase the phone.

We first test the performance of the three tapping gesturesto identify which one can have higher recognition rate (thuslower false negative rate, FNR). To do this, we collecta total of 150 traces for each tapping gesture, 30 tracesevery day for 5 days. We then use the template and thethreshold calculated above to determine the recognitionrate. The successful recognition rate is listed, in the formof a confusion matrix, in Table 1. It shows that “tappingonce” achieves high recognition rate 94.67% (or a low FNR5.33%) compared to the other two tapping gestures.

We next test the performance of the three tapping ges-tures to identify which one is the least to be confused withother user or phone movements and thus has low FPR. Itis important to evaluate the FPR. If a tapping gesture canbe very similar to a certain other movement (accidental ormanipulated by an attacker), the malware may circumventthe gesture detection process.

To determine the FPR, we compare tapping postures withmany phone/user movements. These movements might bejust normal user activities, or activities coerced by a nearbyattacker. They include user movements such as: walking,walking stairs, screen-touch activities (text messaging and

CRYPTOLOGY EPRINT ARCHIVE 9

Fig. 2. The locations of proximity sensor on a sample of smartphones (Galaxy Nexus, Droid X2, Galaxy S II)

TABLE 1Phone Tapping Detection Results (rates at which a gesture shown on each row matches with gesture/activity

shown on each column

Tapping Tapping Tapping Walking Walking Still Screen-touch PhoneOnce Twice Thrice Stairs Activities Movement

Tapping Once 94.67% NA NA 0% 0% 0% 0% 2%Tapping Twice NA 92.67% NA 0% 1.33% 0% 0% 0%Tapping Thrice NA NA 96.67% 0% 5.33% 0% 0% 0%

surfing Internet), phone-moving activity (motion gamingand picking up calls), as well as, the scenarios where phoneis left still. For each movement, we also collect a total of150 traces, 30 traces every day for 5 days. The error ratesare all listed in Table 1.

Our experiment result shows “tapping once” is very un-likely to be confused with walking, walking stairs, still, andscreen-touch activities such as text messaging or Internetbrowsing. However, it might occasionally be confused withphone motion caused when the user plays game or picksup a phone call with a false positive rate of 2%. “tappingtwice” and “tapping thrice”, on the other hand, are veryresilient to phone motions but they resemble motions whena user walks on stairs. Nevertheless, all achieve satisfyinglow false positive rate.

One potential reason why the false positive rate is lowmight be that tapping is a type of user-aware movement.When performing such a gesture, the user is believed to beaware of her hand movement. So gestures are performed ina more-or-less controlled way, e.g., the phone is always heldin the similar way when a user performs tapping. In non-user-aware movements, on the other hand, the phone canbe tilted in any position. The reference template is usuallycollected in a reference coordinate system. However, oncethe phone is tilted, movement data collected from the deviceis no longer in the reference coordinate system and thecorresponding movement will not be detected correctly.In this way, user-aware gesture is very unlikely to besimilar with user-unaware movements, and thus has lowfalse positive rate. Previous studies on gesture recognitionalso suggest certain gestures can be quite unique anddifferent from other gestures [16], [29]. So tapping canbe distinguished from other user-aware movements suchas “picking up the call”.

Our experiment result, contrary to our hypothesis that“tapping once” may have high false positive rate, showsthat “tapping once” actually achieves both high recognition

rate and low false positive rate, and has a performance com-parable to “tapping twice” and “tapping thrice”. However,“tapping once” outperforms the other two tapping gesturesin term of efficiency, and has superior usability since it doesnot require any change in the user usage model of NFC.

6.3 Hand Wave, Finger Tap and Rub ExperimentWe conducted several experiments to evaluate our prototypeimplementing the wave-tap-rub detection mechanism formalware detection. As in the case of the phone tappingexperiments, the goal of our tests was to primarily estimatethe error rates, i.e., FNR and FPR.

In order to determine FNR, one of the authors of thissubmission attempted to unlock the phone using the handwaving as well as rubbing gesture. 50 trials using eachgesture were performed spanning over a period of 2 days.Out of these, only on 1 occasion, this user failed tounlock the phone, resulting in an average recognition rateof 98% (or FNR of 2%). Since multiple trials may havetrained this user significantly likely leading to a bias, wefurther conducted our tests with multiple other users. Thesevolunteers were drawn from our Department (CS) and weremostly students at undergraduate and graduate levels. Theusers were first explained the purpose of the study and thendemonstrated the gestures using which they were to unlockthe phone. The users were specified the location of theproximity sensor on the phone. Although in real life, usersmay not be aware of the location of the proximity sensor,they can be easily provided with this information using asimple interface. For example, an arrow pointer could beprovided on the screen which points to the proximity sensor,and user could be asked to wave/tap/rub accordingly.

A total of 16 volunteers participated in our study. 8 ofthe participants were requested to perform the waving basedunlocking 10 times and the results were recorded automat-ically by our program. The resulting average recognitionrate observed was 93.75% (or FNR of 6.25% [5/80]).

CRYPTOLOGY EPRINT ARCHIVE 10

TABLE 2Hand Waving, and Finger Tapping / Rubbing Detection Results (rates at which a gesture shown on each row

matches with gesture/activity shown on each column)

Hand Tapping / Walking Phone Drop/ Daily Screen-touch Game Game BumpingWaving Rubbing Fall Activity Activities Play (O1) Play (O2)

Hand Waving 93.75% NA 0% 0% 0% 0% 0% 0% 0%Tapping / Rubbing NA 96.25% 0% 0% 0% 0% 0% 6.5% 0%

The other 8 users were asked to evaluate the rub gestureor finger tapping based unlocking. The users were notrestricted to using one or the other gesture, and it was upto their discretion as to which one they wanted to use. Theobserved average recognition rate in this case was 96.25%(or FNR of 3.75% [3/80]).

These error rates can be deemed to be fairly low, and weexpect them to further reduce significantly as users becomemore and more familiar with such gestures.

Next, we set out to evaluate the likelihood of falseunlocking under different activities. These activities mightbe just daily user activities, or activities coerced by anearby attacker. To this end, the experimenter conductedseveral tests emulating different user activities that have thepotential of triggering the proximity sensor fluctuations. Ineach test, the phone was set to record a trigger event andits frequency. Each of the test was performed with each ofour test devices (Droid and Nexus).

First, to simulate a walking activity, the mobile phonewas stowed in a backpack and the experimenter, carryingthe backpack on shoulders, walked around for 10 minutes.No trigger events occurred in this case. The experimentwas repeated at a later point of time for a duration of15 minutes. Still, the phone did not get unlock at all. Wefurther continued this experiment for a duration of 2 hoursduring which the experimenter was walking and driving acar. No unlocking events occurred even in this case also.

To trigger sudden fluctuations to the proximity sensorreadings, we next conducted a “drop and fall” test. Thismimicked a situation where the phone accidentally dropson, or is thrown at, a surface. Clearly, we could not justdrop or throw our test device on the floor to avoid damagingit. To do this meaningfully, therefore, we first threw our testdevice on a bed, with the screen (and thus the proximitysensor) facing the bed, from a height of 50cm to 70cm.10 trials of this test were performed. Then, we similarlydropped the device on a couch 15 times and a bed 15 times.No triggering events were recorded over this set of tests.

Our next test involved treating the test device as theexperimenter’s own device. During this test, which lastfor about 1 day, the experimenter made and receivedphone calls, typed SMS messages, browsed the Internet,walked around while the phone was kept in pocket, wentup/downstairs, drove, and left the phone on a table near akeyboard while typing on the keyboard, among other choreactivities. No unlocking was registered.

In the above test, we had evaluated the effect of touch-screen typing on the phone while writing SMSs and brows-

ing the Internet. Since the proximity sensor is located ontop of the screen, such touchscreen activities were naturallynot supposed to be affecting the sensor’s output. However,more involved touchscreen activities, such as game play,may have an impact and may mimic the rub gesture.

To test this hypothesis, the experimenter played a numberof games in the “landscape mode”, such as FIFA or ModernCombat. This test was performed while the phone was heldin two orientations:O1: thumb blocking the proximity sensor (Figure 3)O2: thumb not blocking the proximity sensor (Figure 4)

The games were played for a duration of about 15minutes on each of our test devices. Interestingly, thephone did register a few unlocking events in the secondorientation. A total of 4 such events occurred out of a totalof 61 changes in sensor readings, resulting in an FPR of6.5%. However, no unlockings were recorded in the firstorientation. This could be attributed to the fact that whenthe thumb was blocking the sensor, it did not induce suddenfluctuations to the reading. On the other hand, when thethumb was below the sensor, while playing a game, thegestures, such as firing a gun or moving an object, wouldhave triggered the sensor.

We also conducted tests to simulate a nearby adversarywho may (deliberately) bumps into a user, while the phoneis in the pocket, in order to trigger a false unlocking. Wedid 10 trials of this test, and no unlocking was recorded,leading to a FPR of 0%.

The results of all of our tests are summarized as a confu-sion matrix depicted in Table 2. These indicate that the tap-wave-rub gesture based on proximity sensing is quite robustto accidental unlocking in practice. The only scenario wherethe phone could accidentally get unlocked was during atouchscreen game play activity when the phone was heldin a particular orientation. However, even this scenarioprovides an important insight. Namely, a rubbing gesture islikely to be exhibited while playing games in the landscapemode under one of the orientations only, the one wherethe thumb does not block the sensor. This implies that,depending upon the location of the proximity sensor ona phone, one of the orientations could be disabled by thephone’s OS to prevent this accidental unlocking from takingplace.

7 DISCUSSION

Overall, our experiment results in previous section showthat both phone tapping recognition, and hand waving,finger tapping or rubbing recognition can serve as two

CRYPTOLOGY EPRINT ARCHIVE 11

Fig. 3. Orientation 1 (O1): Thumb Blocking the Prox-imity Sensor

Fig. 4. Orientation 2 (O2): Thumb Not Blocking theProximity Sensor

different means to infer the “right” human activity in orderto unlock the use of sensitive services on smartphones, thuspreventing unauthorized and stealthy access from malware.Both result in quite low error rates, FNR as well as FPR,demonstrating the effectiveness of our approach. We believethat FNR can be further reduced as users become more andmore familiar with the underlying gestures.

In this section, we compare the performance of the twogesture recognition schemes and discuss issues relevant togesture recognition for human activity inference in general.

7.1 Effect of Orientation

For phone tapping recognition, we use a single three-axis accelerometer to infer the force due to tapping bymeasuring accelerations on the three axes. When the phoneis held in the hand with different ways or under differentorientations, the accelerometer can be tilted around thethree axes. This means that the same external force mayproduce different accelerations along the three axes of theaccelerometer. Hence, successful recognition rate (or FNR)can be affected due to different phone orientations. How-ever, as long as the phone is held in the similar way, thus thesame orientation, when a user performs tapping (which isusually the case), the tilting effect can be minimized. Weemphasize that for all three tapping gestures, we achieve

similar high recognition rates (or low FNR) as other handgestures in previous study [16], [29]. On the other hand,orientation or the tilt effect does help us achieve very lowFPR as we explained in Section 6.2.

Our experiment results show that phone’s orientation canaffect the FPR of our proximity sensor based scheme. Thisis because in a certain orientation, it might be possible forhand movements to accidentally trigger the proximity sen-sor. As argued in Section 6.3, we suggest the OS can disablesuch a “dangerous” orientation to prevent this accidentalunlocking from taking place. Luckily, due to the locationof the proximity sensor (near the ear piece), only one ofthe orientations need to be disabled, while the other cansafely be permitted, without lowering the overall usability.Another possibility to avoid such an accidental unlocking inour scheme is to infer the orientation of the device based onaccelerometer data. If a dangerous orientation is detected,the access can be disabled by default.

7.2 Implicit vs. Explicit GesturesAs outlined in Section 3.3, gesture recognition can beimplicit or explicit. In implicit gesture recognition, a user’snatural movement to access a certain service is captured forhuman activity inference. The good thing about implicitgesture recognition is that it does not change the usagemodel of existing smartphone applications and transpar-ently enforces access control to sensitive services. Thephone tap gesture is an example of such an implicit gesturerequired for accessing the NFC service. Not every servicecan be associated with a particular gesture, however.

In explicit gesture recognition, a user is required toperform a certain hand movement to unlock a service. Thehand wave or finger tap/rub gesture is an example of suchan explicit gesture. As long as gesture recognition is robust(with low FPR and FNR) and lightweight, multiple servicescan share the same explicit gesture. This is indeed the casefor our explicit gestures. Given that our phone tap gestureperformed quite well, it can also be used as an explicitgesture to access a service besides NFC. In this case, theuser would simply need to tap her phone with her hand ora nearby object to unlock a service.

7.3 User CharacteristicsThe phone tap gesture is user-dependent as it capturesthe user’s own (perhaps unique) feature of the tappingmovement for recognition. A previous study shows thatthere is significant variation among different participantseven for the same predefined gesture [29]. Indeed, user-dependent gestures can be and has been used for userauthentication [24], [15]. As for phone tap, different usersmay hold the phone in different ways and use differentforces to perform the gesture. This means that this ges-ture may provide user authentication, and thus protectionagainst unauthorized access to services in case the phoneis lost/stolen. Validation of this hypothesis required furtherinvestigation. However, when a phone is shared by multipleusers, say by different family members, each has to register

CRYPTOLOGY EPRINT ARCHIVE 12

his/her own tap template in order to use our approach.This training is only one-time and will not undermine theusability of our approach.

On the other hand, our hand wave, and finger tap/rubbinggestures is quite user-independent as shown in our exper-iment results. That means, the service can be shared bymultiple users without registering his/her own template. Infact, there is no template employed in this scheme. Thisscheme therefore may offer a higher level of convenienceto the users and easily adoptable. Of course, it does notprevent unauthorized use of such service when the phoneis stolen.

7.4 Social Engineering AttacksThe low FPR in our experiments demonstrates the lowlikelihood that an application will gain access to criticalresources without the knowledge of the user. This is basedon an assumption that a malware program can not emulatethe required gestures. However, it is possible for a compet-ing malware writer to fool a user by launching a social-engineering attack. For example, a malware developer candesign a game such that user has to move his phonein certain ways mimicking the gesture of tapping, or agame in which a frequently pressed button is placed veryclose to the proximity sensor, which may trigger waveor rubbing gesture. Also, malicious program can wait forthe legitimate program to ask for the gesture permissionand while user permits the legitimate application, theeavesdropping malware can use the gesture along withthat application to trigger malicious activity. While suchattacks are likely, they still require the malware program toconstantly wait for, and synchronize with, the desired usergestures, which may make these programs easily detectableby the OS. Nevertheless, our approach still significantlyraises the bar against many existing malware attacks, aprominent advancement in start-of-the-art in smartphonemalware prevention.

7.5 Other Explicit GesturesTo further extend our approach, it is natural to considerother explicit gestures, or other ways to infer the gestures.So far, we detected hand waving based on proximity sensordata. The ambient light sensor readings can also be usedin this context, as shown in another recent work [4]. Theadvantage of this sensor is that the hand would not need tobe very close to the phone. This approach also requiresaccelerometer data to reduce false positives, as simplephone movement may also trigger light values.

Accelerometer-based explicit gestures are definitely ap-plicable for our purpose and some of the previously de-veloped gesture recognition schemes [16], [29] might besuitable. However, such gestures may not be as intuitive andlightweight as the explicit gesture proposed in this paperand may require a higher level of user diligence. Besidesaccelerometers, other sensors that come to mind, and maybe useful for our case, are magnetometers, and ambienttemperature sensors. It does not seem possible, however,

to trigger a magnetometer without carrying a specializedequipment or object, such as a magnet. Similarly, fluctuat-ing the environmental temperature via human actions seemrather difficult.

7.6 Power EfficiencyAnother important aspect to cover in our work is the powerconsumed by the sensors while trying to capture the usergesture. Since the battery-life is one of the most importantfactors to be considered for user’s day-to-day activity, ourdesign needs to be battery-friendly.

Our approach is indeed quite power-efficient. The ges-tures proposed by our design are short (last only a fewseconds) and thus we only need to turn on the underlyingsensors momentarily. Only when the user requests accessto a particular resource, the sensor is turned on and the(explicit) gesture detection algorithm is executed.2 Once thekernel captures the required gesture, the permission will begranted to the active application and sensor will be turnedoff. If kernel fails to capture the gesture within a certainduration, the sensor will be turned off and applicationwill be denied to use the resource. In the case of NFCservices or implicit gestures, tapping detection algorithm isexecuted only when the NFC reader detects a nearby tag.Moreover, our gesture detection algorithms themselves arevery lightweight and thus only require negligible amountof power.

8 CONCLUSION AND FUTURE WORKIn this paper, we introduced a lightweight permissionenforcement approach – Tap-Wave-Rub (TWR) – for smart-phone malware detection and prevention. TWR is basedon simple human gestures that are very intuitive but lesslikely to be exhibited in users’ daily activities. Presenceor absence of such gestures, prior to accessing a service,can effectively inform the OS whether the access request isbenign or malicious. Specifically, we presented the designof two mechanisms: (1) phone tapping detection based onaccelerometer data, and (2) finger tapping, rubbing or handwaving detection based on proximity sensor data. The firstmechanism is geared for NFC applications, which usuallyrequire the user to tap her phone with another device. Thesecond mechanism involves very simple gestures from theuser, i.e., tapping or rubbing a finger near the top of phone’sscreen or waving a hand close to the phone, and broadlyappeals to many applications (e.g., SMS). In addition, wepresent the TWR Android permission model, the prototypesimplementing the underlying gesture recognition mecha-nisms, and a variety of novel experiments to evaluate them.

Our results suggest the proposed approach could bevery effective for malware prevention, with quite low falsepositives and false negatives, while imposing little to noadditional burden on the users. The false negatives areexpected to further reduce significantly as users become

2. For the sake of our experiments, we have turned on the sensorsall the time. This was done so that we can determine the FPRs via ourexperiments.

CRYPTOLOGY EPRINT ARCHIVE 13

more familiar with the underlying gestures, especially sincethey are quite intuitive. In addition, the false positives canalso be carefully avoided in most cases, for example, bydetecting the orientation of the device.

Our future effort will be focused on realizing this ap-proach in practice and further evaluate it with a widerange of smartphones and smartphone users. On a broaderperspective, we believe that the gestures introduced in thispaper will also facilitate novel ways in which users caninteract with their devices and open up new opportunities(for security purposes or otherwise).

REFERENCES

[1] Proximity Sensors. A description available at Wikipedia: http://en.wikipedia.org/wiki/Proximity sensor.

[2] Tap, Wave and Rub Magic. A description and video available at:http://www.vinnymarini.com/download/tapwave.html.

[3] R. Amadeo. Exclusive: Android 4.2 alpha teardown, part 2: SELinux,VPN lockdown, and premium SMS confirmation. Available onlineat http://www.androidpolice.com/2012/10/17/exclusive-android-4-2-alpha-teardown-part-2-selinux-vpn-lockdown-and-premium-sms-confirmation/, Oct. 2012.

[4] Anonymized for Submission. Wave-to-access: Protecting sensitivemobile device services via a hand waving gesture. In Submission,2012.

[5] W. Augustinowicz. Trojan horse electronic pickpocket demo byidentity stronghold. Available online at http://www.youtube.com/watch?v=eEcz0XszEic, June 2011.

[6] T. Baudel and B.-L. Michel. Charade: remote control of objectsusing free-hand gestures. Communication of ACM, 36:28–35, 1993.

[7] A. Bose, X. Hu, K. Shin, and T. Park. Behavioral detection ofmalware on mobile handsets. In MobiSys’08, 2008.

[8] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani. Crowdroid:Behavior-based malware detection systems for Android. In ACMCCSW Workshop, 2011.

[9] L. Cai and H. Chen. Touchlogger: inferring keystrokes on touchscreen from smartphone motion. In Proc. of USENIX HotSec, 2011.

[10] M. Calamia. Mobile payments to surge to $670 billion by 2015.Available online at http://www.mobiledia.com/news/96900.html, Jul.2011.

[11] X. Cao and R. Balakrishnan. VisionWand: interaction techniques forlarge display using a passive wand tracked in 3D. In ACM UIST’03,2003.

[12] A. Chaugule, Z. Xu, and S. Zhu. A specification based intrusiondetection framework for mobile phones. In ACNS’11, 2011.

[13] J. Cheng, S. Wong, H. Yang, and S. Lu. Smartsiren: virus detectionand alert for smartphones. In 5th International Conference on MobileSystems, Applications and Services (MobiSys’07), 2007.

[14] M. Christodorescu and S. Jha. Static analysis of executables todetect malicious patterns. In 12th Conference on USENIX SecuritySymposium, 2003.

[15] M. Conti, I. Zachia-Zlatea, and B. Crispo. Mind how you answerme!: transparently authenticating the user of a smartphone when an-swering or placing a call. In Proceedings of the 6th ACM Symposiumon Information, Computer and Communications Security, ASIACCS’11, 2011.

[16] A. Czeskis, K. Koscher, J. Smith, and T. Kohno. RFIDs andsecret handshakes: Defending against Ghost-and-Leech attacks andunauthorized reads with context-aware communications. In ACMConference on Computer and Communications Security, 2008.

[17] L. Davi, A. Dmitrienko, A. R. Sadeghi, and M. Winandy. Privilegeescalation attacks on Android. In ISC’10, 2010.

[18] D. R. Ellis, J. G. Aiken, K. S. Attwood, and S. D. Tenaglia. Abehavioral approach to worm detection. In ACM Workshop on Rapidmalcode (WORM), 2004.

[19] F-Secure. Bluetooth-worm:symbos/cabir. Available online athttp://www.f-secure.com/v-descs/cabir.shtml.

[20] F-Secure. Trojan:symbos/viver.a. Available online at http://www.f-secure.com/v-descs/trojan symbos viver a.shtml.

[21] F-Secure. Worm:symbos/commwarrior. Available online athttp://www.f-secure.com/v-descs/commwarrior.shtml.

[22] A. P. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner. A surveyof mobile malware in the wild. In ACM CCSW Workshop, 2011.

[23] A. P. Felt, H. J. Wang, A. Moshchuk, S. Hanna, and E. Chin.Permission re-delegation: attacks and defenses. In USENIX SEC’11,2011.

[24] D. Gafurov, K. Helkala, and T. Søndrol. Biometric gait authentica-tion using accelerometer sensor. Journal of Computers, 1(7):51–59,2006.

[25] R. Gummadi, H. Balakrishnan, P. Maniatis, and S. Ratnasamy. Not-a-bot: improving service availability in the face of botnet attacks. InProceedings of the 6th USENIX symposium on Networked systemsdesign and implementation, NSDI’09, pages 307–320, Berkeley, CA,USA, 2009. USENIX Association.

[26] T. Halevi, S. Lin, D. Ma, A. Prasad, N. Saxena, J. Voris, andT. Xiang. Sensing-enabled defenses to rfid unauthorized readingand relay attacks without changing the usage model. In PerCom’12,2012.

[27] J. Han, E. Owusu, T.-L. Nguyen, A. Perrig, and J. Zhang. ACCom-plice: Location Inference using Accelerometers on Smartphones. InProc. of COMSNETS, Jan. 2012.

[28] ISO. Near field communication interface and protocol (nfcip-1)——iso/iec 18092:2004. Available online at http://www.iso.org/iso/catalogue detail.htm?csnumber=38578, 2004.

[29] J. Liu, Z. Wang, L. Zhong, J. Wickramasuriya, and V. Vasudevan.uWave: Accelerometer-based personalized gesture recognition andits applications. Pervasive and Mobile Computing, 5(6):657–575,December 2009.

[30] J. Liu, L. Zhong, J. Wickramasuriya, and V. Vasudevan. Userevaluation of lightweight user authentication with a single tri-axisaccelerometer. In MobileHCI’09, 2009.

[31] P. Marquardt, A. Verma, H. Carter, and P. Traynor. (sp)iPhone:decoding vibrations from nearby keyboards using mobile phoneaccelerometers. In Proc. of ACM CCS, 2011.

[32] C. Mulliner. Vulnerability analysis and attacks on NFC-enabledmobile phones. In 1st International Workshop on Sensor Security(IWSS) at ARES, 2009.

[33] E. Owusu, J. Han, S. Das, A. Perrig, and J. Zhang. ACCessory:Keystroke Inference using Accelerometers on Smartphones. In Proc.of HotMobile), Feb. 2012.

[34] F. Roesner, T. Kohno, A. Moshchuk, B. Parno, H. J. Wang, andC. Cowan. User-driven access control: Rethinking permissiongranting in modern operating systems. In 33rd IEEE Symposiumon Security and Privacy (Oakland 2012), 2012.

[35] R. Schlegel, K. Zhang, X. yong Zhou, M. Intwala, A. Kapadia, andX. Wang. Soundcomber: A stealthy and context-aware sound trojanfor smartphones. In Proc. of NDSS, 2011.

[36] A.-D. Schmidt, R. Bye, H.-G. Schmidt, J. Clausen, O. Kiraz,K. Yksel, S. Camtepe, and A. Sahin. Static analysis of executablesfor collaborative malware detection on Android. In ICC 2009Communication and Information Systems Security Symposium, 2009.

[37] A. Shabtai, R. Moskovitch, Y. Elovici, and C. Glezer. Detectionof malicious code by applying machine learning classifiers on staticfeatures: A state-of-the-art survey. Inf. Secur. Tech., 14:16–29, Feb.2009.

[38] A. S. Shamili, C. Bauckhage, and T. Alpcan. Malware detectionon mobile devices using distributed machine learning. In 20thInternational Conference on Pattern Recognition (ICPR’10), 2010.

[39] D. Venugopal. An efficient signature representation and matchingmethod for mobile devices. In WICON’06, 2006.

[40] D. Venugopal, G. Hu, and N. Roman. Intelligent virus detection onmobile devices. In PST’06, 2006.

[41] X. Liang and X. Zhang and Jean-Pierre Seifert and S. Zhu. pBMDS:A behavior-based malware detection system for cellphone devices.In WiSec’10, 2010.