a methodology for empirical analysis of permission-based...

A Methodology for Empirical Analysis ofPermission-Based Security Models

and its Application to Android

David [email protected]

H. Günes Kayacı[email protected]

P.C. van [email protected]

Anil [email protected]

School of Computer Science, Carleton UniversityOttawa, ON, Canada

ABSTRACTPermission-based security models provide controlled ac-cess to various system resources. The expressiveness ofthe permission set plays an important role in providing theright level of granularity in access control. In this work,we present a methodology for the empirical analysis ofpermission-based security models which makes novel useof the Self-Organizing Map (SOM) algorithm of Kohonen(2001). While the proposed methodology may be appli-cable to a wide range of architectures, we analyze 1,100Android applications as a case study. Our methodologyis of independent interest for visualization of permission-based systems beyond our present Android-specific empiri-cal analysis. We offer some discussion identifying potentialpoints of improvement for the Android permission model,attempting to increase expressiveness where needed with-out increasing the total number of permissions or overallcomplexity.

Categories and Subject DescriptorsI.2.6 [Artificial Intelligence]: Learning—Connectionismand neural nets; D.4.6 [Operating Systems]: Securityand Protection—Access Controls

General TermsSecurity, Experimentation

KeywordsAccess control, self-organizing maps, permission-based se-curity, smartphone operating systems, visualization

This is the authors’ version of the work. It is posted here by permission ofthe ACM for your personal use. Not for redistribution. The definitive ver-sion was published in ACM CCS’10, October 4–8, 2010, Chicago, Illinois,USA.Copyright 2010 ACM 978-1-4503-0244-9/10/10 ...$10.00.

1. INTRODUCTIONAccess control lists (ACLs) and permission-based secu-

rity models allow administrators and operating systemsto restrict actions on specific resources. In practice, de-signing and configuring ACLs (particularly those with alarge number of configuration parameters) is a compli-cated task. More specifically, reaching a balance betweenthe detailed expressiveness of permissions and the usabil-ity of the system is not trivial, especially when a systemwill be used by novices and experts alike.

One of the main problems with ACLs and permissionmodels in general is that they are typically not designed bythe users who will ultimately use the system, but rather bydevelopers or administrators who may not always forseeall possible use cases. While some argue that the prob-lem with these permission-based systems is that they arenot designed with usability in mind [11], we believe thatin addition to the usability concerns, there is not a clearunderstanding of how these systems are used in practice,leading security experts to blindly attempt to make thembetter without knowing where to start.

While there are many widely deployed systems whichuse permissions (some are discussed in Section 2.2), wefocus on the empirical analysis of the permission model in-cluded in Android OS [1]. Android is a newcomer to thesmartphone industry and in just a few years of existencehas managed to obtain significant media attention, marketshare, and developer base. Android uses ACLs extensivelyto mediate inter-process communication (IPC) and to con-trol access to special functionality on the device (e.g., GPSreceiver, text messages, vibrator, etc.). Android develop-ers must request permission to use these special featuresin a standard format which is parsed at install time. TheOS is then responsible for allowing or denying use of spe-cific resources at run time. The permission model usedin Android has many advantages and can be effective inpreventing malware while also informing users what ap-plications are capable of doing once installed.

The main objectives of our empirical analysis are: (1) toinvestigate how the permission-based system in Androidis used in practice (e.g., whether the design expectationsmeet the real-world usage characteristics) and (2) to iden-

tify the strengths and limitations of the current implemen-tation. We believe such analysis can reveal interestingusage patterns, particularly when the permission-basedsystem is being used by a wide spectrum of users withvarying degrees of expertise. As of July 2010, there areover 80,000 applications available for Android [2], manyof which are free and written by both large software de-velopment companies and hobbyists. Also, the AndroidMarket is not controlled as tightly as other mobile appli-cation stores [5]. This implies the applications may exhibitmore variety in terms of requested permissions along withother behavioral characteristics which might not occur ina closed environment.

Contributions. Our main contribution is a novelmethodology for exploring and empirically analyzingpermission-based models. In this paper, we employ ourmethodology for the analysis of 1,100 applications writ-ten for the Android OS. Using the Self-Organizing Map(SOM) algorithm [16], we identify trends in how devel-opers of these applications use the Android permissionsmodel. We find that while Android has a large numberof permissions restricting access to advanced functionalityon devices, only a small number of these permissions areactively used by developers. Our analysis identifies per-missions that are overly broad (i.e., controlling access to alarge set of features). Furthermore we identify applicationclusters based on requested permissions, and extract theprominent permissions within each cluster. Our empiricalobservations provide a basis for possible enhancements tothe Android permission model.

The remainder of this paper is structured as follows.Section 2 presents background on permission-based se-curity architectures, provides examples of some currentlydeployed permission-based systems and discusses relatedwork. Section 3 describes the Android operating systemand its novel permission model. The dataset used in ourcase study is also covered in this section. In Section 4we discuss our methodology based on the Self-OrganizingMap algorithm. Section 5 covers the results of our analysisand discusses the generated visualizations. Section 6 sum-marizes key findings and suggests points for improvementin Android. We conclude in Section 7.

2. BACKGROUNDAccess control systems have existed for a long time [17].

In its basic form, a security system based on access con-trol lists allows a subject to perform an action (e.g., read,write, run) on an object (e.g., a file) only if the subjecthas been assigned the necessary permissions. Permis-sions are usually defined ahead of time by an administra-tor or the object’s owner. Basic file system permissions onPOSIX-compliant systems [12] are the traditional exampleof ACL-based security since objects – in this case, files –can be read, written or executed either by the owner ofthe file, users in the same group as the owner, and/or ev-eryone else. More sophisticated ACL-based systems allowthe specification of a complex policy to control more pa-rameters of how an object can be accessed.

We use the term permission-based security to refer to asubset of ACL-based systems in which the action doesn’t

change (i.e., there is only one possible action to allow ordeny on an object). This would be similar to having multi-ple ACLs per object, where each ACL only restricts accessto one action. We note that reducing the allowable ac-tions to one does not necessarily make the system easierto understand or configure. For example, in the Androidpermission model, developers implement finer level gran-ularity by defining separate permissions for read and writeactions. This implies that, compared to general ACLs, thepermission hierarchy is flat and has limited sense of group-ing.

2.1 Permission-Based Security ExamplesAn example of a permission-based security model is

Google’s Android OS for mobile devices. Android requiresthat developers declare in a manifest a list of permissionswhich the user must accept prior to installing an applica-tion. Android uses this permission model to restrict accessto advanced or dangerous functionality on the device [14].The user decides whether or not to allow an application tobe installed based on the list of permissions included bythe developer. We describe the Android security architec-ture in detail in Section 3.

Similar to Android OS, the Google Chrome web browseruses a permission-based architecture in its extension sys-tem [4]. Extension developers create a manifest wherespecific functionality (e.g., reading bookmarks, openingtabs, contacting specific domains) required by the exten-sion can be requested. The manifest is read at extensioninstall time to better inform the user of what the extensionis capable of doing, and reduce the privileges that exten-sions are given [10]. In contrast, Firefox extensions, whichdo not have this permission architecture, run all extensioncode with the same OS-level privileges as the browser it-self.

A third example of a currently deployed permission-based architecture is the Blackberry platform from Re-search In Motion (RIM). Blackberry applications written inJava must be cryptographically signed in order to gain ac-cess to advanced functionality (known as Blackberry APIswith controlled access) such as reading phone logs, mak-ing phone calls or modifying system settings [3]. TheBlackberry OS enforces through signature validation thatan application has been granted permissions to access thecontrolled APIs.

2.2 Related WorkEnck et al. [13] describe the design and implementation

of a framework to detect potentially malicious applicationsbased on permissions requested by Android applications.The framework reads the declared permissions of an appli-cation at install time and compares it against a set of rulesdeemed to represent dangerous behaviour. For example,an application that requests access to reading phone state,record audio from the microphone, and access to the Inter-net could send recorded phone conversations to a remotelocation. The framework enables applications that don’tdeclare (known) dangerous permission combinations to beinstalled automatically, and defers the authorization to in-stall applications that do to the user.

Ontang et al. [18] present a fine-grained access con-trol policy infrastructure for protecting applications. Theirproposal extends the current Android permission modelby allowing permission statements to express more detail.For example, rather than simply allowing an application tosend IPC messages to another based on permission labels,context can be added to specify requirements for config-urations or software versions. The authors highlight thatthere are real-world use cases for a more complex policylanguage, particularly because untrusted third-party appli-cations frequently interact on Android.

On the topic of analysis of permission-based architec-tures, Barth et al. [10] analyzed 25 browser extensions forFirefox and identified that 78% are given more privilegesthan necessary, increasing the attack surface on thesefeature-enhancing add-ons. The analysis lead the authorsto the design of a permission-based system for browser ex-tensions in Google Chrome. The system controls access tobookmarks, tabs, and domains available to a particular ex-tension. Investigating the usability of permission-based ar-chitectures, Reeder et al. [19] developed a framework fordisplaying and editing file permissions on a Windows oper-ating system. They employed a matrix-based visualizationcalled expandable grid, which provides a conceptual visu-alization of file permissions in a graphical format. Theiruser studies showed this grid visualization allows users tocomplete tasks quickly and more accurately.

Bearing similarities to our work, but not in methodol-ogy or application, Smetters et al. [20] conducted a studyof permission-based architectures, particularly access con-trol lists for document sharing within an organization. Var-ious data mining techniques were utilized to understandhow employees used access control lists. Smetters et al.argued that to find the appropriate balance between con-trol and complexity in a permission-based architecture, itis important to determine what level of control users needby analyzing how users interact with the architecture inpractice. In this paper, we analyze the real world use ofAndroid permissions through an empirical analysis.

3. ANDROID PERMISSION MODELWe review the Android operating system and its

permission-based security model. We also discuss thedataset used for our analysis and highlight some initial ob-servations made prior to applying our data mining method-ology.

3.1 AndroidAndroid is middleware for mobile devices that is built on

top of Linux. Currently mainly deployed on smartphones,the Android platform is quickly gaining market share anddue to its open source nature, has been ported to otherdevices such as laptops, tablets and ebook readers. An-droid has a strong focus on security and attempts to ad-dress some of the shortcomings of other mobile operatingsystems. Android applications are written in Java syntaxand each run in a custom virtual machine known as Dalvik,a light-weight replacement for the standard Java VirtualMachine. The effect of running each application inside avirtual machine is extensive process isolation which in at

least one instance [6] has prevented an exploit in an ap-plication to also impact other parts of the OS. Isolation isfurther enforced by each application being installed as itsown user and group ID [14].

Applications written for Android can be distributed toend users directly through a developer’s web site, orthrough the on-device application store known as the An-droid Market. The Android Market offers a central loca-tion where developers can submit their applications and,with minimal interaction from Google, reach end user de-vices. This differs from the Apple iTunes App Store whereall applications must pass through a vetting process [5](performed by Apple in a closed manner) before reachingconsumers. Android Market applications are not alwaysinspected upon submission, allowing malicious applicationdevelopers to quickly get their applications onto end userdevices. With this security concern in mind, Android hasbeen designed to isolate third party applications from eachother, as well as to protect the operating system and usersfrom malicious applications.

3.2 Android PermissionsAt the core of the Android security security model is

a permission-based system that by default denies accessto features or functionality that could negatively impactthe user experience, the system, or other applications in-stalled on the device. Examples of these features are send-ing messages or making phone calls, which may incur mon-etary cost to the user; keeping the device screen on or ac-cessing the vibrator, which could result in battery drain;and reading the user’s address book which could result inprivacy violations.

To make use of the restricted functionality (which couldpotentially be dangerous if used in combination with otherfeatures, or in a different way than intended by the An-droid OS designers), Android requires application devel-opers to declare which of the restricted features are in-tended to be used by their application. Failure to declarea particular permission will result in the related systemcall or inter-process communication being denied1 by An-droid. There are currently 110 items of functionality whichare identified as requiring an explicit permission in orderfor Android to grant access [8]. These permissions controlaccess to network and GPS functions, personal informa-tion, system hardware and settings, and many other de-vice features. However, Android is designed such that anythird party application can define new functionality (e.g.,through an API) and make that specific functionality avail-able to other applications based on developer-defined per-missions. In the case of developer-defined (also known asself-defined ) permissions, Android enforces that both thecaller and callee applications have matching permissions(i.e., the callee defined the permissions using the permis-sion tag in its manifest, and the caller requested the per-missions using uses-permission) to allow the IPC to takeplace.

Every application (including system applications such asthe phone or calendar applications) written for the Android

1The actual interaction is mediated by IBinder which is aLinux Kernel module.

platform must include an XML-formatted file named An-droidManifest.xml. This essential part of the Android se-curity model contains (along with other metadata such asminimum OS version requirements) the permission dec-larations that the application is requesting access to [7].Permissions are declared in the manifest using the uses-permission tag followed by a common namespace (usuallyandroid.permission.* for Google defined permissions,and expressed in this paper as a.p.*). Self-declared per-missions which other applications can request are labeledwith the permission tag. The Android manifest containsentries which can be automatically generated by the de-veloper environment but some fields, specifically those re-lated to permission declarations, must be manually en-tered. Manual permission declaration can lead to appli-cation developers over-declaring or erroneously declaringpermissions that don’t exist, as explained in Section 3.4

Permissions are enforced by Android at runtime, butmust be accepted by the user at install time. When usersinstall a new application in Android (regardless of how theapplication was obtained), they are prompted to accept ordeny the permissions requested by the application. Per-missions are also described in a more user friendly lan-guage at install time. These descriptions attempt to givea brief, technical explanation of the permission, but do notdisclose what the developer intends to use access to thoseresources for.

3.3 DatasetThe dataset used for our empirical study consists of

1,100 applications, the top2 50 free applications down-loaded in December 2009 from each of the 22 categoriesin the Android Market. In the Market, Google makes a dis-tinction between applications and games. Both of thesemain categories are further subdivided: the games cate-gory into 4 sub-categories (Brain and Puzzle, Arcade andAction, Cards and Casino, and Casual); the applicationsinto 18 sub-categories (Comics, Communication, Enter-tainment, Finance, Health, Lifestyle, Multimedia, Newsand Weather, Productivity, Reference, Shopping, Social,Sports, Themes, Tools, Travel, Demo, and Software li-braries).

Applications in our dataset were obtained in standardZIP or ZIP-compatible Android Packages (APK). For eachapplication, we used the Android Asset Packaging Tool toextract the manifest and read all XML entries of type uses-permission (i.e., permissions that are being requested, notnewly defined). Each application is then represented asa bit vector x = [x1, x2, . . . , xj ]

T ∈ {0, 1}j , in which xj de-notes whether the permission j is requested. Representingapplications this way allows us to employ Self-OrganizingMaps (SOM) for our analysis and enables us to utilize asuitable distance metric to express similarities betweenapplications.

Table 1 lists each of the 22 categories in the AndroidMarket and provides an aggregate count of all the uniquepermission labels requested by each application in a given

2Top applications on the Android Market are believed tobe ranked by Google using a combination of number ofdownloads and aggregate user rating of the application.

Type Category Permissions Avg. Perms.

App.

Comics 9 0.98Communication 62 6.72Demo 16 1.46Entertainment 21 2.86Finance 21 1.84Health 15 1.50Libraries 40 1.36Lifestyle 45 3.42Multimedia 34 3.60News 22 3.62Productivity 52 3.98Reference 21 2.20Shopping 35 4.08Social 37 4.52Sports 17 2.20Themes 1 0.02Tools 49 3.88Travel 40 3.74

Games

Arcade 7 1.74Casino 15 2.30Casual 14 2.00Puzzle 10 1.60

Table 1: Total number of permissions requested foreach of the 22 categories in the Android Market.

category (i.e., if 5 applications in a category requested ac-cess to system configuration, it is only counted once). Thefourth column presents the average number of permissionsrequested by applications in each category. The Communi-cation category is by far the most diverse, with 62 permis-sions requested. This category also had the highest num-ber of permissions per application, with one application(Handcent SMS) requesting 22 different permissions. TheThemes category was the least diverse containing 49 ap-plications which requested no additional permissions andone which requested access to the Internet.

Our dataset includes 119 distinct permission requests,out of which 38 (31.93%) were to self-defined permissions,and the remainder were to Android permissions in theandroid.permission.* namespace. Figure 1 shows thedistribution of the requested permissions, in which the y-axis denotes the number of applications that request thepermission and the x-axis denotes the permission index.Permissions are ordered according to the their requestcounts. The a.p.INTERNET permission (indexed as 1) wasrequested by 686 applications (62.36% of all applications).a.p.RECEIVE_BOOT_COMPLETED (indexed as 10), which en-ables an application to start at system boot, was requestedby 63 applications (5.72% of the total). The distribution ofpermissions in Figure 1 demonstrates that the frequencyof permission requests decays rapidly, and there is a longtail of permissions that are only requested by one or twoapplications in the dataset.

Dataset Limitations. Our dataset contains applica-tions from a large number of developers in a broad range

of categories. While we expect this dataset to be represen-tative and diverse, selecting the top-ranked applicationsmight have the effect of filtering out applications that areeither poorly written and/or malicious. Our dataset con-tains no known malware, and could be biased towards anytrends in developer activity in December 2009. We believe,however, that our dataset reflects real-world usage trendssince previous analysis we carried out on a smaller andearlier dataset gave similar results.

0 20 40 60 80 100 120 1400

100

200

300

400

500

600

700

Num

ber o

f app

licat

ions

requ

estin

g th

e pe

rmis

sion

Permission index

Figure 1: Permission labels exponential decay

3.4 ObservationsDuring our dataset analysis, we identified several cases

where developers requested the same permission twicein their application. This is a developer error, sinceadding the permission more than once does not haveany added benefit. The duplicate permission error wasnot only seen in lesser known applications such as QuickUninstaller (which requested the a.p.READ_PHONE_STATEand a.p.INTERNET permissions twice), but also on pop-ular applications such as Fring (which requested thea.p.INTERNET permission twice). This error is likely due tolimitations of the IDE, and/or the different ways in whichAndroid maps system calls to permissions.

Another observation is that applications request permis-sions that do not exist. For example, the Txeet applicationrequested access to the a.p.ACCESS_COURSE_LOCATIONpermission. The developer was likely attempting to de-clare the a.p.ACCESS_COARSE_LOCATION. If the Txeet ap-plication attempts to make use of coarse location features,Android will throw a security exception and deny accessto location data since the permission is incorrect. Ap-plications also requested a.p.ACCESS_ASSISTED_GPS anda.p.ACCESS_CELL_ID, which have been superseded by thecoarse and fine location permissions since even before An-droid 1.0. These permissions appear to be requested inparallel to their correct equivalent permissions in more re-cent Android releases.

We also found an application that requests a.p.BRICK,which theoretically allows an application to completely dis-able the device. While the a.p.BRICK permission controls

access to calls that can disable the device, third party ap-plications cannot make use of this permission because An-droid restricts use of related function calls to applicationssigned by the same private key as the running Android OS.These permissions are known as signature permissions.

4. METHODOLOGYIn a typical permission-based architecture, numerous

permissions are at the user’s disposal. In our work, we aimto understand how the Android permission model is used inpractice and to determine its shortcomings. While variousmanual analysis and data mining techniques are no doubtapplicable, in our work, we make novel use of the Self-Organizing Map (SOM) algorithm [16], which presents asimplified, relational view of a highly complex dataset bypreserving proximity relationships. The characteristicsthat make SOM suitable for the analysis are that (1) SOMprovides a 2-dimensional visualization of the high dimen-sional data, and (2) the component analysis of SOM canidentify correlation between permissions.

Our methodology allows us to gain insights on how thedevelopers use the given permission model in practice andhighlights the strengths of the permission model as wellas it shortcomings. We note that, although the case studyfocuses on Android, our empirical analysis is suitable forvarious other permission-based architectures, as long asthe applications are represented as a bit string of permis-sions, as discussed in Section 3.3.

4.1 Self-Organizing MapsThe Self-Organizing Map (SOM) is a type of neural

network algorithm, which employs unsupervised learn-ing (i.e., without requiring any labels) to produce a typ-ically 2-dimensional, discretized representation of a highdimensional input space. SOM aims to summarize com-plex datasets while preserving the topological propertiesof the input space. SOM consists of neurons, which havethe same dimensionality as the input space. The neuronsare typically arranged in a rectangular or a hexagonal grid.SOM neurons can be considered as pointers in the inputspace, in which more neurons point to regions with highconcentration of inputs.

The training is competitive. Specifically, when an inputis presented, its Euclidean distance to each SOM neuronis calculated. The neuron with the minimum distance – thebest matching neuron – is identified. The weight valuesof the best matching neuron and its adjacent neurons areadjusted towards the input vector. Updating neurons thisway associates them with groups of patterns in the inputdataset. Training is repeated for each input until the inputdataset is processed several times.

The training algorithm can be summarized in four basicsteps. Step 1 initializes the SOM before training. Step 2determines the best matching neuron, which is the short-est Euclidean distance to the input pattern. Step 3 involvesadjusting the best matching neuron and its neighbors sothat the region surrounding the best matching neuron be-comes closer to the input pattern. This causes SOM tominimize the distance between the updated region and theinput pattern. This training process continues until all in-

put vectors are processed. The convergence criterion uti-lized in SOM is in terms of epochs, which define how manytimes all input vectors should be fed to the SOM for train-ing. Details of the SOM algorithm follow:

Step 1: Initialize neuron weights wi =[wi1, wi2, . . . , wij ]

T ∈ <j . In our work, neuron weights areinitialized with random numbers.

Step 2: Present an input pattern x = [x1, x2, . . . , xj ]T ∈

<j . In this work, each input pattern corresponds to anapplication in which the permissions are expressed in theform of a bit string. For example, an application is rep-resented as the bit string [1, 1, 1, 0, 0]T if it requests per-missions 1, 2, 3 but not 4 and 5. Calculate the distancebetween pattern x, and each neuron weight wi, and there-fore identify the winning neuron or best matching neuronc as follows:

‖x− wc‖ = mini{‖x− wi‖} (1)

In our work, we employed Euclidian distance as the dis-tance metric, normalized to the range [0, 1].

Step 3: Adjust the weights of winning neuron c and allneighbor units

wi(t+ 1) = wi(t) + hci(t)[x(t)− wi(t)] (2)

where i is the index of the neighbor neuron and t is aninteger, the discrete time coordinate. The neighborhoodkernel hci(t) is a function of time and the distance betweenneighbor neuron i and winning neuron c. hci(t) defines theregion of influence that the input pattern has on the SOMand consists of two components [22]: the neighborhoodfunction h(‖ · ‖, t) and the learning rate function α(t), inEquation 3:

hci(t) = h(‖rc − ri‖, t)α(t) (3)

where r is the location of the neuron on two dimensionalmap grid. In our work, we used Gaussian NeighborhoodFunction. The learning rate function α(t) is a decreasingfunction of time. The final form of the neighborhood kernelwith Gaussian function is

hci(t) = exp (−‖rc − ri‖2

2σ2(t))α(t) (4)

where σ(t) defines the width of the kernel.Step 4: Repeat steps 2 - 3 until the convergence crite-

rion is satisfied.The training is conducted in two stages. In rough train-

ing, the learning rate (i.e., α(t)) is set to a higher value,hence has the potential to cause greater changes in SOM.On the other hand, in fine tuning, the learning rate is re-duced to facilitate incremental changes in neurons. Train-ing parameters of the SOM employed in this paper aresummarized in Table 2. In this paper, SOM is used fordata exploration. Thus, following the common practices ofexploratory data analysis, training parameters are empiri-cally tested before producing the final visualizations.

5. RESULTSOne of the advantages of SOM over other unsupervised

learning techniques such as typical clustering algorithms

Parameter RoughTraining

FineTuning

Initial α 0.5 0.05α decay scheme inverse_tEpoch Limit 2,000Map Size 10 x 10Initial Neighborhood Size 3 1Neighborhood Function GaussianNeighborhood Relation Hexagonal

Table 2: SOM training parameters

[15] is its ability to create a 2-dimensional visualization ofhigh dimensional cluster structures. As discussed in Sec-tion 4.1, each SOM neuron can be considered as a ‘pointer’in permission space. Thus, applications can be assignedto the nearest neuron, effectively clustering the applica-tions requesting similar permissions into the same neigh-borhood. To visualize the cluster structure of high dimen-sional weight vectors of SOM neurons, a graphic displaycalled U-matrix [21] is used.

Although the training is unsupervised, after training, la-bels from the training data (i.e., the application categories)are used to automatically assign labels to neurons. To labelSOM neurons, a winner-take-all scheme is employed. Thelabeling scheme makes use of the application categoriesof the 1,100 Android applications. Post-training, the inputdata is fed to SOM to determine the best matching neu-ron for each application and a record of the best matchingneuron for each application is maintained. The most pre-dominant category for each neuron determines the label.In other words, a neuron is labeled as Communication ifthe majority of applications, for which the neuron was thebest matching, was from the Communication category.

Assigning labels to SOM neurons helps us to build a 2-dimensional visualization which highlights the similaritiesbetween applications from different categories. Coupledwith the U-matrix visualization, the labeled map providesa visual representation of applications where similar appli-cations (in terms of requested permissions since they canbe from different categories) are placed in close vicinity.

The U-matrix representation of the SOM for Androidpermissions is shown in Figure 2. This representationemploys a heat colormap to show the distances betweenweight vectors of the neurons. The heat colormap rangesfrom black through shades of red and yellow to white(black to grey to white, when printed in grayscale), wherewhite implies ‘hot’ or high values and black implies ‘cold’or low values. Therefore, if the distance between neighbor-ing neurons is small, the region is drawn with dark colors.Conversely, if the distance between neighboring neuronsis large, a light shade is used. The U-matrix represen-tation employs extra hexagons between neurons to showthe topology of the clusters. For example, in Figure 2, thetop left hexagon represents the neuron associated with theTravel category, in Figure 3. However, the adjacent neu-ron associated with the Lifestyle category (see Figure 3) isthe third hexagon from the top left since the hexagon be-

tween the two neurons is introduced to express distancebetween them.

,!-./)01

$

$

%2%*3*

%2456

%266+

Figure 2: U-matrix representation of the SOM forAndroid permissions, with additional hexagons to ex-press the distance between neurons. The scale showsthe normalized Euclidean Distance [0,1].

Travel

Shopping

Casino

Casino

Multimedia

Lifestyle

Comics

Lifestyle

Demo

Shopping

Travel

Communication

News

Lifestyle

Shopping

News

Entertainment

Communication

Finance

Finance

Communication

Multimedia

Entertainment

Finance

Multimedia

Lifestyle

Finance

Tools

Multimedia

Demo

Productivity

Multimedia

Entertainment

Finance

News

Communication

Entertainment

Multimedia

Reference

Entertainment

Entertainment

News

Communication

Social

Social

Communication

Communication

Productivity

Health

Multimedia

Entertainment

Communication

Sports

Productivity

Tools

Lifestyle

Sports

Libraries

Multimedia

Tools

Sports

Entertainment

Shopping

Puzzle

Arcade

Communication

Productivity

Themes

Figure 3: SOM for Android permissions with categorylabels assigned in a winner-take-all approach

Figure 2 shows that, in general, applications aresparsely populated (i.e., more light regions than dark) inthe permission space with two exceptions: the dark shadedregions in the lower right and left corners that corre-spond to applications from Comics and Themes respec-tively (shaded in Figure 3). Themes is a general categorywhich contains user interface enhancements to both theAndroid OS and its applications. Sparsely populated re-gions (i.e., light regions in Figure 2) involve the applica-tions which vary substantially in terms of requested per-missions.

In Figure 3, we see that the Communication categorypopulates the upper middle section of the SOM, althoughit is mixed with applications from categories such as News,Tools, Social (shaded upper region, in Figure 3). We notethat in terms of requested permissions, Communication isthe most diverse category, as shown in Table 1. Therefore,applications in this category implement a diverse set offunctionality, which consequently causes the Communica-tion category to represented by numerous neurons of theSOM.

Further inspection revealed that the most prevalenttypes of applications in the upper middle region were ap-plications that make use of network communications andstandard phone features (i.e., making and receiving calls,sending and receiving text messages). Given that SOMplaces similar input patterns in the same region, this in-dicates that applications from the same category do notnecessarily ‘behave’ similarly (i.e., do not request similarpermissions). Hence, applications requesting similar setsof permissions (from multiple categories) are clustered to-gether which implies that applications from different cat-egories can request similar sets of permissions. This re-flects the fact that the categories defined by Google arebased on semantic activity classes rather than the techni-cal features used to implement them.

In order to identify the frequency of use and the corre-lations between the permissions, we expand our analysisby performing component plane analysis [16] of the SOM.Component analysis reduces the dimensionality of the in-put space allowing us to visualize the use of permissions ina 2-dimensional space. Compared to Figures 2 and 3, com-ponent plane analysis in Section 5.1 summarizes the use ofAndroid permissions in separate visualizations exclusive tothe permission in question.

5.1 Component Plane AnalysisUnderstanding the relationships between variables in

complex data is a challenging task. The 1,100 Android ap-plications in our dataset requested access to 119 distinctpermissions where each permission represents an addi-tional dimension within which applications are expressed.Component plane visualizations provided in this sectionrepresent the component distribution in individual dimen-sions. In our work, given that each dimension represents apermission, the component plane analysis and visualiza-tions can be considered as a sliced visualization of theSOM. The component plane analysis reported in this pa-per is specific to the 1,100 applications in the training set.However, we believe that employing the most popular 50applications from each category provides a representativedataset upon which an empirical analysis of the Androidpermission model can be performed.

Figure 4 shows the component plane analysis of all thepermissions encountered in our dataset. Each componentplane in Figure 4 has the relative distribution of a per-mission. In this representation, light shades represent thefrequent use of the permission whereas the dark shadesrepresent the infrequent use. In this section, we highlightsome of the interesting results from the component planevisualizations.

internet access_coarse_location read_phone_state set_wallpaper access_network_state call_phone wake_lock write_external_storage

vibrate write_contacts read_logs read_contacts write_settings bluetooth bluetooth_admin get_tasks

read_sms write_sms access_wifi_state change_wifi_state access_fine_location receive_boot_completed read_history_bookmarks receive_sms

send_sms process_outgoing_calls call_privileged modify_phone_state set_preferred_applications install_shortcut uninstall_shortcut restart_packages

modify_audio_settings google_auth receive_mms change_network_state disable_keyguard broadcast_sticky access_course_location add_system_service

read_owner_data master_clear persistent_activity write_accounts write_history_bookmarks expand_status_bar read_settings set_wallpaper_hints

global_search_control camera record_audio read_sync_settings read_sync_stats read_attachment flashlight read_calendar

write_calendar access_gps access_location set_orientation access_mock_location swn_wallpaper_changer google_auth.finance google_auth.other_services

access_location_extra_commands get_accounts google_auth.local google_auth.wise google_auth.writely write_owner_data write_phone_state hardware_test

status_bar control_location_updates phone system_alert_window change_configuration instantlyrics.privateservices read_external_storage device_power

write_apn_settings write_sync_settings change_wifi_multicast_state read_gmail google_auth.mail write_settings call_phone google_auth.ah

notepad.read_permission notepad.write_permission access_intents mount_unmount_filesystems shopping.read_permission shopping.write_permission manifest.permission.access_fine_location access_read_phone_state

delete_packages install_packages write_secure_settings receive_wap_push nabcontentprovider.read_permissionnabcontentprovider.write_permission read_only access_assisted_gps

access_cell_id boot_completed read_settings mode_world_writeable access_surface_flinger brick internal_system_window read_frame_buffer

read write overclock_activity control_permission action_boot_completed fullscreen bind_input_method

Figure 4: Component plane analysis of the permissions. The lighter areas indicate the regions in which thepermission was requested (see Section 5.1).

Figure 5 shows the component plane for thea.p.INTERNET permission. Light shades indicate theregions in which the permission is requested. The compo-nent analysis indicates that the a.p.INTERNET permissioncovers a large portion of the map which means it isrequested by the majority of applications in our dataset(over 60%). The lower right region in Figure 5 rep-resents the few applications which do not request thea.p.INTERNET permission, mainly the applications in theTheme and Productivity categories (based on labels inFigure 3). Smartphones strive for always-on Internetconnectivity, and can attain this by providing differentconnection modes; 3G, Wi-Fi and Bluetooth are waysin which the Android OS (and most other smartphones)can connect to online services. Our dataset also con-tained only free applications which in many cases aread-supported. Advertisements are pulled from the Inter-net causing developers to request this permission, evenif their application requires no connectivity for its corefunctionality.

Furthermore, smartphones, and Android in particularhave a strong focus on ‘cloud’ services, which are hostedon a remote server. Many of the Android applications aretightly integrated with Google servers. This also makessense for devices with low computing power.

internet

Figure 5: Component plane visualization of thea.p.INTERNET permission

The analysis of component planes can reveal correla-tions between permissions. Specifically, two permissionsare likely to be correlated, if the component plane visual-izations are similar. Figure 6 shows the component planesfor the location-related permissions, providing location in-formation at varying degrees of accuracy. The compo-nent plane visualizations of a.p.ACCESS_COARSE_LOCATIONand a.p.ACCESS_FINE_LOCATION in Figure 6 show thatboth permissions are requested on the upper right re-gion of the map (hexagons in the same region are lightshaded). This region corresponds to the Travel, Shop-ping, Communication and Lifestyle categories, in Figure 3.Applications requiring access to fine location are slightlymore biased towards the upper right corner, further nar-rowing down that area to applications that need pre-cise location information, such as navigation applications.Furthermore, given that the component visualizations ofa.p.ACCESS_ASSISTED_GPS and a.p.ACCESS_CELL_ID arevery similar in Figure 6, they are highly correlated. This isexpected since assisted GPS relies on the cell tower ID forlocation.

Our analysis determined that pairs of per-missions are common (Figure 4), such as re-

questing both a.p.ACCESS_COARSE_LOCATION anda.p.ACCESS_FINE_LOCATION. We believe that the loca-tion permission was divided in this way by Google withthe hope that developers would use coarse location forservices like news or weather applications (which do notneed to know your exact GPS coordinates, but simply ageneral geographic area). Fine location would be usedby navigation applications. However, we see developersrequesting both coarse and fine location frequently (some-times even mock location, which creates a ‘simulated’location for testing purposes).

access_coarse_location access_fine_location

access_gps access_location

access_assisted_gps access_cell_id

Figure 6: Component plane visualization of the loca-tion related permissions

In addition to location-aware applications, various ap-plications such as tools and messaging apps commonlyuse pairs of permissions. For example, Figure 4 demon-strates that the a.p.WRITE_SMS and a.p.READ_SMS permis-sions occur in the same region. This implies that appli-cations requesting SMS permissions also request similarsets of permissions, which resulted in both permissions tobe active in the same region. Moreover, various other per-mission pairs are observed such as writing to and readingfrom the external storage and installing and uninstallingshortcuts. Enck et al. [13] note that applications that haveboth the send and write or the receive and write SMS per-mission labels are ‘dangerous’ due to being able to inter-cept or transmit messages without the user’s knowledge.It is important to note that although these permissions arecorrelated in our component plane analysis, this does notmean that applications are requesting both permissions.

Rather, this means that applications have similar charac-teristics if they are requesting send or receive SMS func-tionality. This could lead to false positives if classifyingmalware based only on component plane analysis.

The component analysis of the permissions related tosystem settings are shown in Figure 7. Applications inTools and Communication categories (shaded upper regionin Figure 3) require access to system settings hence thelight shaded hexagons in the corresponding upper region.

write_contacts read_contacts

process_outgoing_calls disable_keyguard

add_system_service device_power

Figure 7: Component plane visualization of the per-missions related to system settings

Figure 8 shows the commonly requested permissionswhich are used by applications in different regions of theSOM (light shaded throughout the SOM). These permis-sions correspond to common tasks such as installing short-cuts, using the camera and recording audio. For exam-ple, the a.p.CAMERA permission is in the Shopping cate-gory (applications that allow the user to take a picture of abarcode for price comparisons) as well as Communication(using the camera for sending pictures or videos), Travel(taking pictures of landmarks or tourist attractions), etc.Commonly requested permissions tend to be scattered allover the SOM.

In addition to a small set of permissions requested by asubstantial number of applications, many permissions arerequested by only a few applications. We believe that theseinfrequent permissions are mainly defined by developersto facilitate interaction with other Android applications.On the other hand, frequently requested permissions, par-ticularly a.p.INTERNET do not provide the sufficient ex-

install_shortcut uninstall_shortcut

camera record_audio

Figure 8: Component plane visualization of the com-monly requested permissions

pressiveness to enforce control over the degree of Internetaccess that the application obtains. Thus, we believe thata.p.INTERNET permission fails to provide sufficiently finegrained control of the resources. By contrast, there existnumerous developer-defined permissions which effectivelyact as an ‘access control list exception’, allowing other ap-plications to access functionality through a newly definedAPI. In other words, they specify which other applicationscan communicate with the application in question ratherthan a permission, which specifies the resources that theapplication in question can access. We suggest that thecurrent Android permission model can be improved by fur-ther distinguishing between different classes of Internetaccess, while providing a mechanism for developers tospecify an access control list without the use of self-definedpermissions.

6. FURTHER DISCUSSIONDesigning a permission-based system is a challenging

task because system designers must anticipate what us-age will be given to the permissions defined in their sys-tem. The analysis in this paper has helped to identify de-veloper usage patterns in a real-world dataset of top An-droid applications. Additionally, there is a constant strug-gle to make the system highly configurable under differentuse-cases while maintaining a low level of complexity. Un-derstanding how the permission model is used in practicecan help in making modifications to improve currently de-ployed permission systems.

Our analysis shows that in Android a small number per-missions are very frequently used and a large number ofpermissions are only occasionally used. Furthermore, ouranalysis shows correlations between several of the infre-quently used permissions.

We note that having finer-grained permissions in apermission-based system enables users to have detailedcontrol over what actions are allowed to take place.

Whether it is beneficial to provide finer granularity willdepend on many factors within a particular environment,as it increases complexity and thus may have, for example,usability impacts on designers and end-users.

In the case of Android, having ‘too many’ permissionsimpacts both developers and end users. Developers mustunderstand which permissions are needed to perform cer-tain actions; determining this is often non-trivial, even for‘experts’. While some enthusiastic developers might takethe time to learn what each of the 110 or more permissionsdo and request them appropriately when needed, other de-velopers might choose to simply over-request functionalityto make sure their application works. Smartphone usersthemselves are (currently, through no choice of their own)heavily involved in the Android permission architecturesince it is they who either allow or deny the installationof applications based on the presented set of requestedpermissions. Whether or not this is the best approach tohandling such a complex permission system is beyond ourpresent scope, but does lead us to question if it is reason-able to ask non-technical users to configure these complexsystems.

6.1 Possible Enhancements to AndroidThe Android permission model does not currently make

use of the implied hierarchy in its namespace. For exam-ple, a.p.SEND_SMS and a.p.WRITE_SMS are two indepen-dent permission labels, instead of being grouped, for in-stance, under a.p.SMS.*. Android includes an optionallogical permission grouping [9] that is used for displayingpermissions with more understandable names (e.g., oneof the groupings reads “Services that cost you money” in-stead of a.p.CALL_PHONE). This grouping, however, doesnot allow developers to hierarchically define permissions,which could potentially extend current Android-definedpermissions to express more detailed functionality.

In the case of Android particularly, a permission hi-erarchy would allow for an extensible naming conven-tion and help developers more accurately select (onlythe) needed features. One example would be a freeapplication that displays ads from domains belongingto Admob. Currently a developer would include thead code snippet, and request the a.p.INTERNET permis-sion. This permission allows the application to com-municate over any network and retrieve any data fromany server in the world. A more fine grained hierarchi-cal permission scheme could enable the developer to re-quest the a.p.INTERNET.ADVERTISING(*.admob.com) per-mission which could limit network connectivity to onlydownload ads in static HTML from subdomains of Admob.A hierarchical permission scheme could help users under-stand why an application is requesting specific permis-sions, but more importantly, could help developers betteruse the principle of least privilege. This modification isnot backwards compatible with the currently deployed An-droid OS, therefore it might be better suited for an entirelynew model instead.

Developers that make use of self-declared permissions torestrict access to APIs in their third party applications addcomplexity to the system with every new permission de-

fined. In the current flat permission model, new developer-introduced permissions will likely be used infrequently andindependently of other permissions. Logically grouping allself-defined permissions under one category could help in-form users that these permissions are not part of the core(Google-defined) set.

Another possible enhancement to Android is to identifypermissions that when used independently, do not pose aserious threat to the device or the user. These permissionscould be displayed in one informative-only collection priorto prompting the user to (as currently done) individuallyaccept other, perhaps more high-risk permissions. Thismay significantly reduce the number of individual accep-tance decisions presented to the user, and hopefully in-crease user attention across the remaining items.

6.2 Applicability to Other Permission-BasedSystems

The methodology presented in this work has allowed usto understand how developers use the permission-basedsecurity model in Android. We believe that our method-ology is applicable to explore usage trends in other per-mission based-based systems. A base requirement for themethodology to work is being able to display applicationsand associated permissions as a bit string as described inSection 3.3. For this representation to be possible, the setof permissions requested by an application must be acces-sible. In the case of Android, the set is statically readablein a manifest, but other systems might have different im-plementations.

Google’s Chrome OS extension system [4, 10] uses anAndroid-like manifest and permissions to access advancedfunctionality, which makes this system a prime candidatefor applying our methodology. An empirical study of alarge set of third-party extensions using our SOM-basedmethodology, could help identify what correlations, if any,are present in requesting permissions to open tabs, readbookmarks, etc. This may also be of use in addressingother security concerns raised in recent work [10].

7. CONCLUSIONWe have introduced a methodology to the security com-

munity for the empirical analysis of permission-based se-curity models. In particular, we analyzed the Androidpermission model to investigate how it is used in prac-tice and to determine its strengths and weaknesses. TheSelf-Organizing Map (SOM) algorithm is employed, whichallows for a 2-dimensional visualization of highly dimen-sional data. SOM also supports component planes analysiswhich can reveal interesting usage patterns.

We have analyzed the use of Android permissions in areal-world dataset of 1,100 applications, focusing on thetop 50 application from 22 categories in the Android mar-ket.

The results show that a small subset of the permis-sions are used very frequently where a large subset ofpermissions were used by very few applications. Wesuggest that the frequently used permissions, specifi-cally a.p.INTERNET, do not provide sufficient expressive-ness and hence may benefit from being divided into sub-

categories, perhaps in a hierarchical manner as explainedin Section 6.1. Conversely, infrequent permissions such asthe self-defined and the complementary permissions (e.g.,install/uninstall) could be collapsed into a general cate-gory. Providing finer granularity for frequent permissionsand combining the infrequent permissions can enhancethe expressiveness of the permission model without in-creasing the complexity (i.e., maintaining a constant over-all permission count) as a result of the additional permis-sions. We hope that our SOM-based methodology, includ-ing visualization, is of use to others exploring independentpermission-based models.

AcknowledgementsWe thank William Enck and members of Penn. State’s SIISlab for making available the dataset used for our analysis.We also thank our CCS shepherd, Patrick McDaniel, andanonymous referees for their helpful comments. The thirdauthor acknowledges NSERC funding under a DiscoveryGrant and as Canada Research Chair in Authentication andComputer Security. Partial funding from NSERC ISSNet isalso acknowledged.

8. REFERENCES[1] Android. http://www.android.com Retrieved

February 6th, 2010.

[2] Android Market Statistics from Androlib.http://www.androlib.com/appstats.aspxRetrieved July 7th, 2010.

[3] BlackBerry APIs with controlled access.http://docs.blackberry.com/en/developers/deliverables/5580/Java_APIs_with_controlled_

access_447163_11.jsp Retrieved April 9th, 2010.

[4] Formats: Manifest Files - Google Chrome Extensions- Google Code. http://code.google.com/chrome/extensions/manifest.html#permissions RetrievedApril 9th, 2010.

[5] How Android Security Stacks Up.http://www.technologyreview.com/communications/24944/page1/ April 1st, 2010.

[6] Independent Security Evaluators - ExploitingAndroid. http://securityevaluators.com/content/case-studies/android/ Retrieved January15th, 2010.

[7] The Android Developer’s Guide. http://developer.android.com/guide/index.htmlRetrieved January 29th, 2010.

[8] The Android Developer’s Guide - Android ManifestPermissions. http://developer.android.com/reference/android/Manifest.permission.htmlRetrieved April 5th, 2010.

[9] The Android Developer’s Guide - Permission Groups.http://developer.android.com/guide/topics/manifest/permission-group-element.htmlRetrieved April 7th, 2010.

[10] A. Barth, A. P. Felt, P. Saxena, and A. Boodman.Protecting Browsers from Extension Vulnerabilities.In Proceedings of the 17th Network and DistributedSystem Security Symposium (NDSS 2010).

[11] K. Beznosov, P. Inglesant, J. Lobo, R. Reeder, andM. E. Zurko. Usability meets access control:challenges and research opportunities. In SACMAT’09: Proceedings of the 14th ACM symposium onAccess control models and technologies, pages73–74, New York, NY, USA, 2009. ACM.

[12] D. Curry. UNIX System Security. Addison-Wesley,1992.

[13] W. Enck, M. Ongtang, and P. D. McDaniel. OnLightweight Mobile Phone Application Certification.In E. Al-Shaer, S. Jha, and A. D. Keromytis, editors,ACM Conference on Computer and CommunicationsSecurity, pages 235–245. ACM, 2009.

[14] W. Enck, M. Ongtang, and P. D. McDaniel.Understanding Android Security. IEEE Security &Privacy, 7(1):50–57, 2009.

[15] J. Han. Data Mining: Concepts and Techniques.Morgan Kaufmann Publishers Inc., San Francisco,CA, USA, 2005.

[16] T. Kohonen. Self Organizing Maps. Springer, thirdedition, 2001.

[17] B. W. Lampson. Protection. SIGOPS Oper. Syst. Rev.,8(1):18–24, 1974.

[18] M. Ongtang, S. E. McLaughlin, W. Enck, and P. D.McDaniel. Semantically rich application-centricsecurity in android. In ACSAC, pages 340–349. IEEEComputer Society, 2009.

[19] R. W. Reeder, L. Bauer, L. F. Cranor, M. K. Reiter,K. Bacon, K. How, and H. Strong. Expandable gridsfor visualizing and authoring computer securitypolicies. In CHI ’08, pages 1473–1482, New York,NY, USA, 2008. ACM.

[20] D. K. Smetters and N. Good. How users use accesscontrol. In SOUPS ’09: Proceedings of the 5thSymposium on Usable Privacy and Security, pages1–12, New York, NY, USA, 2009. ACM.

[21] A. Ultsch and H. Siemon. Kohonen’s self-organizingfeature maps for exploratory data analysis. InProceedings of the International Neural NetworkConference (INNC’90), Dordrecht, Netherlands,pages 305–308. Kluwer, 1990.

[22] J. Vesanto. Data Mining Techniques Based on theSelf-Organizing Map. Master’s Thesis, HelsinkiUniversity of Technology, May 1997.

a methodology for empirical analysis of permission-based...

Documents